Z-Image Turbo LoRA training with AI Toolkit and Z-Image ControlNet Full Tutorial for Highest Quality #350
FurkanGozukara announced in Tutorials
Full tutorial: https://www.youtube.com/watch?v=ezD6QO14kRc
Z-Image Turbo LoRA training with Ostris AI Toolkit + Z-Image Turbo Fun Controlnet Union + 1-click to download and install the very best Z-Image Turbo presets. In this tutorial, I will explain how to set up the Z-Image Turbo model properly on your local PC with SwarmUI, download the models, and use them with the highest quality via ready presets. Moreover, I will show how to install Z-Image Turbo Fun Controlnet Union to generate amazing quality images with ControlNet preprocessors. Furthermore, I will show how to 1-click install AI Toolkit from Ostris and train Z-Image Turbo model LoRAs with the highest-quality configs made for every GPU tier: 8 GB GPUs, 12 GB GPUs, 24 GB GPUs, and so on. I did massive research to prepare these Z-Image Turbo model training configurations.
👇 Links & Resources Mentioned:
Download SwarmUI & Models: [ https://www.patreon.com/posts/Download-SwarmUI-Models-114517862 ]
Ostris AI Toolkit (SECourses Version): [ https://www.patreon.com/posts/Ostris-AI-Toolkit-140089077 ]
Ultimate Batch Image Processing App: [ https://www.patreon.com/posts/Ultimate-Batch-Image-Processing-App-120352012 ]
SwarmUI with ComfyUI Backend Windows Tutorial: [ https://youtu.be/c3gEoAyL2IE ]
SwarmUI with ComfyUI Backend RunPod and Massed Compute Cloud Tutorial: [ https://youtu.be/bBxgtVD3ek4 ]
⏱️ Video Chapters:
00:00:00 Introduction to Z-Image Turbo Model
00:00:54 FP8 Scaled Version 5.7GB for Low VRAM
00:01:10 ControlNet Union with Z-Image Turbo
00:01:30 LoRA Training with Ostris AI Toolkit
00:02:00 Default vs Custom Training Preset Quality Comparison
00:03:00 RunPod Cloud Training Preview
00:03:40 MassedCompute Cloud Training Preview
00:04:16 Downloading Z-Image Models via SwarmUI
00:05:00 Z-Image Turbo Core Bundle & ControlNet Files
00:05:58 FP8 Scaled Model & Musubi Tuner Converter
00:07:13 Updating ComfyUI for Sage & Flash Attention
00:08:13 Updating SwarmUI & ControlNet Preprocessors
00:08:52 Updating & Importing Latest SwarmUI Presets
00:09:20 Generating with Quality 2 Fast Preset
00:10:48 Generating with Quality 1 Upscale Preset
00:11:35 Quality 1 vs Quality 2 Visual Comparison
00:12:13 Setting up ControlNet Input & Aspect Ratio
00:13:41 ControlNet Strength Settings & Canny Test
00:15:26 Using Depth Preprocessor with Z-Image
00:15:58 Coloring Lineart Drawings with ControlNet
00:16:58 Lineart Preprocessing Comparison
00:17:50 Ostris AI Toolkit Installation Prerequisites
00:19:12 Installing Ostris AI Toolkit on Windows
00:20:02 First Time UI Setup & Launching Interface
00:21:04 Loading Custom Training Configs
00:21:38 Creating a New Dataset Structure
00:22:24 Ultimate Batch Image Processing App Install
00:23:17 Dataset Prep Stage 1: Auto-Zooming with SAM2
00:26:08 Dataset Prep Stage 2: Resizing to Exact Resolution
00:28:12 How to Select Best Training Images
00:30:24 Importance of Emotions & Angles in Datasets
00:31:44 Z-Image Resolution & Aspect Ratio Rules
00:33:21 Configuring Training Parameters & Epochs
00:36:52 Resolution Impact on Training Speed
00:37:46 Starting the Training Job on Windows
00:38:39 Monitoring Training Progress & VRAM
00:39:43 Checkpoint Generation Settings
00:40:40 Resuming Training from Last Checkpoint
00:42:09 Training Speeds on RTX 5090 vs 4090 vs 3060
00:43:01 Training Quality: Default vs Custom Preset Comparison
00:44:21 Testing LoRAs with SwarmUI Grid Generator
00:46:04 Fixing ControlNet Error in Grid Generation
00:47:09 Comparing Generated LoRA Checkpoints
00:47:38 Using Trained LoRA with ControlNet Union
00:48:10 RunPod: GPU Selection & Template Setup
00:50:32 RunPod: Port 8675 Config & Initialization
00:51:36 RunPod: Uploading Installation Files
00:52:01 RunPod: One-Click Installation Command
00:54:07 RunPod: Starting AI Toolkit & Proxy Connection
00:54:38 RunPod: Uploading Dataset via Interface
00:55:32 RunPod: Starting the Training Job
00:56:24 RunPod: Speed & Cost Analysis
00:57:28 RunPod: Auto-Stop Command Setup
00:58:24 MassedCompute: GPU Selection & Coupon Code
01:00:16 MassedCompute: ThinLinc Client Setup
01:01:21 MassedCompute: Transferring Files to Shared Folder
01:02:55 MassedCompute: Installation Command
01:05:49 MassedCompute: Connecting via Public URL
01:06:54 MassedCompute: Starting Training Job
01:08:43 Downloading Checkpoints & Stopping Instance
🚀 Master Z-Image Turbo & LoRA Training: The Ultimate Guide!
In this comprehensive tutorial, I show you how to generate ultra-realistic images in seconds using the lightweight Z-Image Turbo model. We cover everything from 1-click installation on SwarmUI (ComfyUI backend) to mastering ControlNet Union for precise image control.
But that’s not all! I also reveal how to train your own high-quality Z-Image Turbo LoRAs using the Ostris AI Toolkit. I have developed a custom training preset that significantly outperforms the default settings—you have to see the comparison to believe it! Whether you are on a local PC, RunPod, or MassedCompute, this guide has you covered.
🔥 What You Will Learn:
Z-Image Turbo Setup: How to run this fast, 6GB model (FP8 included) on almost any GPU.
ControlNet Mastery: Use Canny, Depth, and Lineart to control your generations perfectly.
LoRA Training: Step-by-step guide
Video Transcription
00:00:00 Greetings everyone. In this tutorial video, I will show you how to use the Z-Image Turbo model.
00:00:06 It is a very fast, very lightweight model with which you can generate amazing high-quality,
00:00:13 extremely realistic, or stylized images on your local PC. The model is as small as 6 GB at
00:00:22 maximum quality. And it will run on literally every GPU, and since it is a turbo model,
00:00:29 it only requires 9 steps. All these images were generated locally, ultra-fast,
00:00:36 in about 10 seconds each, and they are very high resolution as you are seeing right
00:00:41 now. I will explain all of these. And all of these images were generated in SwarmUI
00:00:47 using the ComfyUI backend with our presets. 1 click to install, download, and use right away.
00:00:54 You probably haven't seen this before, but I have Z-Image Turbo FP8 scaled. This is 5.7 GB in size,
00:01:04 so it fits into all GPUs. But this is not all. Furthermore, I will show you how to
00:01:10 use ControlNet with the Z-Image Turbo model. So you see, based on this input image,
00:01:21 these images were generated.
00:01:28 But we are not done yet. I will also show you how to train your own Z-Image Turbo LoRAs as
00:01:36 you are seeing right now. By using the AI Toolkit from Ostris, you will be able to
00:01:42 train amazing LoRAs even on a very weak GPU with very low VRAM, fully locally. I have
00:01:50 researched it extensively and prepared amazing presets for all GPUs with the highest quality.
00:01:57 We have presets from 8 GB GPUs to 24 GB GPUs, and each one of them delivers very high quality.
00:02:05 For example, let me show you the default preset of the Ostris AI Toolkit versus our preset. So this
00:02:14 left image was trained with the default preset of the AI Toolkit that he shows in his tutorial,
00:02:21 and this is our preset. This is default preset; this is our preset. This is default preset;
00:02:28 this is our preset. There is a massive quality difference between the default preset and ours. And
00:02:34 these are not cherry-picked images; these are grid generations. Another case: this is default preset,
00:02:39 and this is my preset. This is default preset, and this is my preset. This is default preset,
00:02:45 and this is my preset. There is a massive difference. So I will show
00:02:49 how to train with our preset on your PC. This is default preset, and this is our
00:02:55 preset. Default preset, our preset. You will get amazing quality training images.
00:03:00 But we are not done yet. I will also show you how to train on RunPod extremely efficiently. With RTX
00:03:07 4090, for example, you will be able to train with amazing speed. I will also show how you can turn
00:03:15 off your pod automatically after training. All of the installation is literally 1 click, and
00:03:21 my installers install the latest version of the application. I am not using any custom template, so it
00:03:28 is not static; we are using the official PyTorch template. Therefore, with any GPU, you will be
00:03:34 able to install with 1 click and use this amazing trainer on RunPod. But we are not done yet. I will
00:03:40 also show how to install and use it on MassedCompute, our favorite GPU provider, which is very fast. So
00:03:48 you will be able to install it on MassedCompute, access it from a public URL like this, and
00:03:55 train as if it were on your own PC. So easy, so fast, and it will run in the cloud, not on your computer.
00:04:03 And with our SwarmUI presets, you will be able to use this model with the highest quality. And with our
00:04:10 model downloader, you will be able to download the necessary files with just 1 click. So let's begin.
00:04:16 As a first step, we need to download Z-Image models. For that, we are going to use our SwarmUI
00:04:22 auto-installer and model downloader. Download the latest zip file as usual, as in previous tutorials.
00:04:28 If you don't know how to install and use SwarmUI, follow this video; it will explain everything to you.
00:04:33 The links will be in the description of the video. Then copy the zip file into your SwarmUI
00:04:39 installation folder and extract it there with WinRAR. Overwrite all of the files; this is very important. You
00:04:45 need to overwrite all the files. Then double-click and run the Windows start download models app .bat file.
00:04:51 This will start the model downloader. First of all, we need to download Z-Image models.
00:04:56 We have significantly upgraded our application. We have moved to Gradio version 6. The initial
00:05:03 loading will take a little bit longer than before, as you are seeing right now,
00:05:08 but the application will work much better once loaded. Okay, it is loaded right
00:05:13 now. You see we have significantly improved our interface. We have updated our bundles,
00:05:19 and you see there is Z-Image Turbo core bundle. This bundle will download Z-Image Turbo BF16,
00:05:26 the Qwen 3 4-billion-parameter text encoder. This is necessary for Z-Image. Z-Image
00:05:32 Turbo full ControlNet Union. We are going to use this ControlNet Union model to
00:05:38 generate images guided by the reference images we want, using ControlNet preprocessors. And
00:05:44 this is the VAE that it uses. So you can click download all models from here, or you can click
00:05:49 individually these buttons to download. Download all models is the best approach.
00:05:54 Alternatively, if you have a very low VRAM GPU, in the image generation models,
00:05:59 you will see Z-Image Turbo models, and you will see we have Z-Image Turbo FP8 scaled.
00:06:06 This is a model that I have made myself. With our SECourses Musubi Tuner premium application,
00:06:13 now we can convert Z-Image models into FP8 scaled as well. So let me start the application to show
00:06:20 you. This is 5.73 gigabytes in size, so it fits into 6 GB GPUs as well. However, don't
00:06:28 get confused even if you don't have a powerful GPU. With SwarmUI using the ComfyUI backend, you
00:06:33 can run any model on literally any GPU. Because it will use your system RAM to hold some part of
00:06:39 the model, and it will work. Don't worry about that. But to get maximum speed, if you want, you
00:06:45 can use FP8 scaled if you have a GPU with like 6, 8, or 10 gigabytes of VRAM.
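As an aside, the "scaled" in "FP8 scaled" refers to per-tensor scaling. The following is an illustrative sketch of that idea only, assuming standard float8 e4m3 conventions; it is not the actual SECourses Musubi Tuner converter code, and the function names are hypothetical:

```python
# Illustrative sketch of the idea behind an "FP8 scaled" checkpoint (NOT the
# actual SECourses Musubi Tuner converter code): each weight tensor is stored
# in float8 e4m3 together with a per-tensor scale factor, so the original
# dynamic range can be recovered when the model is loaded.

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def per_tensor_scale(weights):
    """Scale that maps the tensor's largest magnitude onto the FP8 range."""
    amax = max(abs(w) for w in weights)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def scale_for_fp8(weights):
    """Divide by the scale and clamp into FP8 range; a real converter would
    then cast these values to float8, which also rounds the mantissa."""
    scale = per_tensor_scale(weights)
    scaled = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, w / scale)) for w in weights]
    return scaled, scale
```

Loading reverses the transform: the stored FP8 values are multiplied back by the saved scale, which is why quality stays close to the BF16 original despite the smaller file.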
00:06:52 So in our SECourses Musubi Tuner, we have FP8 model converter, and you can convert
00:06:59 Qwen image and Z-Image models. This is how I converted. Follow the model downloads on the
00:07:04 CMD window. Also on the very top, you will see that they have been downloaded. Then you
00:07:09 are ready. So we can move to next step. As a next step, you need to update your
00:07:13 ComfyUI installation. Get the latest zip file from here. Go to your ComfyUI installation,
00:07:19 again extract and overwrite all the files. Then double click and run the Windows install
00:07:25 or update ComfyUI .bat file. This is important. So it will update ComfyUI to the latest version,
00:07:31 which supports Z-Image models with ControlNet. So my ComfyUI installation was already up to date,
00:07:38 but still it is updating everything. This ComfyUI installation is perfect. It supports
00:07:42 Sage Attention, Flash Attention, xFormers, Triton, everything that you need, with support for RTX 5000
00:07:49 series GPUs. So this is a very, very good installation. Okay, it has been done. Moreover,
00:07:55 this ComfyUI installation supports special samplers and schedulers like bon_tangent and
00:08:00 Beta 57. You need to use our ComfyUI installation to have these extra samplers and schedulers.
00:08:07 As a final step, we need to update our SwarmUI. So we have SwarmUI update .bat
00:08:13 file. This will also install ControlNet preprocessors automatically for you,
00:08:18 so you won't need to install them yourself; both the update and the install will do it. You can
00:08:23 read the announcements. You should always read the announcements that I make. You see the 3
00:08:29 December 2025, version 108 announcement. If you don't have it or if you are using SwarmUI somewhere else,
00:08:36 you can also manually install. You will have a button here that will tell you to install
00:08:41 ControlNet. But since our installation is doing that automatically, you don't need
00:08:45 it. Now we have everything ready. Now we need to update our presets. To get the latest presets,
00:08:52 you can use the import feature as usual. You see there is import, choose file, select the
00:08:58 latest preset from here and overwrite everything. However, if you want a clean installation that
00:09:03 will delete all the existing presets and update them to the latest version, which I recommend,
00:09:08 we have the Windows preset delete import file. Click yes, and it will back up your presets and update
00:09:14 the presets like this. Everything is ready. Then refresh your presets and sort by name.
00:09:20 Okay, so how are we going to use the Z-Image Turbo model? First of all,
00:09:24 quick tools, reset params to default. This is important. Then we have 2 separate presets for
00:09:31 Z-Image. You see Quality 1 and Quality 2. Quality 2 is faster. It is only 1 pass;
00:09:38 it doesn't upscale. So let me demonstrate both of them. Direct apply. And for example,
00:09:43 let's use this prompt. Okay. Copy paste your prompt here. It is all set. You also need to
00:09:50 set your aspect ratio. I set the base resolution of the model that I uploaded for
00:09:56 you to 1536 by 1536. By default, it was 1024. However, I think this model works better with this
00:10:06 resolution. Sometimes you may get some mistakes, but just generate more images, because it is so fast.
00:10:11 Let me show you real time. So let's generate 8 images. Generate. After the model loaded,
00:10:16 let's see the speed. Moreover, you see the models that I upload for you have images,
00:10:22 have description, so you can read these descriptions, and they are easier to use.
00:10:27 Okay, the first generation started. The first one will be slower than the other ones. Okay, first
00:10:32 one is done. So you see it was done in, let's see, 12 seconds. The next one is done in 7.84
00:10:42 seconds. What about using the higher quality preset? So let me demonstrate that. I will reuse
00:10:48 parameters of this one so it will set the seed. Then go back to preset and I will direct apply.
00:10:55 So this will activate the special refine upscale. Moreover, this is important, you need to download
00:11:01 the upscale models from here. So we have other models and image upscaling models. So download
00:11:09 these 3 models to have accurate upscaling. Okay, then generate. Now this will upscale this image,
00:11:17 but we cannot upscale a lot because this model has limitations. Also sometimes you can see
00:11:23 some hallucinations at the right or left side of the image because of the high resolution that we
00:11:29 generate. Sometimes you won't see, sometimes you will see. It depends. So let's have a comparison.
00:11:35 The left one is the Quality 2, which is way faster to generate. And the right one is the
00:11:41 Quality 1 preset that we have. So you see how much realism and detail we add this way. It changes
00:11:48 the image because this model is very sensitive to resolution and upscaling. However, it improves
00:11:56 the quality significantly. So you can test both of the presets and decide which one is working better
00:12:01 for you, which one is generating better images for your use case, and decide and use that way.
00:12:08 So how are we going to use ControlNet? The ControlNet logic is the same. So let's reload this
00:12:14 page, reset params to default. And you can use either of the presets; both of them work.
00:12:20 Let's use the direct apply and Quality 2. Now what is the difference? You need to provide a
00:12:26 ControlNet input image, which will be the base of your image. For example, let's try this pose. Then
00:12:33 you need to select your ControlNet preprocessor. This is not mandatory. If you are using already
00:12:38 a Canny image, you don't need to select this. But if you are not using a preprocessed image,
00:12:43 you need to select it. Then click this preview to see the preprocessor output. If you have a different
00:12:49 aspect ratio, it will show you a mangled preview. So let's click preview. You see this is not
00:12:56 accurate. So how to make it accurate? You need to set an accurate aspect ratio
00:13:01 according to your input image, according to your ControlNet input. How can you set the
00:13:07 aspect ratio accurately? I have been asking the SwarmUI developer to add this feature,
00:13:11 but he hasn't added it yet. So you need to choose file, upload your reference image, click res,
00:13:18 and use the exact aspect ratio or the closest. Let's use the exact aspect ratio. Then disable
00:13:24 this init image. So we use this init image only to set our aspect ratio accordingly. Now
00:13:31 when I click the preview, it will show me the accurate preview like this. I am ready.
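The arithmetic behind the "exact aspect ratio or the closest" choice in that workaround can be sketched as follows. This is a hypothetical helper, not SwarmUI code, and the ratio list is an assumed subset of SwarmUI's options:

```python
# Hypothetical sketch of the "use exact / closest aspect ratio" step above.
# The ratio list is an assumed subset of SwarmUI's options, not the full set.
from math import gcd, log

STANDARD_RATIOS = {
    "1:1": 1.0, "4:3": 4 / 3, "3:4": 3 / 4, "3:2": 3 / 2,
    "2:3": 2 / 3, "16:9": 16 / 9, "9:16": 9 / 16,
}


def exact_ratio(width, height):
    """Reduce the image dimensions to their simplest width:height ratio."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def closest_ratio(width, height):
    """Pick the standard ratio nearest to the image's true ratio.
    Comparing in log space treats too-wide and too-tall errors symmetrically."""
    target = log(width / height)
    return min(STANDARD_RATIOS, key=lambda k: abs(log(STANDARD_RATIOS[k]) - target))
```

For a 1920x1080 reference image both functions agree on 16:9; for an oddly sized image, `exact_ratio` may return something like 317:200, and `closest_ratio` falls back to the nearest standard option.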
00:13:35 You don't set the ControlNet union type for the Z-Image Turbo model. This is a slightly different
00:13:41 ControlNet than the previous ones that we know. But you can enable and test it; it won't make a
00:13:46 difference. So we don't set it. But the ControlNet strength matters. Let me show you what I mean by
00:13:52 that. So let's write our prompt and let's set a seed like this and generate. Now the ControlNet
00:14:00 strength will impact your output. I tested it and found that values between 0.6 (60 percent) and 1
00:14:09 (100 percent) work. So this is the 100 percent result. Let's see the 60 percent result. Okay,
00:14:15 like this. By the way, you see this prompt and this ControlNet input are not very related, but it is
00:14:21 still following it pretty accurately. Okay, this time 60 percent didn't work well. So let's
00:14:28 make it 70 percent. However, the image quality increased, so you need to find a balance between
00:14:33 the two. We can change the prompt. Let me demonstrate. So let's make this 60 percent. A
00:14:39 man wearing an amazing suit. Because this prompt matches our ControlNet image much better.
00:14:48 You see this one. Okay, now 60 percent will generate a very accurate image as you are seeing
00:14:54 right now. Yes. And I can choose Quality 1 like this, direct apply, and generate. Then it
00:14:59 will generate a higher quality, higher resolution image. What this does is it first renders,
00:15:06 then does another render over it with upscaling to improve the quality. And yes. You see the same
00:15:13 pose, different person definitely. However, it is working really well as you are seeing right now.
00:15:19 So this was the Canny Edge preprocessor. You can use Depth. Let's see the Depth preprocessing
00:15:26 from here. When you click preview for the first time, it will download the necessary model if you
00:15:31 don't have it. So from the debug menu, you can see whether it is downloading or not. And let's see,
00:15:37 the preview is not done yet. We are still waiting. Probably it is trying to download. Yes,
00:15:42 you see it has downloaded it here. I can see it. And yes, the Depth map has been generated.
00:15:47 Now generate again. So you can use either way. You can use Canny, you can use Depth,
00:15:53 whichever is working best for your case. You can use Lineart. If you have a Lineart drawing,
00:15:58 you can color it. Let me demonstrate. Then I will choose that image from here. X,
00:16:04 choose file. I also need accurate aspect ratio. So let's choose the file from here. Resolution,
00:16:10 use the closest aspect ratio. And this is it. Then disable init image. And I am not going to
00:16:17 use any ControlNet preprocessing. And let's try an amazing cel-shaded render of a dragon. Okay,
00:16:25 let's see what we get. Since this is ControlNet Union, it should be able to understand it. By
00:16:31 the way, this time we can also increase the ControlNet strength if we want. Okay,
00:16:35 we are getting something. This was a very simple prompt, but I think we will get a good image.
00:16:41 Yeah, pretty amazing. Pretty amazing result. If I want more exact matching,
00:16:47 I can increase the strength. So this is another generation. As you increase the strength, the output
00:16:52 will match the original image more closely. This one is looking like this. So this is the way. We can also
00:16:58 set the preprocessing to, let's see what we have, Lineart standard preprocessing. So let's see
00:17:05 the preview. When you click preview for the first time, it will download the model. And I am clicking
00:17:10 it for the first time, so it is taking time for the preview to appear. It is downloading the model. Okay,
00:17:16 this is another generation. Pretty good. So you see from this simple Lineart, I can generate very,
00:17:22 very good images like these ones. Okay, it is here. And this is the preview. So I can use
00:17:27 this preprocessing as well and compare the result. And this is the result with Lineart
00:17:33 preprocessing. So you see this is entirely different from the earlier version. So you can use
00:17:37 either way. It is working really well, really amazingly. So I recommend you test it for your use case.
00:17:43 Okay, now the part that you have been waiting for. How to use Ostris AI Toolkit to train your
00:17:50 Z-Image Turbo LoRA models. First of all, we need to download and install Ostris AI
00:17:56 Toolkit. Currently, the Ostris AI Toolkit has a bug. Therefore, you need to download this zip file:
00:18:04 AI Toolkit SECourses Version 2. I made a pull request which fixes the CPU offloading bug;
00:18:12 use this version until he fixes it upstream. I will update this post. So when you are watching this tutorial, if
00:18:18 you don't see this AI Toolkit SECourses Version 2, then you need to download the official version,
00:18:23 which installs from the official repository. Version 2 installs from my forked repository. Then move
00:18:29 the zip file into the drive where you want to install and extract all. Enter inside the folder.
00:18:36 So you see we have Windows Install, Windows First Time UI Setup, Windows Start, and Windows Update
00:18:42 .bat files. But before starting installation, you need to read the Windows requirements. We
00:18:48 have the classical requirements, but additionally, you need to download Node.js. I am using Node.js
00:18:54 version 22.20.0. The direct link is here. You just need to download it, start it, and click next. So
00:19:02 you see I already have it. You just need to click next, next, next, next. That is it. Nothing else.
00:19:07 So after you have followed both of the requirements, click Windows install .bat file
00:19:12 run. This will install the Ostris AI Toolkit with Flash Attention, Sage Attention, xFormers with
00:19:19 Torch 2.8, CUDA 12.9. And my installation supports all of the GPUs from the 1000 series to the
00:19:28 5000 series. Or on cloud GPUs, it supports A100, H100, B200, whatever GPU you
00:19:36 use on the cloud. So wait for the initial installation to be completed. Okay, so the installation has been
00:19:42 completed. You can scroll up and see if there are any errors. There shouldn't be. This is important:
00:19:48 you need to have Python 3.10.11 installed. Then press any key to continue. The second step
00:19:55 is Windows First Time UI Setup. This is a one-time step, and you need to run it. Do not run this again
00:20:02 after you have made the initial installation. For this to work, as I said, you need to have Node.js
00:20:09 installed. Otherwise, it will not work. Node.js is a system-wide installation. It is not like a
00:20:14 virtual environment, so you need to install it as I have shown you. You may get some warnings like
00:20:19 this. These are all unimportant. You see there are warnings. You can just ignore all of them, and it
00:20:26 is done. You see setup completed successfully. Now we are ready to use. Next time for updating, you
00:20:31 can use Windows update app .bat file. But since we just installed, let's start the application. So it
00:20:38 will give us a public URL and a localhost URL. The public URL doesn't work in
00:20:45 RunPod, but in MassedCompute it works. I will show both of them, hopefully. This is our interface.
00:20:51 Go to New Job and Show Advanced. There is no preset saving and loading in the AI Toolkit yet.
00:20:59 So how are you going to use my presets? Enter the Z-Image Turbo LoRA
00:21:04 configs folder. According to your GPU, select the configuration. Since I have an RTX 5090,
00:21:10 I am going to use 24 GB. You see there is no higher configuration because this
00:21:15 model fits into 24 GB. Quality 1 is better than Quality 2. Quality 2 is better than Quality 3,
00:21:22 and 4, and so on. So let's open this Quality 1 config. Copy it. Paste it here. And Show Simple.
00:21:29 It will update everything except the dataset. So you need to have your dataset from Datasets, New
00:21:38 Dataset. Give a name like My Dataset. Then upload your images with drag and drop here. They will be
00:21:44 uploaded. Then you can type any caption: Caption 1, Caption 2, anything. I will explain everything,
00:21:51 don't worry. This is just to explain how it works. The dataset will be saved inside the
00:21:56 AI Toolkit, inside the datasets folder here. You see My Dataset. And the captions are image file names
00:22:03 with a .txt extension. So: image file name plus .txt. This is the format of the dataset system of the
00:22:10 AI Toolkit. I already have my dataset here. So you see this is my dataset. I will copy this, paste it
00:22:17 into datasets folder. And when I go to datasets and refresh this page, you see it will appear.
00:22:24 So how to prepare your training images dataset? Now I will explain this part extremely carefully,
00:22:31 so watch all of it. To automatically prepare your images, I recommend using the Ultimate Batch Image
00:22:38 Processing App. You see it is under the Auxiliary Tools section. So let's go to this link. I
00:22:44 recommend you check out these screenshots and read this post. Let's scroll down and
00:22:50 download the latest version. Then let's move it into our Q drive, right click, extract here,
00:22:56 enter inside it. First of all, we need to install. This is a pretty fast installation.
00:23:01 This application is very lightweight, but it has so many features. Okay,
00:23:05 the installation has been completed. Scroll up to see if there are any errors or not. Then close
00:23:11 this. Then let's start the application. Windows Start application run. Why is this application
00:23:17 important? Because it will allow you to batch preprocess your training images. You
00:23:23 can of course manually preprocess your images, but this makes it much easier and more accurate.
00:23:30 So I have some sample images to demonstrate the power of this tool. I will copy this path
00:23:37 and enter it as the input folder. Then as the output folder, let's output them into my other folder as
00:23:45 Preprocess Stage 1. Then the aspect ratio. If you are always going to generate images at 16 by 9,
00:23:55 you can set your aspect ratio accordingly. However, if you are not sure which aspect
00:24:00 ratio you are going to use, I recommend using a square aspect ratio at 1328 by 1328
00:24:08 pixels. This is the base resolution of the Qwen image model or Qwen image edit model. This works
00:24:14 best. And with this aspect ratio and resolution, you can still generate any aspect ratio. All the
00:24:20 images I have shown you in the beginning of the tutorial were trained at 1328 by 1328.
00:24:27 Then there are several options. You can select the classes from here to zoom them in. This is
00:24:33 extremely useful when you are training a person. Because you want to zoom in on the person. What do I
00:24:39 mean by that? You see in these images, there is a lot of extra space that can be cropped
00:24:46 away. For example, in this image, I can zoom in on myself a lot. So you can choose this,
00:24:52 or there is a better one which is based on SAM2. This takes anything as a prompt. Let's
00:25:00 say "person". You can set your batch size and GPU IDs. These are all advanced options if you are
00:25:05 going to process a lot of images. So the default is good. Let's start processing. What this is going
00:25:12 to do is zoom in on the class I have given without cropping any part of the
00:25:18 class. So this will not make these images exactly this resolution or this aspect
00:25:24 ratio. It will try to match this aspect ratio without cropping any part of the subject.
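The stage-1 behavior just described (crop toward a target aspect ratio without ever cutting into the subject) can be sketched as a small geometry routine. This is a hypothetical sketch, not the app's actual code; it assumes a subject bounding box such as one produced by YOLO or SAM2:

```python
# Hypothetical sketch of stage 1 ("zoom in without cropping the subject"):
# grow the detected subject box out to the target aspect ratio, then shift it
# back inside the image bounds, so the crop never cuts into the subject.

def zoom_crop(img_w, img_h, box, target_ratio):
    """box = (left, top, right, bottom) of the subject; returns a crop box."""
    left, top, right, bottom = box
    bw, bh = right - left, bottom - top
    # widen or heighten the box until it reaches the target w/h ratio
    if bw / bh < target_ratio:
        bw = bh * target_ratio
    else:
        bh = bw / target_ratio
    cx, cy = (left + right) / 2, (top + bottom) / 2
    l, t = cx - bw / 2, cy - bh / 2
    # shift (not shrink) the crop back inside the image; if the grown box is
    # larger than the image itself, the result would need padding instead
    l = min(max(l, 0), img_w - bw)
    t = min(max(t, 0), img_h - bh)
    return (round(l), round(t), round(l + bw), round(t + bh))
```

Because the crop only grows around the subject and then slides within the image, the result matches the requested aspect ratio when there is room, and otherwise stays as close as possible, which mirrors the "it may not match it exactly" behavior shown later in the video.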
00:25:30 So let's see what kind of images we are getting. We are saving them inside here.
00:25:34 You see it has generated this subfolder. This is important because in the second stage,
00:25:40 we are going to use this to make them exactly the same resolution. When I enter inside this folder, you
00:25:48 can see that it has zoomed in the person. So this is how it works. And when zooming in, it will not
00:25:55 crop any parts of the image. And also when zooming in, it will try to match the aspect ratio that you
00:26:02 have given like this. Okay, the first stage has been completed. Now the second stage is resizing
00:26:08 them into the exact resolution. This will crop the subject if it is necessary. Like cropping the
00:26:14 body parts to match the exact resolution. This takes the parent folder, not this folder. This one is
00:26:21 not the folder; this is the folder that I need to give. And I need to change the resolution that
00:26:26 I want. So this will look for a subfolder named exactly like this. You can have multiple
00:26:32 resolutions, actually. For example, in the image cropper, I can add another resolution here. Let's
00:26:38 say 16:9. So this is the 16:9 resolution for the Qwen image model. Let's add it as 1744 by 992.
00:26:48 Let's start processing. It will process this new resolution as well. And I am going to see a folder
00:26:54 generated here in a minute when it is processed. Okay, it is started processing. Now it will try
00:27:00 to match this aspect ratio. It may not match it exactly. Why? Because it is not going to crop any
00:27:07 body parts. So you see this image cannot match that aspect ratio. This is not a suitable image
00:27:13 for that. This is almost still square. However, in the second tab, when I go to image resizer,
00:27:18 when I type it, you see I have given the parent folder. Let's wait for this one to finish. Okay,
00:27:25 it is almost finished. By the way, if you use this YOLO, it is faster than SAM2. So just delete this
00:27:32 and select your class from here. It supports so many classes to focus on them. Okay, it is done.
00:27:38 Now I am going to make the output folder as Final Images like this. And I will click Resize Images.
00:27:45 You can also resize without cropping, so it will pad to expand instead. So let's resize
00:27:52 images. I recommend cropping; it is better. Then let's go back to our folder Final Images. Okay,
00:27:59 in here you will see that it has cropped the body parts and resized them into the exact resolution like
00:28:06 this. And these are the square images. They are much more accurate than the other ones.
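The two-stage flow described above (first zoom to the subject without cutting it, then crop-and-resize to the exact resolution) comes down to simple box arithmetic. Here is a rough Python sketch of the second, cropping stage; it is my own illustration of the idea, not the batch app's actual code:

```python
def crop_box(w, h, target_w, target_h):
    """Return the centered crop box (left, top, right, bottom) that matches
    the target aspect ratio; the cropped region would then be resized to
    exactly target_w x target_h."""
    target_ar = target_w / target_h
    if w / h > target_ar:                 # too wide: trim left and right
        new_w = round(h * target_ar)
        left = (w - new_w) // 2
        return (left, 0, left + new_w, h)
    new_h = round(w / target_ar)          # too tall: trim top and bottom
    top = (h - new_h) // 2
    return (0, top, w, top + new_h)

# A 2000x1000 photo cropped toward a square loses 500 px on each side:
# crop_box(2000, 1000, 1024, 1024) -> (500, 0, 1500, 1000)
```

The padding-expansion alternative mentioned above would instead add borders until the aspect ratio matches, keeping every pixel of the subject.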
00:28:12 Now I have my images ready. However, this is not a very good collection of images. This is another
00:28:20 thing that you need to be careful about. I have used these images to train the models that I have shown
00:28:26 you in the beginning of the tutorial. So when we analyze these images, what do you see? I have full
00:28:32 body pose like this. I have half body pose. I have very close shot. And when you have images,
00:28:39 what matters is that it should have good lighting, good focus. These two are extremely
00:28:46 important. It should be very clear. All of these images are captured with my cheap phone,
00:28:52 so they are not taken with a professional camera. For example, when we look at this image, you see
00:28:57 it is not even very high quality. This is how it looks. And this is a real image. This is a raw
00:29:26 image. And when we look at the AI generated image, as you can see, it is even higher quality than my
00:29:33 raw image. And therefore, you should add highest possible quality images into your training dataset
00:29:41 to get the maximum quality images. What else is important? You should try to have different
00:29:48 clothing so it will not memorize your clothing. This is super important. Try to have different
00:29:53 clothing, different times, different backgrounds. All of these will help. Whatever you repeat in
00:29:59 your training dataset, the model will memorize them. You don't want that. You want only yourself
00:30:06 or the subject if you are training a style, the style, or an object, the object, to be repeated.
00:30:12 Nothing else. I will explain them in the style and the item training, the product training part.
00:30:17 And another thing is that you should add the emotions that you want. If you want smiling, you
00:30:24 should add it. If you want laughing, you should add it. So having the emotions you want will make a
00:30:31 100 percent quality difference in your outputs. Try to have all the emotions you want. But that is
00:30:39 not all. Also try to have all the angles you want. If you want to generate images that look down,
00:30:46 you should have an image that has a look down like this. Or from this angle, this angle.
00:30:51 Whatever angle. So do not add the angles and poses that you don't want after training. And
00:30:58 add the poses and the angles you want to generate after training. So if we summarize again: have the
00:31:07 emotions, have the poses, have the angles, have different backgrounds, have different clothing,
00:31:14 have highest possible quality lighting and focus. Do not have blurry backgrounds. Do not have
00:31:21 fuzzy backgrounds. They will impact your output quality. So in the AI world, whatever you give,
00:31:28 you get it. And with this medium quality dataset, I am able to generate amazing images.
00:31:33 If I increase the number of images, the variety in these images, I can get even better quality.
00:31:38 Okay, now you understand how to prepare your dataset. There are a few tricky issues with the
00:31:44 Z-Image Turbo model training. The best quality for this model is 1536 pixels. So make your images
00:31:53 1536 pixels if they are at a bigger resolution. Not 1328 and not 1024. 1536 pixels. Another
00:32:02 thing is that this model is extremely aspect ratio dependent. So if you want to generate your images
00:32:10 with a certain aspect ratio, then you should prepare your images with that aspect ratio. What do I
00:32:17 mean by that? For example, currently my images are all square like this. You see, like this. However,
00:32:24 if I want 16:9 aspect ratio to generate images after training, then I should set my aspect ratio
00:32:32 according to like this. So if I use this image, this aspect ratio for training instead of square
00:32:38 aspect ratio, this model will be able to generate images better in that aspect ratio. So therefore,
00:32:45 I recommend you to prepare your images with your desired aspect ratio after training. So
00:32:51 you see this generation was square; therefore, it is much more natural and accurate compared
00:32:56 to the 16:9 aspect ratio. I had to generate a lot of images to get accurate-looking 16:9 aspect
00:33:04 ratio images. So decide your aspect ratio, whichever the aspect ratio that you want to
00:33:09 generate your images after training, and based on that aspect ratio, prepare your training images.
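If you want a non-square aspect ratio at the same quality level, you can derive the dimensions yourself. This sketch assumes you keep roughly a 1536x1536 pixel budget and snap dimensions to multiples of 16 (a common latent-size constraint in diffusion models; the exact multiple for Z-Image is my assumption, so verify it against your trainer):

```python
import math

def bucket_dims(ar_w, ar_h, base=1536, multiple=16):
    """Pick a training resolution that keeps roughly base*base pixels while
    matching the desired aspect ratio, rounded to a multiple of `multiple`."""
    ar = ar_w / ar_h
    h = math.sqrt(base * base / ar)   # height that preserves the pixel budget
    w = h * ar

    def snap(x):
        return int(round(x / multiple) * multiple)

    return snap(w), snap(h)

# bucket_dims(1, 1)  -> (1536, 1536)   square, as used in this tutorial
# bucket_dims(16, 9) -> (2048, 1152)   a 16:9 bucket at the same pixel budget
```

The concrete 1744x992 pair mentioned earlier is a Qwen image preset from the video, not an output of this formula.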
00:33:16 Okay, once the training images are ready, let's go back to New Job, Show Advanced,
00:33:21 copy the configuration, Show Simple. Now, training name is the final file names that you are going to
00:33:29 get after training. Since I have done previously a training, I will use this name so it will generate
00:33:34 names like this so that I can show you how to find the best checkpoint. And you will see that our
00:33:40 configuration is already using the Version 2 adapter. This may get updated later because Ostris is
00:33:48 still working on the adapter. And what is this? Because this model is a turbo model, it is a distilled model. Until
00:33:56 the Alibaba team publishes the main model, we are using distilled turbo model. So to not break the
00:34:04 distillation of the model, we are using a trick. This adapter model is merged with the turbo model
00:34:10 during training automatically; we don't have to do it manually. This way it behaves like a main model. So therefore
00:34:16 our trained LoRA is not breaking the turbo model. This is not the highest quality. This is not the
00:34:21 quality that we are going to get from the base model, but it is working. And we don't change
00:34:25 this. It will download it into Hugging Face cache. You don't change anything here. What
00:34:30 you can change here is save every N steps. So this trainer doesn't work epoch-based;
00:34:37 it works step-based. Therefore, I recommend you calculate your number of
00:34:43 steps based on the number of training images. I have 28 images, so I recommend 200 epochs for
00:34:49 this number of images. That makes 5,600 steps. And if you want 10 checkpoints, make this 560. If you
00:34:56 want 8 checkpoints, make it 700 steps. So based on these steps, it will save those checkpoints.
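Since the trainer is step-based, the epoch-style recommendation above translates into steps with a quick calculation. A minimal sketch; the 28-image, 200-epoch, 10-checkpoint numbers are the ones used in this section:

```python
def plan_training(num_images, epochs=200, num_checkpoints=10):
    """Convert an epoch-style plan into the step-based settings the trainer
    expects: total steps, and the save-every-N-steps interval."""
    total_steps = num_images * epochs
    save_every = total_steps // num_checkpoints
    return total_steps, save_every

# plan_training(28) -> (5600, 560): 5,600 total steps, saving every 560 steps
```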
00:35:03 And there is max step saves to keep. If I make this 3, it will keep only 3, and it will delete
00:35:10 the previous ones. This is how it works. So I will keep all of them. And you don't change anything
00:35:15 here. They are all set. In the dataset, you select your training dataset. This is my dataset. And
00:35:21 you don't change anything else here in the configuration. And that's it. You can also have
00:35:27 samples during training. Currently I set it to 250,000 steps, so they are never generated.
00:35:33 But if you want them to be generated like every 100 steps or maybe 200 steps, you can generate
00:35:39 them and you can see them if they are good or not. But I am not doing that. It is up to you.
00:35:46 Moreover, there is skip first sample; otherwise it generates samples right when the
00:35:53 training begins. I'm not using those either, but you can disable this and it will generate samples. And
00:35:59 there is trigger word. I am only training with a trigger word right now. Since I don't provide any
00:36:05 caption in my dataset, it is trained entirely with this. But if you also provide captions
00:36:12 with your dataset, like a Test 1 caption, then it will append the trigger word. I think it
00:36:18 is appended at the beginning. So it will become like "ohwx test 1" during the training. However,
00:36:23 I don't recommend the captions. And you see we are losing our configuration. This is annoying,
00:36:30 I know that. The AI Toolkit really needs a save and load configuration feature. Okay,
00:36:36 I need to make the file name again like this. Okay, and this was 700 steps. This was 5,600.
00:36:46 Okay. So I have compared all this. I have tested all of them for you. I have done so many grid
00:36:52 tests to find everything. I did so many trainings. For example, let me show you some of the trainings
00:36:59 that I have made. You see all these different parameters I have tested. I have tested 1024,
00:37:06 1280, combination of the different resolutions like these 3. And the best yielding resolution
00:37:14 for this model is 1536. However, if you want to speed up your training,
00:37:19 if this becomes too slow for you, then you need to disable this and use like 1280 or
00:37:25 just use 1024. Or you can enable all 3. You get the most speed with 1024. But the
00:37:31 best quality is with 1536 for the Z-Image Turbo model. And let's select our training dataset. I
00:37:39 am also not using any caption dropout. So these are all my settings. And then click Create Job.
00:37:46 Now this training is queued, not automatically started. So you can either click from here
00:37:52 to start or you can go to Training Queue and click the play icon from here. And you see it
00:37:58 shows Queued, then you click start. And it will start the queue processing. Let's see what is
00:38:04 happening in our VRAM. Currently I'm recording video with my second GPU, so my initial GPU is
00:38:11 empty. I just need to close the SwarmUI; it will become zero. Yes. So you can see the speed. I
00:38:16 have tested the speed. It is the same on Windows and Linux. This is surprising, but maybe it is good,
00:38:23 I don't know. Because I am not sure whether it is fully utilizing the GPU or not. I don't see it
00:38:28 fully utilized on Windows. But it is fairly fast. And I can speed up the training significantly by
00:38:34 dropping the resolution. So when you click this icon, it will show you the training window like
00:38:39 this. You don't see the training on the started CMD window. You will see it from here. When you
00:38:44 refresh this page, let's go to Dashboard, you need to click this icon to see it. Don't forget
00:38:49 that. So it will show you some of the statistics, and it will show you what is happening. This is
00:38:54 actually what is happening on the training. This is the window that you are going to follow. It
00:38:59 will first cache then unload the text encoder. I did set everything for you. These parameters
00:39:05 are really optimal. Let's say you have 100 images as a training dataset. Then set it to 10,000 steps.
00:39:11 I don't know how many steps you can do until the model breaks, but I can say that up to 10,000
00:39:18 steps you are safe. Maybe even 20,000 steps. You need to test. And since we are saving checkpoints,
00:39:23 we will compare. The checkpoints will appear here. This is very useful because on RunPod and
00:39:29 MassedCompute, you can directly download them from here. They will appear here. They will be
00:39:35 saved inside the folder. So it is started. But to get checkpoints quickly, let me show you how they
00:39:43 will appear. So I will pause this, and you see it says that you can resume. But it can only resume from
00:39:49 the last checkpoint. Therefore, if there weren't any checkpoints, it will start from the beginning.
00:39:55 However, if you had checkpoints, it will resume from the last checkpoint. This is how it works.
00:40:00 So I will modify and I will save checkpoints every 5 steps. So let's make this every 5 steps. Update
00:40:08 Job. Then click Play. And it will start again. So I will start getting checkpoints here. Therefore
00:40:14 I can show you how they are appearing. When I start it again, it will not cache the images and
00:40:22 text captions again since they were already cached. So where are they stored? When you
00:40:27 go back into your AI Toolkit installation, in the output, your trainings will be saved here.
00:40:33 When I enter inside this training, you see they are named with the same name that
00:40:39 I have set here. So they will be saved with that name. It shows the log, PID,
00:40:46 and it will save the checkpoints here. We will see in a moment. You can also set the checkpoints from
00:40:52 the settings. You can set the training folder path. So you see, this is the path where it
00:40:57 will save the checkpoints. You can change this and it will save the checkpoints there. And you
00:41:03 see this is the datasets folder. So you can change both of these and save settings. Let's go back to
00:41:08 Dashboard. Let's see our training. So we see the steps. It will start from the first step. Okay,
00:41:14 it started. It is doing the steps. The speed is 6.67 seconds per iteration on the RTX 5090 at the very
00:41:22 highest quality. Okay, we got the first checkpoint. So it has appeared here.
00:41:27 When I click this download, it will download. So you can use this in RunPod or MassedCompute. And
00:41:32 when I go back to outputs, I will see the checkpoint here. So it is saved like this.
00:41:37 Now I need to move this into my LoRA folder and test it. And you see it also saved an
00:41:44 optimizer PT file. So now I can pause and continue. So let's get the next checkpoint. Yes,
00:41:51 it is saved here. Now I will pause it and I will resume it. Let's see how it continues. Okay,
00:41:57 click play again. Now it should resume from the last checkpoint which was 10 steps. Let's see what
00:42:03 happens. So the speed was 6.7 seconds. I have also shown the speeds in the experiment speeds folder.
00:42:09 You can see the speeds of different resolutions: 1024, 1280, combination mixed, 1536. These are all
00:42:18 speeds of the RTX 4090. Not 5090. This is 5090 speed. And this is RTX 3060 speed. So we have
00:42:26 the speeds. Okay, nice. So it is continuing from the last checkpoint which was 10 steps.
00:42:32 So this is how you can continue your training with AI Toolkit. These things are not specific
00:42:38 to the Z-Image Turbo model. They apply to any model. But for each model,
00:42:45 you need to research a new configuration. That is important. If you use the default values,
00:42:50 you probably won't get the best results. Okay, so this is the speed. I can increase the speed
00:42:54 by reducing the resolution. And what are the differences between the default versus our best
00:43:01 training? I have it. So let me show you. This is the default configuration, which Ostris showed
00:43:09 in his tutorial, and this is our config. This is default; this is our config. Default, our config.
00:43:15 Default, our config. Our configuration yields much better results. Default, which Ostris showed,
00:43:34 our config. You see there is a massive quality difference between default versus ours. And
00:43:38 these are not cherry-picked images; these are grid generations. Default, our config. Default,
00:43:44 our config. There is a huge difference between default and our config. Default, our config.
00:43:48 Default, our config. You see there are 2 persons because we are generating at high resolution. So
00:43:54 we just need to generate more images to get a good one. Default, our config. Another prompt. Default,
00:43:59 our config. There is a massive difference. Default versus our config. Default,
00:44:04 our config. And these are grid images. And you see it even learned my broken tooth. I have a
00:44:10 broken tooth here. Maybe you noticed that. It learned it slightly. And this is a turbo model,
00:44:16 not like a base model. So this is pretty good, pretty accurate. So this is how we train.
00:44:21 After training, how are you going to test them? It is the same as the other tests that
00:44:27 I have shown in other tutorials. So you will have checkpoints like this in the output folder
00:44:32 once the training is finished. Move them into your LoRA folder. So I have them in my LoRA folder.
00:44:39 Then start your SwarmUI after putting them in. Or if you are already running, it is fine. Go to LoRAs,
00:44:46 refresh. Then let's reset params to default. Let's go to presets. Select our preset again.
00:44:52 Direct apply. And go to Tools and select the Grid Generator. Let's say Test 1,
00:44:58 whatever name you want. From here select LoRA. Type your LoRA name. My LoRA names are like this,
00:45:04 so it adds all the LoRAs. The last one goes here. So it is the last checkpoint. So they are
00:45:10 also sorted like this. Then the prompt. You can use any prompt you want, like this.
00:45:15 To separate prompts use this character. So each prompt will be different. However, this is not a
00:45:21 proper prompt. So I am going to use the example prompt which I have provided in the zip file.
00:45:27 When you go here you will see Test Grid Prompts and Grid Format. Copy this. You can change this
00:45:32 according to your training. And generate grid. Now this will generate a grid for me based on these
00:45:39 LoRA checkpoints so I can see them. So let's go to here and see that in real time. Okay. So from
00:45:46 here. Okay, LoRA prompt. This is true. Sometimes you need to play with this to see. As the images
00:45:52 generated, they will appear. Do we have an error somewhere? Why did it fail? Okay, we forgot
00:45:58 to reset params to default. Therefore the generation failed. You see, because the ControlNet is open.
00:46:04 So always reset params to default. Don't forget that. Okay. Then let's select the preset one more
00:46:11 time. Direct apply. Let's go back to Tools and generate grid. Because otherwise you will get an
00:46:16 error as I just did because the ControlNet was enabled. Now you just need to wait for
00:46:22 processing to be finished. And we will be able to compare the grid, the quality. So this is first.
00:46:28 As you can see, on the 5090 it is pretty fast. Every image takes like 8 seconds. I don't need to wait
00:46:34 more. But you can see that it is very undertrained in early steps. And it will get better trained up
00:46:40 to the last steps. We will see it. Even in early steps there is some resemblance. I prefer this
00:46:46 over generating samples during the training. It's a choice. I find this better because I
00:46:52 don't lose time with the training process. And this is the most proper way of testing in my
00:46:58 opinion. Not using the samples generated during the training. Sometimes they may be inaccurate.
00:47:03 Okay, so the grid generation has been completed. Let's refresh. Now compare the checkpoints. 700
00:47:09 steps, 1400 steps, 2100 steps. So you see this way it goes. Decide which one is best for you. I can
00:47:18 say that the last one is the best. So you see this is the very best. If you can't decide based on this,
00:47:24 what you can do is you can make this Test 2 and generate another grid. So this way compare until
00:47:31 you decide which one is working best. Moreover, trained LoRAs work with the ControlNet Union as
00:47:38 well. The only thing is to set your ControlNet strength to 0.6, so it is 60 percent. And then
00:47:45 type your prompt. With just this simple prompt, "Photo of ohwx man wearing an amazing suit",
00:47:51 I am able to get amazing quality images with my trained LoRA by using this reference image as a
00:47:59 ControlNet. So it fully works the same way as using the base model with our trained LoRAs.
00:48:05 Now I will show you how to install and use Ostris AI Toolkit on RunPod. Then
00:48:10 on MassedCompute. To use on RunPod, always follow RunPod instructions txt file that I
00:48:17 have. Always. I have this file in all of my applications. For RunPod and MassedCompute,
00:48:23 always follow them. So let's open this. First of all, please register RunPod from this link.
00:48:28 I appreciate that. This enables me to do more research on RunPod. This helps me significantly
00:48:36 because these trainings use a huge amount of money. So you see I have spent 90 dollars in
00:48:42 a single day on Z-Image research. And then 10 dollars. Once you are here, go to Billing and add
00:48:50 some credits. Pay with your card, whatever. Then go to Pods. You can also use permanent storage,
00:48:57 which I use. I also have a dedicated tutorial for that. So you see we have a RunPod permanent
00:49:04 network storage tutorial. But I will show on a single pod this time. I recommend you limit
00:49:10 your region to the US, starting from the bottom. These are the best GPUs. For this Ostris AI Toolkit,
00:49:18 you can use the RTX 4090. This is the best price-performance option. If you want more speed, you can use
00:49:25 the RTX 5090. The bigger ones are useless because the model already fits into VRAM. So let's see where we have it. Okay,
00:49:31 we have it here. Moreover, from additional filters, select 100 GB RAM and NVMe disk. Okay,
00:49:38 we don't have it. So we need to check again. US NC2, NC1, MO2, MO1, MD1, KS. Okay, here. We have
00:49:50 it. Then change this template to whatever template the instructions file tells you. The instructions
00:49:57 say to use this one. Then you need to select this. If you get an error here, like when I select the 5090,
00:50:03 it will tell me that you need to use this. This is wrong. Why? Because I am installing
00:50:09 applications into a virtual environment. I am not using the template environment. That is why. So
00:50:15 don't believe whatever the RunPod is telling you. Use the template that I write in my instructions.
00:50:24 So we are going to use this official PyTorch 2.2.0 template. This is super fast. Then click this edit
00:50:32 template and add a port here: 8675. This is super important. Otherwise you won't be able to connect to
00:50:41 the Ostris AI Toolkit interface. And set your volume disk according to your needs.
00:50:48 If you are going to get too many checkpoints, if you are going to train bigger model,
00:50:51 you need bigger. But for the Z-Image Turbo model, 200 GB is sufficient. And deploy on demand. Since this
00:50:58 is using official template, it will be super fast to initialize. Sometimes it doesn't show here, so
00:51:04 refresh to see the status, whether it is initialized or not. It should initialize very fast. Okay,
00:51:11 details, telemetry, refresh. Okay, it is initialized. So it took like 20 or 30 seconds
00:51:18 because this template is very lightweight. Then click Jupyter Lab. Sometimes Jupyter Lab may also
00:51:23 not get loaded. You need to refresh. If it doesn't get loaded, delete the machine and get a new one.
00:51:28 First of all, verify that its GPU is working. pip install nvitop. Then type nvitop. And you need to
00:51:36 see your GPU like this. Otherwise just delete the pod and move to a new one. Then upload the zip file
00:51:43 into here like this. This is important. Wait for upload to be completed. It is uploading in
00:51:48 the bottom as you see. Then right click and extract archive. Then click refresh. Okay,
00:51:53 it is extracted. Open the RunPod instructions read txt file. Copy this command. Open a new terminal.
00:52:01 Paste it and hit enter. This is it. This will install everything fully automatically. Including
00:52:06 the Node.js or whatever libraries. You don't need to do anything else. Once the installation has
00:52:12 been completed, we are going to use this to start the application. If you restart your pod again, if
00:52:19 you want to use again, you just need to run this command. But since this is first time, we just
00:52:24 need this to start once the installation has been completed. The installation speed totally depends
00:52:29 on the pod that you got. If it is a fast pod, if you are lucky enough, it will be fast. Otherwise
00:52:35 it will take time. But since we have made some filtering, we have selected from a certain region,
00:52:41 I can say that this pod is fast. There are major advantages of my installers on RunPod
00:52:47 instead of using a RunPod template. It always installs the latest version of the AI Toolkit.
00:52:55 It supports all of the GPUs that are available on RunPod, not just certain types. So installation
00:53:02 of the latest version plus this GPU support makes it much more advantageous because you always use the
00:53:09 latest version of the application. Moreover the installation is fairly fast since it is
00:53:14 extremely optimized by myself. So the installation is getting completed. You can ignore these warning
00:53:21 messages as I have also explained in the Windows tutorial part. You need to watch Windows tutorial
00:53:27 part to learn it. Moreover if you don't know how to install and use SwarmUI and ComfyUI on RunPod I
00:53:34 have an excellent up to date tutorial. The link will be in the description of the video. This
00:53:39 tutorial. So watch it to learn how to use SwarmUI and ComfyUI on RunPod. I won't explain that part
00:53:43 in this video. I will only show Ostris AI Toolkit usage on RunPod in this tutorial. So watch this
00:53:50 tutorial to learn how to use SwarmUI and ComfyUI on RunPod. The link will be in the description
00:53:55 of the video. Okay installation almost completed. All right the installation has been completed. Now
00:54:01 we will run this starting command in the terminal. Copy-paste the starting command. The start should
00:54:07 be fairly fast. You see it also gives us a network link like this, but it is not working in RunPod. So
00:54:16 we need to connect from RunPod proxy. So go back to your My Pods click and click HTTP service 8675.
00:54:25 It will open the AI Toolkit. And we got the interface. The rest is exactly same. First
00:54:31 of all make your dataset. You can also upload your dataset into AI Toolkit Datasets folder
00:54:38 or you can click Datasets, name it my dataset like this, and create. Then click add images and you can drag
00:54:45 and drop the files as I have shown in the Windows tutorial part. It will upload them to the RunPod.
00:54:52 We will see the dataset here. Yes. You see the datasets. My dataset. Exactly the same. The upload
00:54:59 will take some time because the RunPod is slow. They will appear here once processed. Okay. Let's
00:55:07 see what's happening. If it doesn't work you can also drag and drop them to here like this. It will
00:55:12 upload from the Jupyter Lab interface. Then you can refresh to get it. Yes it is uploading. So
00:55:19 let's use this way. Either way should work. We can just refresh. So you see the dataset
00:55:25 images will appear here. Then click new job, show advanced. Again, same as in the Windows tutorial
00:55:32 part. Let's select our config, like this one, copy-paste, show simple, select your dataset.
00:55:38 I'm not going to repeat the Windows tutorial part. Create job and click training. So that we can see
00:55:44 the training on RunPod. It should be fairly fast. First, it will download the necessary model,
00:55:50 then it will start the training. Let's just wait and see the logs. So you see, it is downloading
00:55:56 model into our workspace, fairly fast. We can see the speed. Okay, so the training has been started.
00:56:03 You can see the step speeds here. It will also show the step speed here after a while. Currently it is
00:56:09 like 8 seconds per iteration. You need to wait a little bit. It is using the GPU 100 percent. The memory
00:56:14 usage is around 90 percent. So the RTX 4090 is very good on RunPod as price-performance. Therefore,
00:56:24 I recommend it. If you want faster, go with the RTX 5090. It's a little bit faster. Again,
00:56:30 as I have shown in the Windows tutorial part, you can look at the speeds. These four are for RTX
00:56:36 4090. This is RTX 5090, and this is RTX 3060. So the speed is decent. It will take like 13 hours.
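That duration estimate is just seconds-per-iteration times steps, and the cost follows from the hourly rate. A quick sketch using the ~8 s/it and $0.62/hour RTX 4090 figures mentioned in this section (your pod's actual rate and speed will vary):

```python
def time_and_cost(steps, sec_per_it, usd_per_hour):
    """Estimate wall-clock hours and dollar cost for a step-based training run."""
    hours = steps * sec_per_it / 3600
    return hours, hours * usd_per_hour

# 5,600 steps at ~8 s/it on a $0.62/hour RTX 4090:
# time_and_cost(5600, 8, 0.62) -> (~12.4 hours, ~$7.72)
```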
00:56:42 Maybe it will take less, but let's say 13 hours. The cost would be like $10, maybe less than $10:
00:56:50 $0.62 times 13 hours is like $8. If you want it faster, as I have shown in the tutorial, just change the
00:56:59 resolution. It will become four times faster. It will take four times less time. That is the strategy,
00:57:05 but it will lower the quality. I don't recommend it. As I have shown in Windows tutorial part,
00:57:10 it will lower the quality significantly. Then you will get the checkpoints here, so you can download
00:57:15 them from here, or from My Pods, go back to AI Toolkit outputs; they will appear inside here,
00:57:23 so you can download them from here too. That is it. Then you can terminate your pod,
00:57:28 stop your pod. These are the same. If you want to stop your pod after training has been finished,
00:57:35 we also have a command for it. So you see, this will stop your pod. How do you do it? You need to get
00:57:41 your pod ID and paste it here. And this value is the delay in seconds. Let's stop our pod in 20 seconds. So copy this,
00:57:49 open a new terminal, paste it. Now in 20 seconds, we will see that our pod has been stopped. Let's
00:57:57 see. Okay. Okay, it should be any second. I didn't count. We will see the command has been executed.
00:58:05 This way, you can sleep. Okay, it is stopped. So now when I refresh this page, I should see it is
00:58:11 stopped. Yes. So it won't spend my money. If you have any questions, you can ask me.
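The delayed-stop idea above can be sketched in a few lines of Python. This is my own illustration, not the script shipped with the installer; it assumes the `runpodctl` CLI is installed and authenticated inside the pod, and the `stop pod` subcommand shown is an assumption you should verify against the RunPod CLI documentation:

```python
import subprocess
import time

def stop_pod_after(pod_id: str, delay_s: int, run=subprocess.run):
    """Sleep for delay_s seconds, then stop the given RunPod pod.

    The runpodctl invocation is an assumption; check the exact subcommand
    against your runpodctl version before relying on it unattended.
    """
    time.sleep(delay_s)
    return run(["runpodctl", "stop", "pod", pod_id], check=True)
```

Run it in a spare terminal with your pod ID and the training's expected duration, and the pod stops billing on its own.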
00:58:17 Now, the MassedCompute part will begin. Okay, now I will show the MassedCompute part. For
00:58:24 MassedCompute part, we are going to follow MassedCompute instructions. This is same
00:58:29 for all of my applications. Always follow the instructions.txt file. Please use this link
00:58:35 to register MassedCompute. I appreciate that. Login your account after registration, go back
00:58:40 to billing and add some credits. Once you have the credits, go to deploy. For Z Image Turbo version,
00:58:49 my recommended GPU is the L40S. But you see, all of them are currently occupied, and they are hopefully
00:58:57 going to add new GPUs soon, they told me. So what can we use alternatively? We can use RTX 6000 ADA,
00:59:06 but they are also all full. Yes, there are no RTX 6000 ADA GPUs. Therefore, for the cheapest option,
00:59:13 which would take more time, we can use the RTX A6000 premium. This is the cheapest one. If you want
00:59:19 speed, you can use A100 or H100. So let's go with the cheapest option, RTX A6000. So let's select
00:59:27 the category creator, select the image SE Courses. So you see, currently this is $0.56 per hour.
00:59:34 We are going to apply our coupon, SE Courses, verify, and it's only 42 cents. Deploy. You see,
00:59:41 I have selected the premium version. This premium version is the best one. It has the most
00:59:46 RAM. Therefore, I recommend you pick this one if you are going to use the RTX A6000. However,
00:59:53 my recommended GPU, as I said, for Z Image Turbo version, L40S or RTX Pro 6000 if they
01:00:01 are available. If they are not available, RTX 6000 ADA, this GPU. If none of them is available,
01:00:08 you can use A100, H100, depending on your budget, or RTX A6000 and the premium version. Now we need
01:00:16 to wait for the initialization. When you click the running instance, when you refresh this page,
01:00:22 you will see it. Wait for initialization to be completed. While waiting for initialization,
01:00:28 click details, and you see there is the ThinLinc client. We are going to use it; if you haven't
01:00:34 installed it yet, download it according to your platform. I am on Windows, so let's download this.
01:00:40 Let's start it. Yes, next, accept, select the options, run ThinLinc client. Once the ThinLinc
01:00:47 client has started, click options, go to local devices. Just have clipboard synchronization
01:00:52 and drives enabled. Click drives details and add a folder from your computer like this one.
01:00:59 You see there is add and remove, or you can copy-paste the path from here. Make sure that
01:01:03 it has read and write permission. Click okay and click okay. Then you just need to wait for
01:01:08 initialization to be completed. Sometimes refresh the page to be sure. Okay, the machine has been
01:01:14 initialized. Before connecting to it, I recommend you put your training images, let's copy them, into
01:01:21 your shared folder. So my shared folder is here. Copy-paste them there. Moreover, also copy the
01:01:29 downloaded installation zip file into your shared folder. Then you are ready. Then click this IP,
01:01:36 it is copied. Copy-paste it here. You see there is username. Copy the username, copy-paste here,
01:01:42 and the user password. You cannot transfer big files with the ThinLinc client. You need to use something like Google
01:01:49 Drive, OneDrive, or Hugging Face. We have a Hugging Face upload and download notebook as well. So
01:01:55 this is only for small files, like your training images or installation zip files. Remember
01:02:01 this: big files will be very slow or will not work. Once you are in this screen, go to home, go
01:02:08 to thindrives, the MassedCompute shared folder, wait for synchronization to be completed. Sometimes it
01:02:15 can take time depending on your internet. Okay. Then select your installation zip file, drag and
01:02:22 drop it into downloads folder. Moreover, drag and drop your training images as well. This is
01:02:29 not mandatory. We will be able to upload from the interface as in the Windows tutorial part, but you
01:02:35 can have it. Then extract the installation zip in the Downloads folder. Do not run anything in the
01:02:42 ThinLinc shared drive folder. Always copy files into Downloads first. Enter inside the folder,
01:02:48 double-click MassedCompute instructions, copy this installation command, click these three dots icon,
01:02:55 open in terminal inside this extracted folder, right-click and paste. This will do the entire
01:03:02 installation of the AI toolkit on MassedCompute. Now just wait for installation. This will be
01:03:07 really fast compared to RunPod. MassedCompute is super fast. Then, while it is installing,
01:03:14 copying the training files during the installation will save you time, so that's
01:03:20 an advantage. But you see, this ThinLinc client is very slow for transferring files.
01:03:28 It is really, really slow. Therefore, for big files, you need to use big cloud services like Google Drive,
01:03:34 OneDrive, or Hugging Face. Okay, the training images have been copied.
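For reference, the dataset placement described in the next steps (a datasets folder inside the AI Toolkit install directory, with one subfolder per dataset) can be sketched in Python. The folder names here are examples, not the video's exact paths:

```python
from pathlib import Path

# Sketch of the layout the video describes: AI Toolkit reads datasets
# from a "datasets" folder inside its install directory.
# "ai-toolkit" and "my_dataset" are example names -- adjust to your setup.
install_dir = Path.home() / "Downloads" / "ai-toolkit"
dataset_dir = install_dir / "datasets" / "my_dataset"
dataset_dir.mkdir(parents=True, exist_ok=True)

# Then copy your training images into dataset_dir, e.g.:
# for img in Path("/path/to/images").glob("*.jpg"):
#     shutil.copy2(img, dataset_dir)
print(dataset_dir.is_dir())
```

Doing this on disk before starting the app (as shown in the video) means the dataset appears in the interface immediately, with no slow browser upload.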
01:03:40 So while installing the AI toolkit, I will copy this, enter inside the AI toolkit, enter inside AI
01:03:47 toolkit, and here, make a new folder named datasets because it is not automatically generated.
01:03:55 Copy-paste it here. So when we start the application, our data set will be ready. Still,
01:04:00 as in the Windows tutorial part, you can use the interface to upload: from Datasets, click
01:04:06 New Dataset, type your dataset name like test, and you will be able to upload from this interface
01:04:12 as well. Okay, installation is continuing. When you get this window, just click cancel. Moreover,
01:04:19 when you start Google Chrome, it may ask you to log in or something. Okay, it didn't
01:04:25 ask. If you get that prompt, just click cancel. So you don't need to update the software installed on
01:04:32 MassedCompute. Just click cancel to all of them. Moreover, I won't show you how to use SwarmUI on
01:04:38 MassedCompute because we have a fully up-to-date tutorial for MassedCompute. You see this one,
01:04:44 ComfyUI and SwarmUI on cloud GPUs tutorial. The link will be in the description of the video.
01:04:48 So watch this to learn that part. I will just show how to use AI toolkit on MassedCompute,
01:04:55 not how to use SwarmUI and ComfyUI or do the grid generation or other stuff as I
01:05:01 have shown in the Windows tutorial part. The biggest advantage of my installer is that it
01:05:06 always installs the latest version of the AI toolkit trainer. Moreover, it supports all of
01:05:12 the GPUs with the latest pre-compiled wheels: Flash Attention, xFormers, Sage Attention, and
01:05:18 matching Torch and CUDA versions. Therefore, my installers are really better than using the
01:05:24 templates. So during the Node.js installation, it is all automatic. You may get some warnings, just
01:05:30 ignore them because it will work. My installer is fully optimized and made so easy that you
01:05:37 just run these two command lines. It handles everything, all the setup for you. Okay, so the
01:05:43 installation has been completed. You can scroll up to see if there are any errors or not. Then
01:05:49 return back to your folder, open the MassedCompute instructions.txt file again, and copy this part.
01:05:57 This is for starting. Then open three dots here, open in terminal, copy-paste it. We always run the
01:06:03 commands inside the installed folder. This is super important. So it has been started. You can either
01:06:10 use the local link like this, or if you want to connect from your computer, which I recommend,
01:06:15 open link like this. So you see this is public link, and now I can connect from my computer. So
01:06:22 let's see. It says, yes, it says it is not secure. Continue site. This is totally fine. And now,
01:06:28 yes. So you see it is running in MassedCompute, but I am connected from my computer. The data
01:06:34 set will be here since I have copy-pasted it, or I can create a new dataset. I can type GG,
01:06:40 create, then I can add images. I can drag and drop images from here to upload. However, copy-pasting
01:06:47 from the disk is better than here in my opinion. Okay, let's refresh. We don't need it. Then click
01:06:54 new job as in the Windows tutorial part, show advanced, select the configuration from the zip
01:06:59 file inside Z Image Turbo LoRA configs. So since this GPU is 24 GB, copy-paste it, show simple,
01:07:06 give a name to your training, whatever you want, and then in the data set, select it.
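For reference, these presets are AI Toolkit job configs, and the fields you are told to adjust (save frequency and step count) appear in the advanced YAML view. A trimmed sketch, with illustrative values only, not the actual preset's numbers:

```yaml
# Illustrative fragment of an AI Toolkit job config -- values are examples.
save:
  save_every: 250            # "save every N steps" in the UI
  max_step_saves_to_keep: 4  # how many intermediate checkpoints to keep
train:
  steps: 6000                # total training step count
```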
01:07:12 As in the Windows tutorial part, you need to set your save every N steps and step count.
01:07:17 Watch the Windows tutorial part. Don't skip it. Then create job and then click play. So it will
01:07:22 first download the necessary models, then it will start training. Then the checkpoints will appear
01:07:28 here so that I will be able to download them from here; on my machine, in the AI toolkit installation,
01:07:37 they will be inside the output folder. So they will be inside here. To download from here,
01:07:42 I can use my notebook, my Jupyter Lab notebook, or you can use Google Drive, OneDrive, or you can
01:07:50 use the ThinLinc client. However, it would be very slow. So probably downloading these
01:07:56 checkpoints from here will be the fastest way. Let's wait until the training begins so
01:08:02 we can see the speed. Okay, so the training has started. It is like 18 seconds per iteration. So
01:08:10 it will take 30 hours for 6,000 steps on this GPU. It is only 42 cents per hour. Therefore, it would
01:08:17 cost around $12. However, it is up to you. You can rent a more powerful GPU or use RunPod and a 4090,
01:08:25 5090, or you can reduce the training resolution and speed up like four times, five times. It is
01:08:32 totally up to you what you want to do, but this is how you do it. And as the checkpoints are generated,
01:08:38 they will appear here so that you can download and use on your local computer right away. This
01:08:43 is it. I hope you have enjoyed it. Don't forget to delete your machine once you have saved
01:08:50 your generated checkpoints. Otherwise, if you only use stop, billing will not stop on MassedCompute.
01:08:56 And MassedCompute team told me that they will get a lot of new GPUs, hopefully very soon,
01:09:02 and maybe there will be permanent storage as well. We will see. Keep watching. Thank you so much.
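The training time and cost quoted earlier (18 seconds per iteration, 6,000 steps, $0.42 per hour on the RTX A6000) can be sanity-checked with quick arithmetic:

```python
# Sanity check of the time/cost estimate quoted in the video.
sec_per_iteration = 18   # observed training speed on the RTX A6000
steps = 6000             # total training steps
usd_per_hour = 0.42      # RTX A6000 price with the coupon applied

hours = steps * sec_per_iteration / 3600
cost = hours * usd_per_hour
print(f"{hours:.0f} hours, ~${cost:.2f}")  # -> 30 hours, ~$12.60
```

This also shows why reducing the training resolution or renting a faster GPU changes the bill roughly in proportion to seconds per iteration.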