LORA + Checkpoint Model Training GUIDE - Get the BEST RESULTS super easy

Olivio Sarikas
10 May 2023 · 34:38

TLDR: The video provides a comprehensive guide to training LoRA and checkpoint models for high-quality results in AI image generation. The host emphasizes understanding how training works, selecting appropriate images, and using high-quality images so the AI can interpret them well. They discuss image size, variety in expressions, fashion styles, and lighting conditions, and the role of keywords in caption text files, which let the AI learn variations in styles and features. The video explains the differences between LoRAs and full models, suggesting LoRAs for faces and full models for more complex subjects. It also offers tips on training with star (celebrity) portraits as a beginner, on choosing the number of images and epochs, and on tools such as Google Images and Kohya SS. The host shares a merging trick that improves a model by combining it with a better one and recommends higher-resolution images for better training outcomes. The video concludes with an invitation to join the host's Discord for further help.


  • 🌟 **Discord Community**: Engage with a community for support and advice on training models in a specific Discord channel.
  • 🧠 **Understanding the Process**: Grasp how the training process works to select appropriate images and understand how the model interprets them.
  • 📷 **Image Selection**: Use a variety of images that showcase different expressions, fashion styles, and lighting situations to train the model comprehensively.
  • 🔍 **Image Quality**: Opt for high-quality, sharp images without blurriness or pixelation for better AI interpretation and training results.
  • 📚 **Keyword Importance**: Use descriptive keywords to help the AI learn and differentiate between various features and styles within the training images.
  • 🤖 **Choosing Between LoRA and Model**: Decide whether to use a LoRA (smaller, versatile) or a full model (larger, more consistent) based on the training goals.
  • 🎭 **Training on Star Portraits**: For beginners, training on star (celebrity) portraits is advantageous because many images are available and using them for private research is legal in most countries.
  • 📈 **Image Quantity and Quality**: The number of images needed depends on the complexity of the subject; higher complexity requires more images for adequate training.
  • 🔢 **Training Parameters**: Adjust the number of steps per image and epochs based on the number of images available and the desired training outcome.
  • 🖼️ **Image Size**: Use a minimum image size of 512x512 pixels, with larger images providing more detail for training but potentially slowing down the process.
  • 🛠️ **Tools and Software**: Use Google Images for finding images, Bulk Resize for resizing them, and Kohya SS for model training.

Q & A

  • What is the purpose of the Discord channel mentioned in the transcript?

    -The Discord channel serves as a community space where people can get help and exchange ideas about LORA and model training. It is filled with helpful people, and the speaker is also often present to assist.

  • How does the learning process of an AI image work?

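    -In the video's simplified explanation, the input photo is gradually dissolved into noise, and the model learns to reverse that process, reconstructing an image from the noise that is as close as possible to the original. Technically, training adds random noise to the image and teaches the model to predict and remove it; at generation time, a seed number determines the starting noise.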

  • Why is it important to have images of different sizes and expressions when training an AI?

    -Having a variety of images helps the AI learn to recognize and reconstruct faces and objects in different contexts, such as various facial expressions, fashion styles, and lighting situations. This diversity improves the AI's ability to generate images that match a wide range of prompts.

  • What are the benefits of using LORAs for training?

    -LORAs are smaller, can be applied to various models, and are efficient for training faces. They can be used in multiple prompts and are easier to store due to their smaller size compared to full models.

  • How does the size of an object in an image affect its training outcome?

    -The size of an object, such as a face, determines how much of the image, and therefore of the noise, it occupies during training. A face that fills only a small part of the image gives the model few pixels to learn from, so when that face is later generated large in the frame, details are lost.

  • What is the recommended image quality for training AI models?

    -High-quality images that are sharp, well-defined, and free from blurriness or pixelation are recommended. While high resolution can be beneficial, the main point is that details like eyelashes should be clearly distinguishable in the noise.

  • How do keywords in text files affect the training process?

    -Keywords act as variables that allow the AI to learn the differences between various styles, lengths, and colors of features like hair. Proper use of keywords enables variability and allows the AI to react to changes in these features.

  • What is the difference between training a LORA and a full model?

    -A LORA is a smaller, more focused add-on to other models, suitable for specific features like faces. A full model, or checkpoint, is larger and more consistent, making it easier to handle and more forgiving during training. It is suitable for themes like architecture.

  • Why is it suggested to train on images of a star for beginners?

    -Training on images of a star is beneficial for beginners because there is a wide variety of images available, making it easier to spot problems and test different keywords and situations. It is also legal for private research purposes in most countries.

  • What is the significance of the number of images and steps per epoch in training?

    -The number of images and steps per epoch depends on the complexity of the subject. For complex subjects, more images and steps are needed. For simpler subjects like a face, fewer images and steps can suffice. It's about creating enough situations for the AI to learn from.

  • How does image size affect the training process?

    -A minimum image size of 512 by 512 is recommended, with larger images providing more quality and details for the AI to train with. However, higher resolution images can slow down the training process due to increased GPU power requirements.

  • What is the suggested approach for resizing images for training?

    -Using a tool like 'Bulk Resize' to resize images in bulk is suggested. The longest side can be set to a value that suits the GPU's capabilities, and high-quality JPEG format is recommended for maintaining image quality.
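The "longest side" rule used by bulk resizers can be sketched as a small function. This is a minimal illustration, not Bulk Resize's actual code: it assumes you want to keep the aspect ratio and never enlarge images that are already smaller than the target, since upscaling adds no real detail. The function name `fit_longest_side` is hypothetical.

```python
def fit_longest_side(width: int, height: int, longest: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `longest`, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or longest <= 0:
        raise ValueError("dimensions must be positive")
    if max(width, height) <= longest:
        return width, height  # never upscale: enlarging adds no real detail
    scale = longest / max(width, height)
    # round() keeps the result as close as possible to the original aspect ratio
    return round(width * scale), round(height * scale)


print(fit_longest_side(4000, 3000, 1024))  # (1024, 768)
print(fit_longest_side(1080, 1920, 768))   # (432, 768)
```

If you script the resize yourself instead of using a web tool, Pillow's `Image.thumbnail((1024, 1024))` applies the same rule in place: it preserves the aspect ratio and does not enlarge smaller images.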



🤖 Introduction to Training AI Models for High-Quality Results

The video begins with an introduction to training AI models, specifically LoRAs and full checkpoint models, to achieve impressive results. The speaker emphasizes the ease of obtaining good results with proper training and introduces a Discord channel for support and community interaction. The process of training is explained, where an input photo is transformed into noise and then reconstructed into a new image. The importance of selecting the right images for training is discussed, including images with different facial expressions, fashion styles, and lighting conditions. The video also touches on the challenges of training AI to recognize small objects like faces and the need for high-quality images for effective training.


🖼️ Image Selection and Quality for AI Training

The second paragraph delves into the specifics of image selection and quality for training AI models. It highlights the need for a variety of images that capture different emotions, fashion styles, and hairstyles. The importance of image quality is stressed, with a focus on sharpness and clarity to facilitate the AI's learning process. The paragraph also discusses the significance of using descriptive keywords in text files to enable variability and adaptability in the AI's training. The differences between LoRAs and models are explained, with LoRAs being smaller, versatile add-ons, and models being larger, more consistent, and suitable for complex themes like architecture.


📚 Training Details: Image Quantity, Steps, and Epochs

This paragraph addresses the number of images required for training, which depends on the complexity of the subject. Faces need fewer images because their structure is consistent, while more varied subjects such as architectural styles need a larger dataset. It also clarifies steps and epochs: steps are repetitions per image within one epoch, and each epoch is one full pass over the dataset that produces a new generation of the model. Finally, it gives guidance on choosing the number of steps per image based on the size of the dataset and the desired training depth.
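The idea that a small dataset needs more repetitions per image can be sketched as follows. This is an illustration under an assumption: you pick a target number of image views per epoch (`target_steps_per_epoch`, a hypothetical budget, not a value given in the video) and derive the per-image repeat count from it.

```python
import math


def repeats_per_image(num_images: int, target_steps_per_epoch: int) -> int:
    """Pick a per-image repeat count so one epoch covers roughly the target number of image views.

    `target_steps_per_epoch` is a hypothetical budget chosen by you, not a recommended value.
    """
    if num_images <= 0:
        raise ValueError("need at least one image")
    # Round up so small datasets still reach the budget; always repeat at least once.
    return max(1, math.ceil(target_steps_per_epoch / num_images))


print(repeats_per_image(20, 400))   # 20 (e.g. a small face dataset)
print(repeats_per_image(150, 400))  # 3 (e.g. a larger, more varied style dataset)
```

The design choice here is to hold the amount of training per epoch roughly constant, so that a 20-image face set and a 150-image style set are trained for a comparable number of views before each epoch ends.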


📏 Image Size and Training Process Recommendations

The focus of this paragraph is on the optimal image size for training AI models, recommending a minimum size of 512x512 pixels. It advises against cropping images to a square ratio to avoid losing important training data. The paragraph also discusses the software's automatic creation of training buckets for different resolutions and ratios. The video suggests using high-resolution images for better training results, especially when upscaling, but cautions that higher resolutions can slow down the training process due to increased GPU power requirements.
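Automatic bucketing, which lets uncropped images of different aspect ratios be trained together, can be approximated with the sketch below. It is a simplified assumption of how such buckets are chosen, not Kohya SS's exact algorithm: it keeps the bucket area at or below `base * base`, uses sides that are multiples of 64, and roughly preserves the image's aspect ratio. Real implementations also clamp bucket sides to a minimum and maximum and assign each image to the closest predefined bucket.

```python
import math


def bucket_for(width: int, height: int, base: int = 512, step: int = 64) -> tuple[int, int]:
    """Simplified aspect-ratio bucketing (assumed behaviour, not Kohya SS's exact code).

    Returns the (w, h) bucket: multiples of `step`, area <= base * base,
    aspect ratio approximately that of the input image.
    """
    max_area = base * base
    aspect = width / height
    # Width that would exactly fill the maximum area at this aspect ratio.
    ideal_w = math.sqrt(max_area * aspect)
    bucket_w = max(step, int(ideal_w // step) * step)
    # Largest multiple of `step` for the height that keeps the area within budget.
    bucket_h = max(step, int((max_area / bucket_w) // step) * step)
    return bucket_w, bucket_h


print(bucket_for(512, 512))    # (512, 512)
print(bucket_for(1080, 1920))  # (384, 640) portrait photo
print(bucket_for(4000, 3000))  # (576, 448) landscape photo
```

Each image is then scaled to cover its bucket and trimmed slightly, which loses far less of the picture than forcing everything into a 512x512 square crop.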


🛠️ Tools and Techniques for Image Preparation and Training Setup

The speaker introduces tools for image preparation: Google Images for sourcing pictures and a tool called 'Bulk Resize' for resizing them in bulk. The paragraph outlines a recommended folder structure for organizing training material and gives detailed instructions for installing and setting up the Kohya SS software for model training, including the required components (Python, Git, and Visual Studio) and enabling GPU acceleration for faster training.
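A folder structure of the kind used with Kohya SS can be created with a short script. This sketch assumes the commonly used layout in which the image subfolder's name starts with the repeat count followed by the trigger keyword and class word (for example `20_ohwx woman`); the exact folder names shown in the video may differ, and `ohwx` and `woman` are hypothetical placeholders.

```python
from pathlib import Path


def make_training_layout(root: Path, repeats: int, trigger: str, class_word: str) -> Path:
    """Create an assumed Kohya SS-style project layout and return the image folder.

    root/
      img/<repeats>_<trigger> <class_word>/   training images plus .txt captions
      model/                                   trained LoRA or checkpoint output
      log/                                     training logs
    """
    image_dir = root / "img" / f"{repeats}_{trigger} {class_word}"
    for folder in (image_dir, root / "model", root / "log"):
        folder.mkdir(parents=True, exist_ok=True)
    return image_dir


# Example: make_training_layout(Path("my_lora"), 20, "ohwx", "woman")
# creates my_lora/img/20_ohwx woman, my_lora/model and my_lora/log.
```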


📝 Captioning Images and Refining Keywords for Training

This paragraph emphasizes the importance of accurately captioning images and refining keywords to guide the AI training process. It introduces a tool called 'BooruDatasetTagManager' for managing and editing keywords, and the video demonstrates how to use it to review and adjust the keywords for each image so they match the desired training outcome. The paragraph also covers selecting a base model for training, recommending the Stable Diffusion 1.5 model as a suitable training source.
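The kind of bulk keyword edit done in a tag manager can also be scripted. This sketch assumes captions are stored as `.txt` files next to the images, containing comma-separated tags as produced by common auto-taggers; the trigger word `ohwx` and the tags being removed are hypothetical examples.

```python
from pathlib import Path


def edit_caption(caption: str, trigger: str, drop: set[str]) -> str:
    """Return the caption with `trigger` as the first tag and unwanted tags removed."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if t not in drop and t != trigger]
    return ", ".join([trigger] + kept)


def edit_caption_folder(folder: Path, trigger: str, drop: set[str]) -> None:
    """Rewrite every .txt caption file in `folder` in place."""
    for txt in sorted(folder.glob("*.txt")):
        new_caption = edit_caption(txt.read_text(encoding="utf-8"), trigger, drop)
        txt.write_text(new_caption, encoding="utf-8")


print(edit_caption("1girl, red hair, smile, watermark", "ohwx", {"watermark"}))
# ohwx, 1girl, red hair, smile
```

Removing a tag is a deliberate choice: a trait that is still tagged can be varied in prompts later, while a trait that is left untagged tends to be learned as part of the trigger word itself.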


🔧 Setting Training Parameters and Merging Models for Enhanced Results

The final paragraph covers setting the training parameters in Kohya SS, including batch size, number of epochs, and image resolution. It explains how to handle problems such as running out of VRAM, for example by reducing the batch size or the image resolution. The speaker shares a 'merge trick' that improves the trained model by combining it with a more advanced model using the Checkpoint Merger tool. The video concludes with a call to join the speaker's Discord for further assistance and an invitation to like the video.




💡LoRA

LoRA (Low-Rank Adaptation) is a machine learning technique for adapting a pre-trained model to a new task by training a small set of additional low-rank weight matrices while the original weights stay frozen. In the context of the video, a LoRA is trained to make image models generate specific characteristics, such as faces or styles, and is applied as an add-on to other models. Its small size and flexibility make it particularly useful for training faces.
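The low-rank idea can be sketched in plain Python. This follows the general LoRA formulation, in which a frozen weight matrix W is adjusted by a scaled product of two small trained matrices, W' = W + (alpha / r) · B · A. The matrices and the 768x768 layer size below are illustrative assumptions, not values taken from a real Stable Diffusion file.

```python
def matmul(x, y):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the effective weight of a LoRA-adapted layer.

    w: d_out x d_in frozen base weight
    b: d_out x r and a: r x d_in, the small trained LoRA pair
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]


def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning of the layer vs the LoRA pair."""
    return d_out * d_in, r * (d_in + d_out)


print(lora_merge([[1, 0], [0, 1]], a=[[3, 4]], b=[[1], [2]], alpha=1))  # [[4.0, 4.0], [6.0, 9.0]]
print(param_counts(768, 768, 8))  # (589824, 12288)
```

The parameter count shows why LoRA files are small: for one 768x768 layer at rank 8, the LoRA pair holds about 2% of the weights a full fine-tune would change. In Kohya SS the rank and alpha appear as the network rank (dimension) and network alpha settings.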

💡Checkpoint Model

A Checkpoint Model refers to a version of a neural network that has been saved at a certain point during the training process. It allows for the resumption of training or inference at that point. In the video, the creator discusses training a Checkpoint Model for generating images, emphasizing that it is a larger and more consistent file, suitable for themes like architecture.

💡Training Process

The training process involves feeding a machine learning model with data so it can learn to perform a specific task. In the video, the training process is described as transforming an input image into noise and then teaching the model to reconstruct the image from that noise. This process is crucial for the model to learn and generate images that closely match the input.
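The "image into noise, noise back into image" description corresponds to the standard diffusion training objective, sketched below on a short list of numbers. This is a simplified illustration under stated assumptions: in Stable Diffusion the noise is added to the VAE latent rather than to raw pixels, and the prediction comes from the UNet, conditioned on the timestep and the caption; here the prediction is simply passed in.

```python
import math
import random


def noisy_sample(x0, alpha_bar, seed):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    alpha_bar near 1 means little noise, near 0 means almost pure noise.
    The seed fixes the Gaussian noise eps, so the same seed gives the same noise.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(predicted_eps, true_eps):
    """Mean squared error between the model's predicted noise and the noise actually added."""
    return sum((p - t) ** 2 for p, t in zip(predicted_eps, true_eps)) / len(true_eps)


x0 = [0.5, -0.2, 0.9]  # stand-in for an (encoded) training image
xt, eps = noisy_sample(x0, alpha_bar=0.3, seed=42)
# A model that predicted eps exactly from xt would reach a loss of 0.
print(noise_prediction_loss(eps, eps))  # 0.0
```

Training repeatedly lowers this loss across many images and noise levels, which is what later lets the model turn fresh noise into an image resembling the training data.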


💡Noise

In the context of the video, noise is the random pattern that the input image is dissolved into during training and that the model learns to turn back into a coherent image resembling the original input. At generation time, a seed number determines which noise pattern the process starts from, so the same seed with the same settings reproduces the same starting noise.

💡Image Quality

Image quality is a critical factor in training AI models. Sharp, well-defined images are preferred because they let the AI discern details more accurately. The video emphasizes using high-resolution, non-blurry images without visible compression artifacts (high-quality JPEGs are acceptable) for better training results.


💡Keywords

Keywords are descriptive terms used in the training process to help the AI understand and associate specific features with the images. They act as variables that the AI uses to learn the different aspects of the images, such as hair style, color, or facial expression. Proper use of keywords is essential for the AI to generate images that match the desired characteristics.
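One way to check whether keywords can really act as variables is to count how often each tag appears across the caption files. This sketch assumes comma-separated caption text; the example captions and the trigger word `ohwx` are hypothetical.

```python
from collections import Counter


def tag_frequencies(captions):
    """Count in how many captions each comma-separated tag appears."""
    counts = Counter()
    for caption in captions:
        tags = {t.strip() for t in caption.split(",") if t.strip()}
        counts.update(tags)
    return counts


captions = [
    "ohwx, red hair, smile",
    "ohwx, blonde hair, smile",
    "ohwx, red hair, frown",
]
for tag, n in tag_frequencies(captions).most_common():
    print(tag, n)
```

A keyword like "red hair" that appears in some captions and is replaced by "blonde hair" in others gives the model a contrast to learn from. A tag that appears in every caption (apart from the trigger word) never varies, so it cannot work as a variable.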

💡Discord Channel

A Discord Channel is a communication platform within the Discord application where users can interact in real-time through text, voice, and video. In the video, the creator mentions a specific Discord channel dedicated to LORA and model training, which serves as a community resource for sharing knowledge and getting help.

💡Training Steps and Epochs

Training steps per image, also called repeats, are the number of times the model processes each image during a single epoch. An epoch is one complete pass through the entire dataset with those repeats. The video explains that using more epochs with fewer steps per image can lead to better results, as it allows for more iterations and improvements on the model.
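How repeats, epochs, and batch size combine into the total length of a run can be worked out with simple arithmetic. This sketch ignores bucketing, which can add a few partial batches per epoch. It assumes a model file is saved every `save_every_n_epochs` epochs; the final model is normally saved at the end as well.

```python
import math


def training_schedule(num_images, repeats, epochs, batch_size, save_every_n_epochs=1):
    """Return (steps per epoch, total optimizer steps, epochs at which a model is saved).

    One epoch shows every image `repeats` times; each step processes `batch_size` images.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    total_steps = steps_per_epoch * epochs
    saved_epochs = list(range(save_every_n_epochs, epochs + 1, save_every_n_epochs))
    return steps_per_epoch, total_steps, saved_epochs


print(training_schedule(20, 20, 5, 2))     # (200, 1000, [1, 2, 3, 4, 5])
print(training_schedule(15, 10, 4, 4, 2))  # (38, 152, [2, 4])
```

Splitting the same amount of training into more epochs with fewer repeats gives more saved versions to compare, which makes it easier to pick the epoch before the model starts to overfit.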

💡Merging Trick

The merging trick is a technique mentioned in the video where a trained model is combined with another, more refined model to improve its performance. This is particularly useful when the initial model is not fully trained or does not produce high-quality results on its own. By merging, the weaknesses of the initial model can be compensated by the strengths of the secondary model.
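A common way to merge two checkpoints is a weighted sum of matching weights; the AUTOMATIC1111 web UI's Checkpoint Merger describes its "Weighted sum" mode as A · (1 − M) + B · M. The sketch below applies that formula to toy models stored as dictionaries of float lists (real checkpoints hold tensors). Which model goes in slot A and which in B, and the value of M, are choices you make; here A is assumed to be your trained model and B the better model. Keeping A's weights where B has no matching layer is a design choice of this sketch.

```python
def weighted_sum_merge(model_a, model_b, multiplier):
    """Merge two models key by key: (1 - multiplier) * A + multiplier * B."""
    if not 0.0 <= multiplier <= 1.0:
        raise ValueError("multiplier must be between 0 and 1")
    merged = {}
    for key, weights_a in model_a.items():
        weights_b = model_b.get(key)
        if weights_b is None or len(weights_b) != len(weights_a):
            merged[key] = list(weights_a)  # no matching layer in B: keep A unchanged
        else:
            merged[key] = [(1.0 - multiplier) * wa + multiplier * wb
                           for wa, wb in zip(weights_a, weights_b)]
    return merged


trained = {"layer1": [0.0, 2.0], "extra": [5.0]}
better = {"layer1": [1.0, 4.0]}
print(weighted_sum_merge(trained, better, 0.5))  # {'layer1': [0.5, 3.0], 'extra': [5.0]}
```

A multiplier of 0 returns the trained model unchanged and 1 replaces it with the better model, so intermediate values trade the learned subject against the other model's general quality.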

💡Image Size

The size of the images used for training can affect the quality and speed of the training process. The video suggests a minimum size of 512x512 pixels and notes that larger images provide more detail for the AI to learn from. However, higher resolution images can slow down the training process due to increased computational demands.
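The effect of image size can be made concrete with two small calculations. Stable Diffusion 1.5's VAE shrinks each image side by a factor of 8 before training, so the latent grid shows how much room a subject really gets, and pixel count relative to 512x512 gives a rough proxy for compute per step. Treat the second number as an approximation: self-attention layers grow faster than linearly with latent size, so the real slowdown can be larger.

```python
def latent_grid(width, height, downsample=8):
    """Latent size for an SD 1.5-style VAE, which downsamples each side by 8."""
    return width // downsample, height // downsample


def relative_cost(width, height, base=512):
    """Pixel count relative to a base x base image, a rough proxy for compute per step."""
    return (width * height) / (base * base)


print(latent_grid(512, 512))    # (64, 64)
# A face 64 px wide inside a 512x512 photo spans only 8 latent cells across:
print(latent_grid(64, 64))      # (8, 8)
print(relative_cost(768, 768))  # 2.25
print(relative_cost(1024, 1024))  # 4.0
```

Doubling the side length quadruples the pixel count, which is why higher-resolution training improves detail but needs noticeably more GPU time and VRAM.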

💡GPU Power

GPU (Graphics Processing Unit) power refers to the computational ability of a GPU, which is particularly important for tasks like training AI models that require significant graphical and mathematical processing. The video touches on how GPU power can influence the training process, especially when dealing with high-resolution images.


The guide provides an easy method to achieve amazing results with LORA and Checkpoint Model Training.

Joining a specific Discord channel can offer helpful resources and community support for model training.

Understanding the training process is crucial for selecting the right images and enabling the model to comprehend them.

The importance of image size, especially for faces in the images, is emphasized for effective training.

Training on different emotions, expressions, and fashion styles helps the AI learn the variability in human features.

High-quality, non-blurry images are recommended for better definition and training results.

Keywords in text files act as variables, allowing the AI to learn differences in styles and make adjustments.

LORAs are smaller, versatile add-ons that can be applied to various models and are ideal for faces.

Models are larger files that are more consistent and forgiving, suitable for themes like architecture.

Training on star (celebrity) portraits is suggested for beginners because many images are available and private research use is legal in most countries.

The number of images needed depends on the complexity of the subject; faces require fewer images compared to styles with more variation.

Steps (repeats per image) and epochs (full passes over the dataset) are the training parameters that control how often each image is seen and how many successive generations of the model are trained.

A merge trick is introduced to improve model quality by combining it with a better model, even if not fully trained.

Image size should be a minimum of 512x512, with uncropped images preferred for more natural training data.

Using high-resolution images for training improves the quality of upscaled images.

The guide suggests tools for finding and resizing images, as well as organizing them for training.

Kohya SS is recommended as easy-to-use software with a large community for model training.

Captioning of image files is important for creating keyword text files that the AI uses for training.

A tool like 'BooruDatasetTagManager' can streamline reviewing and editing keywords.

Experimenting with keywords is crucial for refining the model and achieving desired results.

The training process involves setting up the model, defining folders, and adjusting training parameters.

If initial training results are not satisfactory, the model can be merged with a better-performing one to improve outcomes.

The final step is to train the model and wait for the process to complete before using the trained model.