Run Stable Diffusion 3 Locally! | ComfyUI Tutorial

Markury AI
12 Jun 2024 · 03:48

TLDR: In this tutorial, the host demonstrates how to run Stable Diffusion 3 Medium locally using ComfyUI. The process involves accessing the Hugging Face repository, downloading the necessary files (the SD3 Medium safetensors checkpoint, the text encoders, and the example workflows), and updating ComfyUI. After the checkpoint and text encoders are placed in the correct folders, the user can generate images from natural language prompts, which showcases the model's capabilities. The video also addresses licensing issues and encourages the community to send feedback to Stability AI.

Takeaways

  • Stable Diffusion 3 Medium is a new model available for download from Hugging Face.
  • To access the model, fill out the form on the repository page and agree to the terms.
  • Download the necessary files: 'sd3_medium.safetensors', the text encoders (CLIP-G, CLIP-L, and T5-XXL in FP16), and the ComfyUI example workflows.
  • Update ComfyUI by running the 'update_comfyui.bat' script in the 'update' folder.
  • Close any running instances of ComfyUI before running the update.
  • Organize the downloaded models into the appropriate folders within the ComfyUI directory structure.
  • Place the text encoder files in the 'clip' folder and 'sd3_medium.safetensors' in the 'checkpoints' folder.
  • Start ComfyUI with the 'run_nvidia_gpu.bat' script to use the new model.
  • Load the 'sd3_medium.safetensors' checkpoint and the CLIP files to begin image generation.
  • The model generates images from natural language prompts and responds well to descriptive phrases.
  • The community has concerns about the license; the host suggests opening an issue with Stability AI to ask for clarification or an update.
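
The advice to close ComfyUI before updating can be checked programmatically. The sketch below is one hedged way to do it: it assumes ComfyUI is listening on its default local address, 127.0.0.1:8188, and the helper name `is_port_in_use` is made up for this illustration.

```python
import socket


def is_port_in_use(host: str = "127.0.0.1", port: int = 8188, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    The default address is an assumption (ComfyUI's usual local address);
    adjust it if the server was started with a different port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If `is_port_in_use()` returns `True`, a ComfyUI instance is probably still running and should be closed before the update script is launched.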

Q & A

  • What is the main topic of the tutorial video?

    -The main topic of the tutorial video is how to run Stable Diffusion 3 Medium locally using ComfyUI.

  • Where should one go to access the Stable Diffusion 3 Medium model?

    -To access the Stable Diffusion 3 Medium model, one should go to Hugging Face and fill out the form to gain access to the repository.

  • What files need to be downloaded from Hugging Face for this tutorial?

    -The files to download are the SD3 Medium safetensors file, the text encoders (CLIP-G, CLIP-L, and the FP16 version of T5-XXL), and the ComfyUI example workflows.

  • What is the purpose of the ComfyUI update mentioned in the script?

    -The purpose of the ComfyUI update is to ensure compatibility with the new Stable Diffusion 3 Medium model.

  • How does one update ComfyUI according to the tutorial?

    -To update ComfyUI, one should navigate to the ComfyUI directory, go to the 'update' folder, and run the 'update_comfyui.bat' file.

  • What is the recommended first step after updating ComfyUI?

    -The first step after updating ComfyUI is to return to the ComfyUI Windows portable folder, open the models folder, and install the CLIP models.

  • Why is it suggested to create an 'sd3' folder in the checkpoints directory?

    -Creating an 'sd3' folder in the checkpoints directory helps to organize different models, as there may be more models released in the future.

  • What is the recommended workflow to use with the Stable Diffusion 3 Medium model?

    -The recommended workflow to use with the Stable Diffusion 3 Medium model is the 'basic inference workflow'.

  • How does the Stable Diffusion 3 Medium model interpret the prompt for image generation?

    -The Stable Diffusion 3 Medium model interprets prompts written in natural language, which works better with this model than booru-style tag lists.

  • What is the issue mentioned regarding the licensing of the Stable Diffusion 3 Medium model?

    -The issue mentioned is that the licensing for the Stable Diffusion 3 Medium model is unclear or 'messed up,' and the community is encouraged to open an issue to request an update from Stability AI.

  • What is the final step to start using the Stable Diffusion 3 Medium model with ComfyUI?

    -The final step is to run 'run_nvidia_gpu.bat' from the base directory and then load the checkpoint and CLIP files in ComfyUI to start image generation.
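
Before launching, it can help to confirm that every file is where the steps above put it. The following is a minimal sketch, not part of the video: the file names are assumptions based on the Hugging Face repository listing, the 'sd3' subfolder follows the suggestion in the tutorial, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Paths relative to the ComfyUI folder that contains "models".
# File names are assumptions; rename them to match what you downloaded.
REQUIRED_FILES = [
    "models/checkpoints/sd3/sd3_medium.safetensors",
    "models/clip/clip_g.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
]


def missing_files(comfy_root) -> list[str]:
    """Return the relative paths from REQUIRED_FILES that do not exist yet."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

An empty list means all four files are in place; anything else names the files that still need to be copied.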

Outlines

00:00

Introduction to Stable Diffusion 3 Medium

The video begins with an introduction to the Stable Diffusion 3 Medium model, emphasizing its recent release and the excitement around it. The host shows viewers how to access and download the necessary files from Hugging Face: 'sd3_medium.safetensors', the text encoders (CLIP-G, CLIP-L, and the FP16 version of T5-XXL), and the ComfyUI example workflows. The process involves filling out a form for gated access, agreeing to the terms, and navigating the repository to download the required files.
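
For readers who prefer scripting the download over clicking through the repository, the sketch below builds the direct download links. It is an illustration under stated assumptions: the repository id and the file paths inside it are taken from memory of the Hugging Face listing and are not shown in the video, and `resolve_url` is a hypothetical helper that follows Hugging Face's `resolve/<revision>/<path>` URL pattern. Because the repository is gated, the links only work for an account that has accepted the terms.

```python
from urllib.parse import quote

# Assumed repository id and file paths; verify them on the repository page.
REPO_ID = "stabilityai/stable-diffusion-3-medium"
FILES = [
    "sd3_medium.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/t5xxl_fp16.safetensors",
]


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for one file in a Hugging Face repository."""
    # quote() keeps "/" intact, so nested paths stay readable.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_urls() -> list[str]:
    """Return one URL per file listed in FILES."""
    return [resolve_url(REPO_ID, name) for name in FILES]
```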

Updating ComfyUI and Installing Models

The video then covers updating ComfyUI, which requires closing the application if it is already running. Viewers are directed to the 'update' folder in the ComfyUI directory to run the 'update_comfyui.bat' file and pull the latest changes. After updating, the host installs the new text encoder files into the 'clip' folder and places 'sd3_medium.safetensors' in the 'checkpoints' folder, suggesting an 'sd3' subfolder be created if it doesn't exist. This step prepares ComfyUI for the new model and workflows.
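
The file placement described above can be sketched as a small script. This is a hedged illustration, not the host's method: the file names are assumptions, `install_models` is a hypothetical helper, and `comfy_root` is assumed to be the ComfyUI folder that contains `models`.

```python
import shutil
from pathlib import Path


def install_models(download_dir, comfy_root) -> list[Path]:
    """Move downloaded SD3 files into the ComfyUI model folders.

    Returns the list of destination paths for the files that were moved.
    """
    downloads = Path(download_dir)
    models = Path(comfy_root) / "models"
    # File name -> destination folder (names are assumptions).
    plan = {
        "sd3_medium.safetensors": models / "checkpoints" / "sd3",
        "clip_g.safetensors": models / "clip",
        "clip_l.safetensors": models / "clip",
        "t5xxl_fp16.safetensors": models / "clip",
    }
    installed = []
    for name, dest_dir in plan.items():
        src = downloads / name
        if not src.is_file():
            continue  # not downloaded yet, skip it
        # Creates the 'sd3' subfolder on first use.
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / name
        shutil.move(str(src), str(target))
        installed.append(target)
    return installed
```

Keeping the checkpoint in its own 'sd3' subfolder follows the host's suggestion and keeps later model releases separate.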

Generating Art with Stable Diffusion 3 Medium

The host demonstrates how to use the updated Comfy UI with the Stable Diffusion 3 Medium model. They explain the process of loading the new checkpoint and CLIP files, and then proceed to generate an image using a descriptive prompt provided by the model's developers. The example prompt describes a female character with long, flowing hair made of ethereal patterns resembling the Northern Lights. The script highlights the model's ability to understand and generate images from natural language prompts, showcasing the model's capabilities and the quality of the generated artwork.
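
The host queues the prompt from the ComfyUI interface. The same step can also be scripted against the local ComfyUI server, and the sketch below shows roughly how. It rests on assumptions not covered in the video: the server is at the default address 127.0.0.1:8188, it accepts a JSON body of the form `{"prompt": workflow}` at `/prompt`, and the workflow has been exported in ComfyUI's API format. `build_queue_request` is a hypothetical helper that only builds the request.

```python
import json
import urllib.request


def build_queue_request(workflow: dict, server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) a POST request that queues a workflow."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it to a running server. Building the request separately makes it easy to inspect the payload first.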

Final Thoughts and Community Involvement

In the concluding part of the script, the host expresses satisfaction with the release of the Stable Diffusion 3 Medium model's weights for free and encourages viewers to help with the licensing issues by opening issues or notifying Stability AI to update their license. The host calls for a community effort to address the licensing concerns and wraps up the tutorial with well-wishes for the viewers.

Keywords

Stable Diffusion 3

Stable Diffusion 3 is a generative deep learning model that creates images from text descriptions. It is the main subject of the video, which demonstrates how to download the Medium variant and use it for image generation. The script mentions downloading 'sd3_medium.safetensors', the file that contains the Stable Diffusion 3 Medium weights.

ComfyUI

ComfyUI is a user interface for running and managing AI models like Stable Diffusion. The script instructs viewers on updating and using ComfyUI to integrate the Stable Diffusion 3 model. It is presented as an essential tool for the process, highlighting its role in facilitating the use of AI for image generation.

Hugging Face

Hugging Face is a company that provides a platform for sharing machine learning models. In the context of the video, it is the source from which the Stable Diffusion 3 model and related files are downloaded. The script describes the process of accessing the Hugging Face repository and downloading necessary files.

Gated Model

A gated model refers to a model that is not freely available to the public and requires some form of access control, such as filling out a form or agreeing to terms and conditions. The script mentions that Stable Diffusion 3 is a gated model, indicating that users must go through a process to gain access to it.

Text Encoders

Text encoders are components that convert text into numerical representations the image model can use. In the script, CLIP-G, CLIP-L, and T5-XXL are the text encoders downloaded as part of the Stable Diffusion 3 setup; they translate the text prompt into the conditioning that guides image generation.

Checkpoints

In the context of machine learning, checkpoints are files that contain the state of a model at a certain point in time, allowing it to be saved and resumed. The video script describes placing the 'sd3_medium.safetensors' file into the checkpoints folder within the ComfyUI directory, which is a step in setting up the Stable Diffusion 3 model.
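
To make the idea of a checkpoint file concrete, the sketch below reads only the header of a `.safetensors` file and lists what it contains, without loading any weights. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). `read_safetensors_header` is a hypothetical helper written for this illustration.

```python
import json
import struct


def read_safetensors_header(path) -> dict:
    """Return the tensor table stored in a .safetensors file header.

    Each entry maps a tensor name to its dtype, shape, and data offsets.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is not a tensor, so drop it.
    header.pop("__metadata__", None)
    return header
```

Calling `len(read_safetensors_header(path))` on a downloaded checkpoint gives the number of tensors it stores, which is a quick sanity check that the download is not truncated at the header.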

Nvidia GPU

Nvidia GPU refers to graphics processing units manufactured by Nvidia, which are commonly used for running AI models that require heavy computational power. The script mentions running 'run_nvidia_gpu.bat', the batch file in the ComfyUI base directory that launches ComfyUI on an Nvidia GPU.

Inference Workflow

An inference workflow is a sequence of steps or processes used to perform inference with a machine learning model. The video script refers to using a 'basic inference workflow' for Stable Diffusion 3, which guides the user through the process of generating images from text prompts using the model.
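
A workflow file is plain JSON, so it can be inspected before it is loaded into ComfyUI. The sketch below counts the node types in a workflow. It assumes the file uses ComfyUI's saved-workflow layout with a top-level `nodes` list whose entries carry a `type` field, and `summarize_workflow` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_workflow(workflow: dict) -> Counter:
    """Count how many nodes of each type a workflow contains."""
    return Counter(node.get("type", "unknown") for node in workflow.get("nodes", []))


def load_workflow(path) -> dict:
    """Read a workflow JSON file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `summarize_workflow(load_workflow(path))` on the basic inference workflow would show which loader and text-encode nodes it expects, and therefore which model files must be installed.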

Queue Prompt

Queue Prompt is the ComfyUI button that submits the current workflow, including its text prompt, for generation. In the video, the example prompt describes a 'female character with long flowing hair made of ethereal swirling patterns resembling the northern lights,' which the Stable Diffusion 3 model then uses to create an image.

Ethereal

Ethereal refers to something that is extremely delicate and light, often associated with a heavenly or spiritual quality. In the script, it describes the desired appearance of the generated image's hair, indicating that the model should create an otherworldly and beautiful visual effect.

Aurora Borealis

Aurora Borealis, also known as the northern lights, is a natural light display in the Earth's sky, seen predominantly in high-latitude regions. The script uses 'Aurora Borealis' as a descriptor in the prompt to guide the Stable Diffusion 3 model toward an image with patterns resembling this natural phenomenon.

Highlights

Introduction to using Stable Diffusion 3 Medium and ComfyUI.

Accessing the gated model on Hugging Face by filling out a form.

Downloading the necessary files: sd3_medium.safetensors, the text encoders, and the workflows.

Instructions to update ComfyUI by running the 'update_comfyui.bat' script.

Installing CLIP models into the ComfyUI models directory.

Creating an 'sd3' folder inside checkpoints and placing sd3_medium.safetensors in it.

Starting ComfyUI with 'run_nvidia_gpu.bat' to use the Nvidia GPU.

Loading sd3_medium.safetensors as the checkpoint in ComfyUI.

Using the CLIP files for text-to-image generation in ComfyUI.

Demonstration of generating an image with a natural language prompt.

The generated image features a female character with ethereal, aurora-like hair.

Comparison of natural-language prompting with booru-style tags and its effectiveness.

Observation of the model's impressive generation capabilities.

Discussion on the model's licensing issues and the need for community feedback.

Encouragement for users to open issues or contact Stability AI about licensing.

Final thoughts and sign-off, wishing viewers a great day.