Stable Diffusion Fix Hands Without ControlNet and Inpainting (Easy) | SDXL FREE! (Automatic1111)

Xclbr Xtra
23 Apr 2024 · 06:50

TLDR: In this tutorial, the presenter demonstrates how to generate realistic hands with Stable Diffusion without resorting to more involved techniques such as ControlNet or inpainting. The process uses two models: Real V SDXL for the initial generation and DreamShaper Turbo for enhancement. The video offers a straightforward way to get decent hand poses without extra fingers or disfigurements. The presenter also shares tips on settings, such as using Midjourney Mimic for aesthetic appeal, adjusting the CFG scale, and enabling features like self-attention guidance for better results. The final output is a more realistic image, suitable for most use cases that do not require professional-level perfection.


  • 🎨 The video demonstrates a method to generate realistic hands using the Stable Diffusion model without relying on ControlNet or inpainting techniques.
  • 🤲 The Real V SDXL model is highlighted for its ability to produce decent hands, but improvements can be made for more realism.
  • 🚀 Two models are used in the process: one for the initial generation and another for upscaling to achieve better detail and realism.
  • 🌟 The 'Midjourney Mimic' setting at 0.5 is recommended for an aesthetic look without being too strong.
  • 📈 The initial model uses 50 sampling steps with the DPM++ 3M SDE Exponential sampler and a CFG scale of 6 to 7.
  • 🔍 Negative prompting is used to avoid NSFW content, blurriness, and poorly drawn hands, but no specific prompt for hands is mentioned.
  • 🖼️ The generated image is then sent to image-to-image with the 'DreamShaper Turbo' model for further enhancement.
  • 🔧 The turbo model settings include eight steps, the DPM++ SDE Karras sampler, a scale of two, and a batch count of four.
  • 🧩 The CFG scale for the turbo model is set to one, which is found to be sufficient for achieving realism.
  • ✨ The final images have improved hand details, with most hands appearing realistic, although some may still require manual adjustment.
  • ✍️ For non-professional use or quick improvements, the described process is considered adequate for most use cases.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is showing how to fix hands in Stable Diffusion without using ControlNet and inpainting, using a simple process.

  • Which model is suggested for generating decent hands?

    -The Real V SDXL model is suggested for generating decent hands.

  • What are the two models used in the process described in the video?

    -The two models used are the Real V SDXL model and the DreamShaper Turbo model.

  • What is the purpose of using ADetailer in the process?

    -ADetailer is used to enhance the details of the image after upscaling.

  • What is the significance of the 'Midjourney Mimic' setting?

    -The 'Midjourney Mimic' setting gives the image an aesthetic feel; it is set to 0.5 so that the effect is not too strong.

  • What are the sampling steps used for the Real V SDXL model?

    -50 sampling steps are used for the Real V SDXL model.

  • What is the role of the 'clip skip' setting in the process?

    -The 'clip skip' setting, when set to two, is believed to provide a little bit better details in the generated image, although it does not directly affect the hands.

  • How does the DreamShaper Turbo model contribute to the final image?

    -The DreamShaper Turbo model reworks the image in image-to-image at the chosen denoising strength and makes the skin look more realistic.

  • What is the recommended CFG scale for the DreamShaper Turbo model?

    -The recommended CFG scale for the DreamShaper Turbo model is one, which the presenter finds sufficient for the desired outcome.

  • Which features are enabled to ensure the best results from the models?

    -The 'FreeU Integrated' and 'Self-attention guidance integrated' features are enabled to ensure the best results.

  • What is the final step suggested to improve the image further?

    -The final step suggested to improve the image further is to use in-painting techniques.

  • What is the intended audience for this process?

    -The intended audience is users who are not using the process professionally and want a simple way to improve the realism of hands in generated images.



🎨 Creating Realistic Hands in Artwork

The first paragraph introduces the topic of creating realistic hands in digital art. The speaker emphasizes that the process is straightforward, requires neither ControlNet nor inpainting, and still yields normal poses with well-formed hands. The Real V SDXL model already generates decent hands, but the speaker wants more realism, so the video demonstrates a two-model workflow. The example is a fairy-tale woman with long blue hair, and the speaker notes that full-body shots are important to showcase the hands. The initial model uses 50 sampling steps with the DPM++ 3M SDE Exponential sampler and a CFG scale of 6 to 7. The speaker also enables 'Self-attention guidance integrated' in the Forge web UI and sets clip skip to two for better detail.
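The video sets these values in the web UI itself. For readers who script the web UI through its HTTP API, the first pass might be sketched as the request body below. This is only an illustration under assumptions: the checkpoint name, image size, and prompts are placeholders that the video does not state, the CFG value is simply the midpoint of the quoted range, and the self-attention guidance option is left out because its API arguments are not covered here.

```python
import json


def build_txt2img_payload(prompt, negative_prompt, checkpoint,
                          width=832, height=1216):
    """Build a first-pass request body for POST /sdapi/v1/txt2img.

    checkpoint, width and height are placeholders (assumptions);
    the video does not state them.
    """
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 50,
        # Older builds expose this as a single sampler name; newer builds
        # split the sampler and the scheduler into separate fields.
        "sampler_name": "DPM++ 3M SDE Exponential",
        "cfg_scale": 6.5,   # midpoint of the 6 to 7 range from the video
        "n_iter": 2,        # "batch count" in the UI
        "batch_size": 1,
        "width": width,
        "height": height,
        "override_settings": {
            "sd_model_checkpoint": checkpoint,
            "CLIP_stop_at_last_layers": 2,  # "clip skip" in the UI
        },
    }


if __name__ == "__main__":
    payload = build_txt2img_payload(
        prompt="fairy tale woman with long blue hair, full body",
        negative_prompt="nsfw, blurry, lowres, bad hands",
        checkpoint="first-pass-checkpoint-name",  # hypothetical name
    )
    print(json.dumps(payload, indent=2))
```

The payload can be posted to a locally running web UI started with the `--api` flag; nothing is sent here, so the sketch runs without a server.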


🚀 Enhancing Image Realism with Turbo Models

The second paragraph discusses enhancing the realism of the generated image with a turbo model. The speaker upscales the initial image and applies ADetailer for further refinement. The turbo model settings are eight steps, the DPM++ SDE Karras sampler, a scale of two, and a batch count of four. The CFG scale is set to one, which the speaker finds optimal for the turbo model. The combination of ADetailer and the turbo model produces a more realistic appearance, with skin that looks less plasticky. The speaker concludes that inpainting can improve the image further, but the described process should suffice for most non-professional use cases.
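As a rough companion to the settings above, the second pass could be expressed as an image-to-image request body for the web UI's HTTP API. Several parts are assumptions rather than facts from the video: the sampler name is an assumed reading of the transcript (DPM++ SDE Karras), the "scale of two" is modelled by doubling the width and height, the denoising strength is a caller-supplied value because the video summary does not give one, and the detailer extension is omitted.

```python
import base64


def build_img2img_payload(image_bytes, prompt, negative_prompt, checkpoint,
                          base_width, base_height, denoising_strength):
    """Build a second-pass request body for POST /sdapi/v1/img2img.

    checkpoint and denoising_strength are assumptions supplied by the caller.
    """
    return {
        # The API expects the source image as a base64-encoded string.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 8,
        "sampler_name": "DPM++ SDE Karras",  # assumed reading of the transcript
        "cfg_scale": 1,
        "n_iter": 4,                         # "batch count" in the UI
        "batch_size": 1,
        # A resize scale of two, expressed as explicit target dimensions.
        "width": base_width * 2,
        "height": base_height * 2,
        "denoising_strength": denoising_strength,
        "override_settings": {"sd_model_checkpoint": checkpoint},
    }


if __name__ == "__main__":
    fake_png = b"\x89PNG placeholder bytes"  # stands in for a real first-pass image
    payload = build_img2img_payload(
        fake_png, "fairy tale woman with long blue hair, full body",
        "nsfw, blurry, lowres, bad hands",
        "turbo-checkpoint-name",             # hypothetical name
        832, 1216, 0.5,
    )
    print(payload["width"], payload["height"])
```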



💡Stable Diffusion

Stable Diffusion is a term referring to a type of machine learning model used for generating images from textual descriptions. In the context of the video, it is the core technology that the tutorial is built around, focusing on improving the generation of human hands in the images produced by the model.


💡Hands

The term 'hands' is central to the video's theme as it discusses methods to generate more realistic and properly formed hands in images created by the Stable Diffusion model. The video aims to address common issues such as disfigured or extra fingers in AI-generated images.


💡ControlNet

ControlNet is a technique used in AI image generation to control the output more precisely by conditioning it on an extra input such as a pose skeleton, depth map, or edge map. The video mentions that the process described does not use ControlNet, indicating a simpler and more accessible method for generating images with proper hands.


💡Inpainting

Inpainting is a process in image editing where missing or damaged parts of an image are filled in. The video script notes that inpainting is not used in the described process, suggesting an alternative approach to achieving the desired image results.

💡Real V SDXL Model

The 'Real V SDXL Model' is a specific model within the Stable Diffusion framework that is mentioned as capable of generating decent hands but not with the desired level of realism. The video aims to enhance the output of this model.

💡Midjourney Mimic

This term refers to a setting or feature within the image generation process that gives an aesthetic feel to the images. It is used at a value of 0.5 so that the effect is not too strong, balancing detail against the overall aesthetic.
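If the setting is a LoRA (an assumption; the summary only calls it a setting), the web UI applies it through a tag in the prompt of the form `<lora:name:weight>`. The helper below is a minimal sketch of building such a tag; the file name `midjourney_mimic` is hypothetical.

```python
def lora_tag(name, weight):
    """Format a web UI prompt tag that applies a LoRA at the given weight."""
    return f"<lora:{name}:{weight:g}>"


if __name__ == "__main__":
    # "midjourney_mimic" is a hypothetical file name used for illustration.
    prompt = "fairy tale woman with long blue hair, full body " + lora_tag("midjourney_mimic", 0.5)
    print(prompt)
```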

💡Negative Prompting

Negative prompting is a technique used in AI image generation where the model is instructed to avoid including certain elements in the generated image. In the script, it is used to prevent NSFW (Not Safe For Work) content, blurriness, and poorly formed hands.

💡Sampling Steps

Sampling steps refer to the number of denoising iterations the sampler runs to generate an image. The video mentions using 50 sampling steps with the DPM++ 3M SDE Exponential sampler to achieve a certain quality of image.

💡CFG Scale

CFG scale stands for 'classifier-free guidance scale'. It controls how strongly the prompt steers the image: higher values follow the prompt more closely, while lower values give the model more freedom. The video suggests a CFG scale of 6 to 7 for the initial generation and 1 for the turbo model to achieve a more realistic look.
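CFG is usually expanded as classifier-free guidance. At each step the model makes one noise prediction with the prompt and one without it, and the scale decides how far to push from the second toward the first. The sketch below uses plain Python lists in place of real model outputs, purely to show the arithmetic; the numbers are made up.

```python
def apply_cfg(uncond, cond, scale):
    """Combine unconditional and prompt-conditioned predictions.

    guided = uncond + scale * (cond - uncond)
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]   # made-up prediction without the prompt
    cond = [1.0, 3.0]     # made-up prediction with the prompt
    print(apply_cfg(uncond, cond, 1.0))  # scale 1 returns the conditioned prediction
    print(apply_cfg(uncond, cond, 7.0))  # larger scales push further along the prompt direction
```

With a scale of one the result is just the prompt-conditioned prediction, which is one way to read why a turbo model that was trained for low guidance works at that value.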

💡Image to Image

Image to image is a process where an existing image is used as a base to create a new image with modifications or enhancements. In the video, this process is used to refine the generated image and improve the realism of the hands.


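💡Denoising

Denoising is the step-by-step removal of noise that a diffusion model performs to produce an image. In image-to-image, the denoising strength controls how much noise is added to the input and therefore how far the result can depart from it. The video uses a turbo model to denoise the image at the chosen strength and make the skin appear more realistic.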
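One practical consequence in image-to-image is that the sampler runs only part of the requested step count, roughly in proportion to the denoising strength. The helper below is an approximation of the web UI's default behaviour, not its exact code; the real count can differ by a step and depends on UI settings, and the strength values are hypothetical because the video summary does not state one.

```python
def approx_img2img_steps(steps, denoising_strength):
    """Estimate how many of the requested steps image-to-image actually runs."""
    # The strength is capped just below 1 so a full-strength pass
    # still starts from the input image.
    return int(min(denoising_strength, 0.999) * steps)


if __name__ == "__main__":
    for strength in (0.3, 0.5, 0.75):  # hypothetical strengths
        print(strength, approx_img2img_steps(8, strength))
```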


Highlights

The video demonstrates a simple method to generate proper hands without using ControlNet or inpainting techniques.

The process is suitable for creating normal poses with decent hands, avoiding extra fingers or disfigured hands.

The Real V SDXL model is used for its ability to generate decent hands, although it may lack realism.

Two models are utilized in the process to enhance the realism of the generated hands.

The video emphasizes that facial features are not as important as the full body, particularly the hands, in the generated images.

Midjourney Mimic at a 0.5 setting is used for an aesthetic feel, avoiding a too-strong appearance.

The negative prompt includes terms such as NSFW, blurry, and lowres, with a focus on avoiding bad hands.

For the Real V model, 50 sampling steps and the DPM++ 3M SDE Exponential sampler are used.

Batch count is set to two, with a CFG scale around 6 to 7 for the initial generation.

Self-attention guidance integrated is enabled for better results.

Clip skip is set to two for better detail in the generated images.

The generated images are then sent to an image-to-image process using the DreamShaper Turbo model.

The settings for the Turbo model include eight steps, the DPM++ SDE Karras sampler, a scale of two, and a batch count of four.

CFG scale is set to one for the Turbo model, which is found to be effective for realism.

ADetailer is enabled for additional refinement.

The final images are compared, showing a more realistic appearance after using the Turbo model.

The process covers most use cases for non-professional use and provides a decent figure for hands in generated images.

In-painting can be used for further improvement of the generated images.