Stable Diffusion 3 HANDS ON! How Good Is It Really?

All Your Tech AI
18 Apr 2024 · 08:51

TLDR

Stability AI has recently launched Stable Diffusion 3 and its Turbo version, accessible only via API through a partnership with Fireworks AI. The company plans to make model weights available for self-hosting with a Stability AI membership soon. Despite the high API pricing, the beta version of Stable Diffusion 3 was successfully implemented on Pixel Dojo within three hours, allowing users to generate images with prompts and optional negative prompts. The quality of the generated images is generally consistent with those displayed on the company's website, with prompt adherence being notably good. However, text generation within images remains a challenge. The Turbo model is faster but produces lower-resolution images. The video concludes that Stable Diffusion 3 mostly lives up to its hype, with most images generated closely resembling the examples provided by the company.

Takeaways

  • 🚀 Stable Diffusion 3 and Stable Diffusion 3 Turbo have been released by Stability AI, but are only available via API.
  • 🤝 Stability AI has partnered with Fireworks AI, an API platform that provides hosting and fast access to models like Stable Diffusion.
  • 📚 They plan to make the model weights available for self-hosting with a Stability AI membership in the near future.
  • ⏱️ The reviewer had Stable Diffusion 3 beta up and running on Pixel Dojo within 3 hours.
  • 💰 The pricing for the API is relatively high, with credits costing about $10 per thousand and image generation costs varying between the models.
  • 🔍 The reviewer tested the model's image generation without cherry-picking, using prompts from press releases to assess the quality.
  • 📸 The generated images were generally in line with the prompts and similar to those displayed on the Stability AI website.
  • 📝 Text coherence in images generated by Stable Diffusion 3 was a challenge, with some text appearing mashed up or incorrect.
  • 🔧 The reviewer noted that the Turbo model was quicker but resulted in lower quality images compared to the standard model.
  • 🎨 The adherence to complex prompts, including those with text, was generally good, although not perfect.
  • 📈 Stable Diffusion 3 seems to live up to the hype, with most images generated closely matching the examples on the website.
  • 💡 Negative prompts were not used in the tests, but they could be an option for users to experiment with for better results.

Q & A

  • What is the main difference between Stable Diffusion 3 and Stable Diffusion 3 Turbo?

    -Both models are available via API. The Turbo version returns results more quickly, but its image quality may be lower than that of the standard model.

  • How can one access Stable Diffusion 3 and Stable Diffusion 3 Turbo?

    -They are accessible via an API provided by Fireworks AI, an API platform that offers hosting and fast access to these models.

  • What is the pricing structure for the API that hosts Stable Diffusion 3 models?

    -The API operates on a credit system where users need to purchase credits, with each image generated costing 6 to 12 credits, making it about 32 times more expensive than generating an image with Stable Diffusion XL 1.0.

  • What is the cost per thousand credits for using the Stable Diffusion 3 API?

    -The cost is approximately $10 per thousand credits.

  • What is the commitment Stability AI has towards open generative AI?

    -Stability AI has committed to making the model weights available for self-hosting to those with a Stability AI membership in the near future.

  • How long did it take to set up Stable Diffusion 3 beta on Pixel Dojo after its release?

    -It took about 3 hours to set up Stable Diffusion 3 beta on Pixel Dojo.

  • What are the options available to users when generating an image with Stable Diffusion 3?

    -Users can input a prompt, optionally provide a negative prompt, and choose between Stable Diffusion 3 and Stable Diffusion 3 Turbo.

  • How does the quality of images generated by Stable Diffusion 3 compare to those displayed on the Stability AI website?

    -The quality of images generated by Stable Diffusion 3 is quite good and does not appear to be significantly cherry-picked compared to the images on the Stability AI website.

  • What challenges do most AI generators face when generating images with text?

    -AI generators often struggle with text coherence, ensuring that the text in the generated image is legible and accurately reflects the input prompt.

  • What is the monthly cost for a Pro Plan on Pixel Dojo?

    -The Pro Plan on Pixel Dojo starts at $9.95 a month, which includes unlimited image generations.

  • What additional features are available to Pro Plan members on Pixel Dojo?

    -Pro Plan members have access to a creative upscaler and all the other Stable Diffusion models that are integrated into Pixel Dojo.

  • How does the reviewer rate the prompt adherence of Stable Diffusion 3 compared to previous versions?

    -The reviewer feels that the prompt adherence of Stable Diffusion 3 is significantly better than that of previous versions, to the point where negative prompts might not be as necessary.

Outlines

00:00

🚀 Stable Diffusion 3 and Turbo Release via API

Stability AI has released Stable Diffusion 3 and its Turbo version, but with a catch—they are only available through an API. The company has partnered with Fireworks AI, an API platform that offers hosting and quick access to AI models. Despite the high API pricing of about $10 per thousand credits, with Stable Diffusion 3 costing 6 to 12 credits per image generated, the presenter managed to set up Stable Diffusion 3 beta on Pixel Dojo within 3 hours. This allows users to generate images by providing a prompt and optionally a negative prompt, choosing between the two versions of the model, and viewing examples. The presenter also discusses the cost of the Pro Plan for unlimited usage and shares initial generated images to demonstrate the model's adherence to prompts and quality, comparing them to those displayed on Stability AI's website.

05:02

🎨 Testing Image Generation with Stable Diffusion 3

The rest of the video examines image generation with Stable Diffusion 3 and its Turbo model in detail. The presenter tests various prompts to assess how closely the model follows the given instructions and how well it handles text within images, a challenge for many AI generators. Examples include an anthropomorphic tortoise on a subway, a man with a retro TV for a head in a desert, and a cardboard box with text on it. The presenter notes that while the standard model generally performs well, the faster Turbo model produces lower-quality, more cartoonish results. The tests also cover more complex prompts, such as a kangaroo wearing ski goggles and holding a beer, and an entire universe inside a bottle at Walmart. The presenter concludes that Stable Diffusion 3 mostly lives up to its hype, producing images that are similar to those on the Stability AI website without excessive cherry-picking. The video ends with an invitation for viewers to try the model on Pixel Dojo with a Pro membership and to share their thoughts on the new models.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is an advanced AI model developed by Stability AI for generating images from textual prompts. It represents a significant upgrade from previous versions, offering faster and more accurate image generation capabilities. In the video, it is used to create various images, demonstrating its ability to interpret complex prompts and generate detailed and coherent images.

💡API

An Application Programming Interface (API) is a set of rules and protocols that allows software applications to communicate and interact with each other. In the context of the video, Stable Diffusion 3 is made available via an API, which means users can access the model's image-generating capabilities by sending requests to the API endpoint.
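As a rough illustration of what such a request might carry, the sketch below assembles the three inputs mentioned in the video (a prompt, an optional negative prompt, and a choice between the standard and Turbo models) into a flat set of fields. The class name, field names, and model identifiers are assumptions made for illustration only; they are not taken from the video or from Stability AI's documentation, so the official API reference should be checked before relying on any of them.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical model identifiers; the real API may use different names.
ALLOWED_MODELS = ("sd3", "sd3-turbo")


@dataclass
class GenerationRequest:
    """Inputs a user supplies for one image generation (illustrative only)."""
    prompt: str
    negative_prompt: Optional[str] = None
    model: str = "sd3"

    def to_form_fields(self) -> Dict[str, str]:
        """Return the request as flat string fields, omitting unset options."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.model not in ALLOWED_MODELS:
            raise ValueError(f"model must be one of {ALLOWED_MODELS}")
        fields = {"prompt": self.prompt, "model": self.model}
        if self.negative_prompt:
            # Only sent when the user actually provided one.
            fields["negative_prompt"] = self.negative_prompt
        return fields


request = GenerationRequest(
    prompt="a kangaroo wearing ski goggles and holding a beer"
)
print(request.to_form_fields())
```

Validating the inputs before anything is sent keeps a malformed request from consuming paid credits.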

💡Fireworks AI

Fireworks AI is mentioned as an API platform that partners with Stability AI to provide hosting and fast, stable access to AI models like Stable Diffusion 3. This partnership ensures that users can reliably use the image generation service without worrying about the technical infrastructure.

💡Model Weights

In machine learning, model weights are the parameters that the model learns during training to make accurate predictions or generate outputs. The video mentions that Stability AI plans to make the model weights of Stable Diffusion 3 available for self-hosting to members, which means users with the appropriate technical knowledge can run the model on their own servers.

💡Pixel Dojo

Pixel Dojo appears to be a platform or service where the video's creator has implemented the Stable Diffusion 3 beta for users to generate images. It serves as an interface for interacting with the Stable Diffusion 3 API, allowing users to input prompts and receive generated images.

💡Prompt

A prompt is a textual description or a statement that guides the AI in generating a specific type of image. The video discusses how users can input prompts into Pixel Dojo to generate images with Stable Diffusion 3, emphasizing the importance of clear and descriptive prompts for achieving the desired image outcomes.

💡Negative Prompt

A negative prompt is an additional input that specifies what the AI should avoid including in the generated image. While the video does not use negative prompts in its examples, it suggests that they can be a useful tool for refining the image generation process by explicitly stating what elements are not desired in the output.

💡Credits

In the context of the video, credits refer to the units of currency used to pay for the API requests to generate images with Stable Diffusion 3. The cost is mentioned as approximately $10 per thousand credits, with each image generation costing a certain number of credits depending on the model used.
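To put those figures in dollar terms, the short sketch below converts a per-image credit cost into an approximate price. It uses only the numbers quoted in this summary (about $10 per thousand credits, and 6 to 12 credits per image); the function and constant names are illustrative, and actual pricing may differ.

```python
# Figures quoted in the video summary (approximate).
USD_PER_THOUSAND_CREDITS = 10.0
CREDITS_PER_IMAGE_LOW = 6
CREDITS_PER_IMAGE_HIGH = 12


def usd_per_image(credits_per_image: float) -> float:
    """Convert a per-image credit cost into US dollars."""
    return credits_per_image * USD_PER_THOUSAND_CREDITS / 1000


low = usd_per_image(CREDITS_PER_IMAGE_LOW)
high = usd_per_image(CREDITS_PER_IMAGE_HIGH)
print(f"${low:.2f} to ${high:.2f} per image")  # prints "$0.06 to $0.12 per image"
```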

💡Pro Plan

The Pro Plan is a paid subscription option mentioned in the video that offers unlimited image generations with the Stable Diffusion models on Pixel Dojo. It is positioned as a cost-effective way for users to access the image generation service without worrying about the number of credits consumed.
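One way to gauge that cost-effectiveness is to estimate how many images would cost as much through the API as one month of the plan. The sketch below does this with the figures quoted in this summary ($9.95 a month, about $10 per thousand credits, 6 to 12 credits per image). It assumes the quoted API credit price is the right point of comparison and ignores any other costs, so treat the result as a rough estimate.

```python
import math

PRO_PLAN_USD_PER_MONTH = 9.95   # quoted in the video summary
USD_PER_CREDIT = 10.0 / 1000    # about $10 per thousand credits


def break_even_images(credits_per_image: float) -> int:
    """Smallest number of images whose API cost reaches the monthly plan price."""
    cost_per_image = credits_per_image * USD_PER_CREDIT
    return math.ceil(PRO_PLAN_USD_PER_MONTH / cost_per_image)


for credits in (6, 12):
    print(f"{credits} credits per image: {break_even_images(credits)} images")
```

Under these assumptions, the plan pays for itself after roughly 83 to 166 images a month.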

💡Text Coherence

Text coherence refers to the ability of the AI to understand and generate text within images in a way that makes sense and follows the context of the prompt. The video discusses the challenges AI generators have historically faced with text in images and tests Stable Diffusion 3's capabilities in this area, noting mixed results.

💡Cherry-Picking

Cherry-picking is the practice of selecting only the best or most favorable outcomes to present, often to make a product or service appear better than it is in reality. The video addresses concerns about cherry-picking by testing the AI with various prompts and comparing the generated images to those displayed on the website, aiming to determine if the showcased images are representative of the model's typical performance.

Highlights

Stability AI has released Stable Diffusion 3 and Stable Diffusion 3 Turbo, available only via API.

Stability AI has partnered with Fireworks AI for hosting and fast access.

Model weights for self-hosting will be available for Stability AI members in the near future.

Stable Diffusion 3 Beta was set up on Pixel Dojo within 3 hours.

Users can generate images with a prompt, optionally a negative prompt, and choose between Stable Diffusion 3 and Turbo.

Pricing for the API is high, at about $10 per thousand credits.

Stable Diffusion 3 is 32 times more expensive to generate an image than Stable Diffusion XL 1.0.

A Pro Plan starts at $9.95 a month for unlimited usage of Pixel Dojo.

The quality of images generated by the model is comparable to those displayed on the website, suggesting no cherry-picking.

The model's prompt adherence is strong, with generated images closely following the input prompts.

Text coherence in generated images was mixed, with some text appearing mashed up or incorrect.

Stable Diffusion 3 Turbo model is quicker but produces lower quality images compared to the standard model.

The Turbo model struggled slightly with complex text in images but still provided reasonable results.

The standard model demonstrated better adherence to complex prompts with detailed elements.

Stable Diffusion 3 seems to live up to the hype, with most generated images being of high quality.

Negative prompts were not used in the tests, but could be an area for further exploration.

Pixel Dojo offers a Pro membership for $9.95 a month, which includes unlimited generations and access to other Stable Diffusion models.

More features and models will be added to Pixel Dojo as time progresses.