Stable Diffusion 3 vs ChatGPT Dalle-3 vs Midjourney [NEW Best Image Generator?]

AI Andy
3 Mar 2024 · 20:50

TLDR: This video compares three AI image generators (Stable Diffusion 3, Midjourney, and Dalle-3) by giving them the same prompts and evaluating them on detail, adherence to the prompt, and 'coolness' factor. The comparison covers a range of scenarios, from a cinematic photo of a red apple in a classroom to a complex scene of a horse balancing on a colorful ball. Stable Diffusion 3 excels at rendering text and placing objects accurately, while Dalle-3 stands out for its style and creativity. Midjourney, although visually appealing, struggles with text adherence. The video concludes that Dalle-3 and Stable Diffusion 3 are the preferred options for their respective strengths, and it anticipates future improvements and community contributions to these models.


  • 🎨 **Stable Diffusion 3** excels at text adherence and placing objects accurately but may fall short of the others on the 'coolness' factor.
  • 🚀 **Midjourney** stands out for its high 'coolness' and unique style, although it sometimes struggles with text clarity and detail adherence.
  • 🌟 **Dalle-3 (ChatGPT)** impresses with its dramatic lighting and clear detail, often striking a balance between detail and style.
  • 🍎 In the first prompt, a cinematic photo of a red apple, **Stable Diffusion 3** was criticized for lacking 'coolness', while **Midjourney** and **Dalle-3** offered more visually appealing results.
  • 🌌 For the painting of an astronaut riding a pig, **Stable Diffusion** showed perfect adherence to the prompt, but **Midjourney** and **Dalle-3** provided more stylized and 'cool' outputs.
  • 🦎 In the close-up studio photograph of a chameleon, **Midjourney** was particularly praised for its high-quality and detailed rendering of the animal.
  • 🖥 In the prompt for a '90s desktop computer, **Stable Diffusion 3** did well on nostalgia and detail, while **Midjourney** leaned into a gritty, steampunk style.
  • 🏎️ A sports car on a racetrack was depicted with motion and speed by all generators, but **Dalle-3** provided a notably 'cool' and retro take on the scene.
  • 🧊 When tasked with transparent glass bottles filled with colored liquids, **Midjourney** struggled with accuracy, while **Dalle-3** managed a more realistic and stylized depiction.
  • 🐯 An embroidered cloth with text and a tiger showed **Stable Diffusion 3**'s strength in texture and detail, whereas **Dalle-3** added a personal touch with additional elements like pottery.
  • 🌈 The final prompt involving a horse on a colorful ball was best realized by **Dalle-3**, offering a more believable and stylized outcome compared to the other generators.

Q & A

  • What are the three factors the video ranks the image generators on?

    -The video ranks the image generators on detail, adherence to the prompt, and coolness.

  • What criticism is mentioned about Stable Diffusion 3?

    -The criticism mentioned about Stable Diffusion 3 is that it falls short on the coolness factor.

  • How does Midjourney's image of a red apple compare to Stable Diffusion 3's in terms of detail and clarity?

    -Midjourney's image of a red apple has slightly less detail and clarity than Stable Diffusion 3's.

  • What is the main advantage of Dalle-3 in the comparison?

    -Dalle-3 has very good clarity and detail, and it is noted for its high coolness factor thanks to dramatic lighting.

  • Which image generator is said to have the best adherence to the prompt for the painting of an astronaut riding a pig?

    -Stable Diffusion 3 is said to have executed the prompt perfectly, giving the best adherence.

  • What is the issue with Midjourney's generated image of the astronaut and pig?

    -Midjourney's generated image has a good coolness factor and good adherence, but its quality and clarity are a bit off, particularly on the pig's leg.

  • How does Dalle-3 perform with the prompt of the chameleon over a black background?

    -Dalle-3 performs well, offering a very stylized and dramatic photo with high detail and a high coolness factor.

  • What is the main criticism of Midjourney when it comes to text generation?

    -Midjourney is criticized for not performing well with text generation, often not adhering closely to the text elements of the prompt.

  • Which image generator is favored for its ability to do text really well?

    -Stable Diffusion 3 is favored for rendering text very well and placing elements accurately according to the prompt.

  • What is the final verdict on which image generator the video creator would use?

    -The video creator's favorite, and the one they would use, is Dalle-3 in ChatGPT, chosen for its style and text generation capabilities.

  • What does the video suggest about the future of Stable Diffusion once it becomes open source?

    -The video suggests that once Stable Diffusion becomes open source, different models may emerge from the community that could potentially outperform the current offerings.

  • How does the video creator describe the style of the images generated by ChatGPT?

    -The video creator describes the style of the images generated by ChatGPT as more appealing and cooler, rather than having a sterile, scientific-lab look.



🎨 Comparative Analysis of AI Art Generation Models

The video script begins with a comparison of three AI art generation models: Stable Diffusion 3, Midjourney, and Dalle-3. The comparison is based on three criteria: detail, adherence to the prompt, and coolness factor. The first prompt asks for a cinematic photo of a red apple in a classroom with specific text on the blackboard. The script discusses the strengths and weaknesses of each model in terms of detail, clarity, realism, and style, with particular emphasis on the 'coolness' aspect that many people appreciate.


🚀 Creative Prompts and Model Performance

The script continues with a series of creative prompts to test the AI models' capabilities. These include an astronaut riding a pig, a chameleon on a black background, a desktop computer with specific text on the screen, and more. Each prompt is analyzed for adherence to the details provided, the quality of the generated images, and the 'coolness' of the output. The discussion highlights the different styles and approaches of the models, with a focus on their ability to handle text and complex scenes.


🏎️ Evaluating Adherence and Style in Dynamic Scenes

The video script describes the results of prompts involving dynamic scenes, such as a sports car on a racetrack and a horse balancing on a ball. The models are evaluated on their ability to capture motion, adhere to the prompt, and maintain a high level of detail and style. The script provides a critique of each model's output, noting where they excel and where they fall short, particularly in terms of text generation and realism.


🌌 Diverse Styles and Creative Interpretations

The script discusses the diverse styles and creative interpretations of the AI models when faced with complex and fantastical prompts. It covers the models' performances on generating images of a horse in an unrealistic pose, an anime-style illustration, and other stylized scenes. The emphasis is on the unique artistic styles produced by each model and how they handle the challenge of creating text and specific details within their outputs.


📈 Final Thoughts and Recommendations

In the concluding part of the script, the narrator shares their final thoughts and recommendations. They express a preference for the style and text generation capabilities of ChatGPT's Dalle-3 over Stable Diffusion 3, while acknowledging the strengths of each model. The script ends with a call to action, inviting viewers to find their preferred model and to keep watching for more content.



💡Stable Diffusion 3

Stable Diffusion 3 is an image-generating model that is being compared in the video for its ability to create images from textual prompts. It is noted for its detail and adherence to the prompt but is criticized for sometimes lacking in the 'coolness' factor. For instance, when generating an image of an astronaut riding a pig, Stable Diffusion 3 perfectly adheres to the complex prompt, showcasing its capability for detailed and accurate image generation.


💡Midjourney

Midjourney is another AI image generator that the video compares to Stable Diffusion 3 and Dalle-3. It is appreciated for its higher 'coolness' factor and stylistic outputs, although it sometimes falls short in text clarity and realness. One example from the script is the generation of a chameleon, where Midjourney's output is praised for its coolness and quality, although that prompt contained no text elements.


💡Dalle-3

Dalle-3 (OpenAI's DALL-E 3, used in the video through ChatGPT) is the third version of an image-generating AI that is also evaluated in the video. It is recognized for creating images with a dramatic, stylized look, and it often excels in the 'coolness' aspect. However, it may not always adhere as closely to the textual prompts as Stable Diffusion 3. One example is the generation of a sports car with the text 'sd3' on it, where Dalle-3 provides a retro and cool take, although it does not perfectly match the prompt.


💡Adherence

Adherence refers to how closely the generated images follow the details provided in the textual prompts. It is one of the three factors by which the AI models are judged in the video. Adherence is important because it measures the models' ability to understand and visualize complex concepts accurately. For example, when creating an image of a red apple with specific text on a blackboard, adherence would be assessed based on whether the apple, text, and setting are accurately depicted.

💡Coolness Factor

The 'coolness factor' is a subjective measure of how visually appealing and stylistically interesting the generated images are. It's one of the criteria used to rank the AI models in the video. A high coolness factor suggests that an image is not only technically accurate but also engaging and artistic. The video mentions that some viewers prefer Midjourney for its higher coolness factor, even when it might lack in other areas like text clarity.


💡Detail

Detail is a critical aspect of image quality, referring to the clarity and intricacy of the elements within the generated images. The level of detail is one of the factors used to evaluate the performance of the AI models. High levels of detail contribute to the realism and quality of the images, such as when depicting individual scales on a chameleon or the texture of an embroidered cloth.

💡Text Clarity

Text clarity is the ability of the AI models to accurately render and incorporate textual elements within the generated images. It is an important aspect when the prompt includes specific text that needs to be visible and legible in the image. For example, in the prompt for a sports car with 'sd3' on the side, text clarity would be assessed based on the visibility and correctness of the text in the generated image.

💡Realness Factor

The realness factor describes how lifelike and authentic the generated images appear. It is related to the level of detail and the accuracy of the models' rendering of real-world objects and scenarios. An image with a high realness factor would look convincingly like a photograph or a realistic painting. The video discusses this in the context of evaluating how well the models represent physical properties, such as the lighting and shadows on objects.


💡Prompt

A prompt is the textual description or request given to the AI models to generate an image. It includes the subjects, actions, and any specific details that the image should contain. The effectiveness of the AI models is judged on their ability to interpret and respond to these prompts accurately. For example, a prompt might describe a scene with a specific object, background, and text, and the AI's interpretation of this prompt would determine the resulting image.

💡Image Generation

Image generation is the process by which AI models create visual content based on textual prompts. It involves understanding the text, visualizing the described scene or object, and rendering it as an image. The video compares different models based on their image generation capabilities, focusing on how well they can create images that match the prompts in terms of detail, adherence, and coolness.

💡AI Model

An AI model, in the context of this video, refers to a specific instance or version of an artificial intelligence system designed for image generation. The video compares different AI models—Stable Diffusion 3, Midjourney, and Dalle-3—to evaluate their performance in generating images from textual descriptions. Each AI model has its unique strengths and weaknesses in terms of detail, adherence to prompts, and stylistic output.


Comparison of three image generators: Stable Diffusion 3, Midjourney, and Dalle-3.

Ranking based on detail, adherence to the prompt, and coolness factor.

Stable Diffusion 3 criticized for lacking in the coolness factor.

Midjourney has a higher coolness factor but less detail and clarity.

Dalle-3 has good clarity, detail, and dramatic lighting, making it visually appealing.

Stable Diffusion excels in text adherence and style.

Midjourney's style is street-art-oriented, with a high coolness factor but less text accuracy.

Dalle-3 sometimes generates multiple images, offering varied interpretations.

Studio photograph of a chameleon showcases detailed and high-quality imagery from all generators.

Midjourney particularly excels in creating animal imagery.

Dalle-3 provides stylized and dramatic photos, highly rated for coolness.

Stable Diffusion 3 effectively creates nostalgic and detailed scenes.

Midjourney's interpretation of prompts sometimes leans towards a gritty, steampunk aesthetic.

Dalle-3's retro UI and attention to detail offer a unique and appealing style.

Challenges in generating transparent liquids and correct color representation across generators.

Stable Diffusion 3's embroidery detail and lighting effects are praised for their beauty.

Midjourney struggles with text generation and adherence to specific prompt elements.

Dalle-3's inclusion of additional elements like pottery adds a unique touch to the imagery.

All generators perform well with abstract and fantastical prompts, such as a horse on a ball.

Dalle-3 stands out for its stylized and dramatic representation of abstract concepts.

The video concludes with a preference for Dalle-3's style and potential, despite Stable Diffusion's strengths in text.