How to use MidJourney 5 Describe Function with Digital Art

ControlAltAI
6 Apr 2023 · 24:55

TLDR: The video demonstrates MidJourney 5's new 'image to text' feature (the /describe command), which generates prompts from an image. The creator tests it on digital art in several styles, including photography, vector, and abstract art, and finds that it performs impressively on some images and poorly on others, revealing both the strengths and the limitations of the AI's image interpretation.

Takeaways

  • 😀 MidJourney has released a new 'Image to Text' feature that generates prompts from images.
  • 🔄 Users can regenerate the same image to receive different prompts each time.
  • 🎨 The demonstration includes a variety of images like photography, minimal vector, and abstract art.
  • 🕒 Some images took from two hours to four days to create by hand in Photoshop and Illustrator.
  • 📈 The video showcases how well MidJourney performs with different types of images.
  • 👍 The AI has generated some very good prompts, with some being favorites of the creator.
  • 🚫 The AI sometimes fails to recognize elements in the images, like a shark's fin, leading to unusable prompts.
  • 🎭 The AI seems to excel with abstract images, providing creative and accurate prompts.
  • 🖼️ For some images, all generated prompts are usable, while for others, none are satisfactory.
  • 🌌 The AI struggles with astrophotography, not accurately reflecting the original image's elements.
  • 🎨 The AI is particularly good at picking up on colors in the images it processes.
  • 📝 The creator emphasizes the potential for AI to improve further with feedback from such tests.

Q & A

  • What is the new feature released by MidJourney called?

    -The new feature released by MidJourney is called 'image to text'.

  • How does the 'image to text' feature work?

    -The 'image to text' feature lets users submit any image with the /describe command, and the MidJourney bot generates four prompts for that image. The same image can be re-run to get a different set of prompts each time.

  • What is the purpose of the video shown in the script?

    -The purpose of the video is to demonstrate MidJourney's 'image to text' feature in a private Discord channel, showcasing how the bot generates prompts from various images.

  • How many images does the video use to test the 'image to text' feature?

    -The video uses about 15 images to test the 'image to text' feature.

  • What types of images were chosen to test the feature?

    -The images chosen for testing include photography, minimal vector, abstract, and other styles, to see how well the feature performs across them.

  • How long did it take to create some of the images used in the video?

    -Some of the images took anywhere from two hours to about four days to create.

  • What is the AI's performance like in generating prompts from images?

    -The AI's performance varies; it creates some very good prompts, but there are also instances where it does not generate usable prompts, indicating both the capabilities and limitations of the AI.

  • What does the AI struggle with in terms of image recognition?

    -The AI struggles to recognize certain elements in the images, such as a shark's fin on the water, and it sometimes describes elements that do not correspond to anything in the original image.

  • What is the AI's accuracy in picking up colors from the images?

    -The AI picks up colors more accurately than other elements in the images, as noted in the discussion about the hearts and the vector lighthouse prompts.

  • What does the video suggest about the potential for improvement in AI-generated art?

    -The video suggests that there is potential for improvement in AI-generated art, especially if the AI could better understand and recognize the elements in the images it is generating prompts from.

  • What is the final recommendation for users interested in the 'image to text' feature?

    -The final recommendation is for users to try the 'image to text' feature themselves, experiment with different images, and see the AI's capabilities and limitations in generating prompts.
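
For readers who want to run this kind of test themselves, the results can be tracked with a few lines of Python. This is only a minimal bookkeeping sketch and is not shown in the video: the file names, style labels, and verdicts below are made up for illustration, and the code does not call MidJourney or Discord in any way. It assumes, as described above, that each run returns four prompts and that the tester marks each one as usable or not.

```python
from dataclasses import dataclass

PROMPTS_PER_RUN = 4  # per the video, each /describe run returns four prompts


@dataclass
class DescribeRun:
    """One /describe run on one image, with the tester's own verdicts."""
    image: str    # hypothetical file name
    style: str    # hypothetical style label, e.g. "vector" or "abstract"
    usable: list  # four booleans: was each prompt judged usable?

    def __post_init__(self):
        if len(self.usable) != PROMPTS_PER_RUN:
            raise ValueError("expected four verdicts per run")


def usable_rate_by_style(runs):
    """Return {style: fraction of prompts judged usable} across all runs."""
    totals = {}
    for run in runs:
        good, seen = totals.get(run.style, (0, 0))
        totals[run.style] = (good + sum(run.usable), seen + len(run.usable))
    return {style: good / seen for style, (good, seen) in totals.items()}


# Made-up verdicts for illustration only; these are not results from the video.
runs = [
    DescribeRun("hearts.png", "vector", [True, True, False, True]),
    DescribeRun("hearts.png", "vector", [True, False, False, True]),  # a re-run
    DescribeRun("swirl.png", "abstract", [True, True, True, True]),
]
print(usable_rate_by_style(runs))
```

Recording re-runs of the same image as separate entries keeps the tally honest, since the same image can yield a different set of prompts each time.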

Outlines

00:00

🖼️ MidJourney's Image to Text Feature Introduction

The script introduces a new MidJourney feature called 'image to text', which allows users to input images and receive four different prompts generated by the MidJourney bot. The video is unedited and showcases the feature on a private Discord channel. The creator uses 15 images, none of which are AI-generated; they were created manually in Photoshop and Illustrator, taking from two hours to four days each. The video demonstrates the bot's performance in generating prompts from a diverse selection of images, including photography, minimal vector, and abstract styles.

05:00

🎨 Testing AI's Prompt Generation with Diverse Images

The script describes the testing of the image to text feature with a variety of images, including a vector image with a heart pattern and a simplistic shark-themed vector. The creator expresses mixed reactions to the AI's prompts, finding some impressive and others not meeting expectations. The AI struggles with recognizing certain elements in the images, such as the shark's fin, and the creator emphasizes the need for prompt editing to achieve desired results. The section also includes the creator's reflections on the AI's capabilities and limitations.

10:01

🌌 AI's Abstract and Vector Art Prompts Evaluation

This section of the script focuses on the AI's performance with abstract and vector images, including a complex abstract piece that took days to create and a city vector drawn on an iPad. The AI's prompts are generally well-received, with the creator expressing amazement at the AI's ability to generate accurate and creative prompts. The script also mentions the potential for further customization through prompt editing to refine the AI's output.

15:06

📸 AI's Challenges with Photography and Minimalism

The script discusses the AI's attempts to generate prompts from a photograph and a minimalistic abstract image. The creator notes that the AI sometimes misinterprets elements in the images, such as identifying a jet in an abstract image or an apple from an unrelated image. Despite these inaccuracies, the creator appreciates the AI's creativity and selects prompts that, while not directly related to the original images, are appealing due to their color similarities or imaginative concepts.

20:07

🌠 AI's Performance with Astrophotography and Night Scenes

The final part of the script evaluates the AI's ability to generate prompts from astrophotography and a long exposure ocean pier shot. The creator is disappointed with the AI's prompts for the astrophotography image due to inaccuracies and complexity in the required settings. However, the AI performs well with the ocean pier image, generating beautiful prompts despite some color inaccuracies. The script concludes with a call to like, subscribe, and enable notifications for new video uploads.

Keywords

MidJourney

MidJourney is an AI image-generation service, used through a Discord bot, that creates images from text prompts. In the context of the video, MidJourney has released a new feature called 'image to text' (the /describe command), which allows users to input an image and receive text prompts generated by the AI. The video demonstrates how this feature performs with different types of images, showcasing its capabilities and limitations.

Image to Text

The term 'Image to Text' describes the function of converting visual content into textual descriptions. In the video, the MidJourney bot uses this function to analyze images and create four distinct text prompts for each one. This feature is central to the video's demonstration, as it tests the AI's ability to interpret and describe various images accurately.

Prompts

In the context of AI image generation, 'prompts' are the text descriptions a user gives the model to produce an image. The 'image to text' feature works in the opposite direction: the MidJourney bot proposes prompts for each input image, and the creator evaluates the quality and relevance of these prompts.

Discord Channel

A 'Discord Channel' is a specific communication space within the Discord platform, where users can interact via text, voice, and video. In the video, the creator mentions using a private Discord Channel to showcase the MidJourney feature, indicating a controlled environment for testing and sharing the AI's outputs.

Photoshop

Photoshop is a widely used software application for image editing and digital art creation. The script mentions that some of the images tested with the AI were created using Photoshop, emphasizing the manual effort and skill involved in producing the original artworks before they were processed by the AI.

Illustrator

Illustrator, like Photoshop, is a software application, but it is more focused on creating vector graphics. The video script refers to images created 'from scratch' in Illustrator, highlighting the artistic process and the software's role in digital art creation.

Vector Image

A 'Vector Image' is a type of digital image that defines its shapes mathematically, as points, lines, and curves with fills and strokes, rather than as a grid of pixels. These images are scalable and can be resized without losing quality. The video discusses vector images among the various types tested with the AI, examining how well the AI interprets and generates prompts for this specific format.
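
The scalability point can be illustrated with a few lines of Python that write the same shape as SVG at two output sizes. This is a minimal sketch for illustration only and is not taken from the video; the function name and the circle are arbitrary choices.

```python
def circle_svg(size):
    """Return an SVG document of the given pixel size containing one circle."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 100 100">'
        f'<circle cx="50" cy="50" r="40" fill="crimson"/></svg>'
    )


small = circle_svg(100)   # thumbnail size
large = circle_svg(4000)  # poster size

# The shape description after "viewBox" is identical in both documents;
# only the requested output size differs, so nothing is lost by scaling up.
print(small.split("viewBox")[1] == large.split("viewBox")[1])
```

A raster image of the same circle would need forty times as many pixels per side to stay sharp at the larger size, while the vector description stays the same length.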

Cyberpunk Style

Cyberpunk is a genre of science fiction that features advanced technological and scientific achievements, juxtaposed with a degree of breakdown or radical change in the social order. In the context of the video, 'Cyberpunk Style' artwork refers to a visual aesthetic often characterized by neon lights, dark cityscapes, and futuristic elements. The script mentions the AI's ability to generate prompts for a cyberpunk style image.

Astrophotography

Astrophotography is the art of photographing the night sky, capturing celestial objects like stars, planets, and galaxies. The video script includes a discussion about an astrophotography image taken at a beach, emphasizing the challenges and conditions required for such photography, and the AI's performance in generating prompts for this type of image.

Long Exposure

Long Exposure is a photography technique where the camera's shutter is open for a longer period, allowing more light to reach the sensor and creating effects such as light trails or smooth water surfaces. The video mentions a long exposure image of an ocean pier, discussing the AI's ability to interpret and generate prompts for this type of photography.
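
The effect can be approximated in software by averaging many short frames, which is a rough stand-in for the sensor collecting light over time. The sketch below is an illustration only and is not from the video: it uses plain Python lists as tiny grayscale frames and a hypothetical helper named long_exposure.

```python
def long_exposure(frames):
    """Average a list of equally sized grayscale frames pixel by pixel."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


# A single bright pixel moving left to right across a dark 1x4 strip.
frames = [[[255 if x == t else 0 for x in range(4)]] for t in range(4)]

trail = long_exposure(frames)
print(trail)  # [[63.75, 63.75, 63.75, 63.75]]
```

The moving point of light is smeared into an even, dimmer trail across the strip, which is the same mechanism that turns waves into a smooth surface in a long exposure of the sea.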

Highlights

MidJourney has released a new feature called 'image to text'.

Using the /describe command, the MidJourney bot generates four prompts for each image.

The same image can be regenerated to provide different prompts each time.

The video showcases the feature on a private Discord Channel.

15 images, drawn from scratch in Photoshop and Illustrator, are used in the demonstration.

The creation time for images ranges from two hours to four days.

MidJourney's performance in generating prompts from images is tested with a variety of image types.

Some prompts are very good, while others are not to the creator's liking.

The AI's ability to generate prompts is tested on a vector image.

The AI impresses with its ability to create prompts for an abstract image.

The AI struggles with recognizing specific elements in images, like a shark's fin.

The AI's performance on a photograph taken with a Fuji X-T1 is discussed.

The AI's ability to pick up colors accurately is noted.

The AI's creative generation of prompts for abstract images is praised.

Astrophotography prompts generated by the AI are critiqued for accuracy.

The AI's generation of prompts for a long exposure ocean pier shot is evaluated.

The video concludes with a call to like, subscribe, and enable notifications for new content.