How to Use Generative Audio | Runway Academy

Runway
8 May 2024 · 03:07

TLDR: This Runway Academy tutorial explores generative audio: converting text to speech, training custom voice models, and creating lip-sync videos. Users can type a script, preview voices, and generate spoken audio, or train a custom voice model from clean audio. The tutorial also covers adding lip sync to images or videos, with tips such as avoiding camera motion so the result looks natural. It closes by encouraging viewers to join the community for further resources and support.

Takeaways

  • 🎙️ Generative audio includes text-to-speech, custom voice models, and creating lip-sync videos.
  • 🔧 Access the generative audio tool from the Runway dashboard to create spoken audio files from text.
  • 🗣️ Preview and select a voice from the default list or train a custom voice model using clean audio recordings.
  • ⏱️ Audio generation time varies based on script length but is generally quick.
  • 💾 Generated audio is automatically saved in the 'generative audio' folder within the main assets.
  • 🎧 Train a custom voice model with a few minutes of clean audio and use it for text-to-speech.
  • 👤 For lip-sync videos, upload an image or video with a clear view of the person's face.
  • 🎥 Use lip-sync with generated, recorded, or uploaded audio to synchronize with the visuals.
  • 🔁 If the audio is longer than the video, the video plays in reverse back to its beginning to fill the remaining audio duration.
  • 📹 Pro tip: When generating the video, add subject motion with a motion brush rather than camera motion to make the reversing effect less noticeable.
  • 💡 For more resources and community support, join Runway's Discord or use the dashboard help button.

Q & A

  • What is generative audio?

    -Generative audio refers to the process of creating audio content using artificial intelligence, which includes text-to-speech, custom voice models, and creating lip-sync videos.

  • How do you access the generative audio tool in Runway?

    -You can access the generative audio tool by logging into your Runway dashboard and clicking on the generative audio tool at the top.

  • What can you do with the generative audio tool after typing in text?

    -After typing in text, you can preview the available voices, choose one from the default voice list, and click the generate button to turn the text into a spoken audio file.

  • How long does it typically take to generate audio using the tool?

    -The generation time depends on the length of the script, but it usually goes pretty quickly.

  • Where are the generated audio files saved by default in Runway?

    -The generated audio files are automatically saved to the generative audio folder inside of your main assets folder in Runway.

  • What is required to train a custom voice model in Runway?

    -To train a custom voice model, you need a few minutes of clean audio, which can be imported or recorded directly within the generative audio tool.

  • How do you ensure the audio is clean for training a custom voice model?

    -The audio should be as clear as possible, with minimal background noise and consistent volume levels.

  • What is the process for creating a lip-sync video in Runway?

    -To create a lip-sync video, you need an image or video of a person whose full face is clearly visible. You upload this media and then synchronize it with generated, recorded, or uploaded audio.

  • Can you use lip-sync with different types of audio in Runway?

    -Yes, lip-sync can be used with generated audio from text-to-speech, recorded audio, or uploaded audio.

  • What happens if the audio is longer than the video when creating a lip-sync video?

    -If the audio is longer than the video, the video reverses once it reaches its end and plays back toward the beginning, so that it covers the full duration of the audio.

  • What is a pro tip for using the video workflow in Runway's generative audio tool?

    -A pro tip is to avoid using camera motion parameters and instead add subject motion with a motion brush to make the reversing effect less noticeable.

  • Where can users find more resources and community support for using Runway?

    -Users can join the Runway community on Discord for more resources and experimentation, or use the help button on the dashboard to find specific answers to their questions.

Outlines

00:00

🎙️ Introduction to Generative Audio

The video introduces Runway's generative audio tool, which covers text-to-speech, custom voice models, and lip-sync video creation. The tutorial begins by opening the tool from the Runway dashboard and demonstrates how to convert typed text into spoken audio. It guides users through previewing and selecting a voice, with James as a default option, and generating the audio. It also explains where the audio files are saved and briefly covers training a custom voice model from clean audio recordings: importing or recording the audio, making sure it is clear, and naming the model so it can be used with text-to-speech.

Keywords

Generative Audio

Generative audio refers to the process of creating new audio content using artificial intelligence. In the context of the video, it includes text-to-speech conversion, custom voice models, and lip-sync videos. It's a technology that allows users to generate human-like speech from text or to create audio that matches specific characteristics of a voice, enhancing the realism of digital content.

Text to Speech

Text to speech (TTS) is a technology that converts written text into spoken words. It's a key component of generative audio, as it allows users to type in any text and have it converted into a spoken audio file. In the video, this is demonstrated by typing a script into the Runway dashboard and selecting a voice to generate the audio.

Custom Voice Models

Custom voice models are unique audio profiles created by training an AI with a specific set of voice recordings. These models can be used to generate speech that mimics the sound of the recorded voice. The video script mentions training a custom voice model using clean audio, which can then be used with text-to-speech to create personalized audio content.
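The tutorial asks for clean audio with consistent volume levels but does not define how to check that. As a minimal sketch, assuming the recording has already been decoded into a list of float samples in the range -1 to 1, the following measures how much the loudness varies between fixed-size windows. The function names, the window size, and the silence floor are all hypothetical choices for illustration and are not Runway's criteria.

```python
import math


def window_rms(samples, window_size):
    """Return the RMS level of each full, non-overlapping window of samples."""
    levels = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        levels.append(math.sqrt(sum(s * s for s in window) / window_size))
    return levels


def volume_spread_db(samples, window_size, silence_floor=1e-4):
    """Difference in dB between the loudest and quietest non-silent window.

    silence_floor is an assumed cutoff: windows quieter than this are
    treated as pauses and ignored.
    """
    levels = [lv for lv in window_rms(samples, window_size) if lv > silence_floor]
    if len(levels) < 2:
        return 0.0
    return 20 * math.log10(max(levels) / min(levels))


# Usage: a one-second 440 Hz tone at 16 kHz whose second half is half as loud.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
uneven = tone[: rate // 2] + [0.5 * s for s in tone[rate // 2:]]
print(round(volume_spread_db(tone, 1600), 2))
print(round(volume_spread_db(uneven, 1600), 2))
```

A small spread suggests an even recording level, while a large one suggests the speaker moved relative to the microphone or the gain changed. Any threshold for "too uneven" would need to be chosen by ear.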

Lip Sync Videos

Lip sync videos are a form of media where the audio is synchronized with the movements of the lips in a video or image. The video script explains how to create such videos using Runway's generative audio tool, which involves adding audio to an image or video of a person, ensuring that the lip movements match the spoken words.

Runway Dashboard

The Runway dashboard is the user interface for the Runway platform, where users can access and manage various tools and features. In the video, it's mentioned as the starting point for using the generative audio tool, where users can type in text, select voices, and generate audio files.

Voice List

The voice list is the set of default voices that users can choose from when generating audio. These voices are part of the text-to-speech functionality and can be previewed before one is selected for the audio content.

Audio Generation

Audio generation in the video refers to the process of creating audio files from text or other audio sources using the generative audio tool. It includes the steps of writing text, selecting a voice, and using the tool to generate the corresponding audio file, which can then be used in various media projects.

Assets Folder

The assets folder in Runway is where all generated and uploaded media files are stored. The video mentions that audio generations are automatically saved to the generative audio folder within the main assets folder, allowing users to easily manage and access their created audio files.

Lip Sync

Lip sync, as discussed in the video, is the process of matching the movements of a person's lips in a video or image to the corresponding audio. It's an important aspect of creating realistic and engaging video content, and the video provides a step-by-step guide on how to achieve this using Runway's tools.

Gen-2

Gen-2 is Runway's video generation model. In the video, it is used to turn a static image into a video clip, which is useful for lip-sync videos because the resulting clip can then have generative audio added to it.

Motion Brush

The motion brush is a tool within Runway's video workflow that allows users to add subject motion to their videos. The video suggests using the motion brush instead of camera motion parameters when creating lip-sync videos to avoid a noticeable reversing effect when the audio is longer than the video.
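The reversing behavior described above can be sketched as a "ping-pong" mapping from audio time to video position. This is only an illustration of the idea, not Runway's implementation: the function name is hypothetical, and the assumption that playback goes forward again after returning to the start is not stated in the video.

```python
def pingpong_position(t, video_duration):
    """Map a time t on the audio track to a position in a shorter video.

    The video plays forward to its end, then in reverse back to its start.
    Assumption: if the audio is still running, it then plays forward again.
    """
    if video_duration <= 0:
        raise ValueError("video_duration must be positive")
    period = 2 * video_duration          # one forward pass plus one reverse pass
    phase = t % period
    return phase if phase <= video_duration else period - phase


# Usage: a 4-second clip driven by 10 seconds of audio.
positions = [pingpong_position(t, 4) for t in range(10)]
print(positions)  # [0, 1, 2, 3, 4, 3, 2, 1, 0, 1]
```

Because the second pass shows the same frames in reverse order, any camera move would visibly run backwards, whereas small subject motion tends to look similar in both directions.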

Highlights

Runway Academy's introduction to the generative audio tools in Runway.

Generative audio includes text to speech, custom voice models, and creating lip sync videos.

Access the generative audio tool from the Runway dashboard.

Type in text to convert it into a spoken audio file.

Preview and choose a voice from the default voice list.

Generation times vary based on script length but are usually quick.

Audio generations are automatically saved in the generative audio folder.

Option to save audio in a different location via a drop-down menu.

Train a custom voice model with a few minutes of clean audio.

Record audio directly in Runway for custom voice models.

Ensure the audio is clean for optimal custom voice model training.

Use the trained custom voice model with text to speech.

Create a lip sync video using an image or video of a person.

Upload your own media or use preset characters for lip sync.

Lip sync can be applied to generated, recorded, or uploaded audio.

Use Gen-2 to turn an image into a video for lip sync.

If the audio is longer than the video, the video reverses back to its beginning to cover the remaining audio.

Tip: Avoid camera motion parameters so that the video reversing is less noticeable.

Join the Runway community on Discord for more resources and experimentation.

Find specific answers to questions using the help button on the dashboard.