Seed Audio 1.0 is a text-to-audio and audio-to-audio model for turning a written script into rich, finished audio with speech, sound effects, and music.
A simple script is enough for natural, expressive speech, but the model stands out for its flexibility, precise control, and emotional delivery.
This tutorial walks through generating audio with Seed Audio 1.0 and explores its main use cases.
Step 1 – Opening the Audio tool
Seed Audio 1.0 lives inside Custom mode. You'll start there before adding any inputs:
- From the left sidebar, click Custom.
- At the top of the input panel, select the Audio tab.
- Confirm that Seed Audio 1.0 is selected in the model dropdown next to the Generate button.
Step 2 – Adding your prompt and inputs
You can generate from a text prompt alone, or pair text with reference media to guide the voice, tone, or style.
Text only
Type your prompt directly into the field labeled Describe your audio. This is the simplest path and works well when you don't need to match a specific voice or sound.
Text + audio references
Use audio references to influence vocal qualities like timbre, accent, or delivery.
- Click one of the three Reference slots at the top of the panel, or click add audio references in the prompt field.
- Upload up to 3 audio files (under 30 seconds of audio in total).
- Write your text prompt below.
The duration adjusts automatically to the length of your script. Alternatively, state a target length in your prompt (for example, "a 30 second scene") to guide the total duration, up to the 2-minute maximum.
Prompt examples
The table below pairs common use cases with example prompts.
| Use case | Prompt example |
|---|---|
| Create audio with multiple speakers in a single clip | Person 1 has a deep orc-like speaking style. Person 2 is an elegant elven princess. Person 1: My queen, I am proud to fight alongside you. |
| Quickly clone a character voice with an audio snippet | The voice from @Audio 1 speaks the following script: Listen, I always have a plan. It's whether the plan goes according to plan that's the problem. |
| Generate immersive audio scenes with speech, background effects, and music | A 30 second full cinematic scene of a hero knight giving an empowered speech on the battlefield to his fellow soldiers. |
| Explore script ideas with general topic directions | An American podcast host reviewing a book she recently read and enjoyed. |
| Precisely control delivery with performance tags | Person 1 has a deep orc-like speaking style. Person 2 is an elegant elven princess. Person 1: [excited] My queen, [heroic, speaks slowly] I am proud to fight alongside you. |
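Because a multi-speaker script is plain text, longer scripts can be assembled programmatically before you paste them into Describe your audio. This minimal Python sketch assumes only the pattern used in the prompt examples: speaker descriptions first, then `Person N:` lines, with performance tags in square brackets. The helpers `tag` and `build_script_prompt` are hypothetical and are not part of Seed Audio 1.0; the model only ever receives the resulting text.

```python
from typing import List, Tuple


def tag(text: str, *tags: str) -> str:
    """Prefix a span of dialogue with a bracketed performance tag, e.g. [heroic, speaks slowly]."""
    return f"[{', '.join(tags)}] {text}" if tags else text


def build_script_prompt(descriptions: List[str], lines: List[Tuple[int, str]]) -> str:
    """Join speaker descriptions and numbered dialogue lines into one prompt string (hypothetical helper)."""
    parts = list(descriptions)
    for speaker, text in lines:
        parts.append(f"Person {speaker}: {text}")
    return " ".join(parts)


prompt = build_script_prompt(
    ["Person 1 has a deep orc-like speaking style.", "Person 2 is an elegant elven princess."],
    [(1, tag("My queen,", "excited") + " " + tag("I am proud to fight alongside you.", "heroic", "speaks slowly"))],
)
print(prompt)
```

This reproduces the performance-tag example from the table; adding more `(speaker, text)` tuples appends further dialogue lines.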
Step 3 – Configuring output settings
Open the settings panel using the slider icon at the bottom right of the input area to fine-tune format and delivery:
Format
Choose the file format for your output:
- MP3 — compressed, smallest file size
- WAV — uncompressed, highest fidelity
- OGG — open-format compression
Sample rate
Defaults to 44.1 kHz. You can choose 8, 16, 24, 32, 44.1, or 48 kHz. Sample rate does not affect credit cost, but higher rates may slightly increase generation time.
Speech rate, Loudness, and Pitch
Three sliders let you adjust the delivery:
- Speech rate — how fast the audio is delivered
- Loudness — overall volume
- Pitch — higher or lower tonal range
Each defaults to 0. Drag the slider or type a value to adjust.
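The settings panel can be summarized as a small configuration object. In this sketch, `AudioSettings` and its field names are hypothetical; the allowed formats, sample rates, and the 0 slider defaults come from this guide. The slider ranges are not documented here, so they are left unchecked, and no default file format is assumed.

```python
from dataclasses import dataclass

# Options listed in this guide; the names below are hypothetical.
FORMATS = {"mp3", "wav", "ogg"}
SAMPLE_RATES_HZ = {8_000, 16_000, 24_000, 32_000, 44_100, 48_000}


@dataclass(frozen=True)
class AudioSettings:
    file_format: str              # MP3, WAV, or OGG (no default assumed)
    sample_rate_hz: int = 44_100  # panel default: 44.1 kHz
    speech_rate: float = 0.0      # sliders default to 0; their ranges are not documented here
    loudness: float = 0.0
    pitch: float = 0.0

    def __post_init__(self) -> None:
        # Reject values the panel does not offer.
        if self.file_format.lower() not in FORMATS:
            raise ValueError(f"unsupported format: {self.file_format}")
        if self.sample_rate_hz not in SAMPLE_RATES_HZ:
            raise ValueError(f"unsupported sample rate: {self.sample_rate_hz} Hz")


print(AudioSettings("WAV"))
```

Sample rate is modeled as a fixed set rather than a range because the panel offers discrete values.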
Step 4 – Generating and understanding cost
Once your prompt, references, and settings are ready, click Generate.
Audio generations use a two-part billing model:
- 5 credits are charged immediately to start the generation.
- The remaining cost is charged once the generation completes, based on the final duration of the audio.
For example, a 2-minute generation costs 30 credits total:
- 5 credits on initial request
- 25 credits when the generation finishes
Why is the cost split this way?
Audio length depends on the prompt and isn't known until generation completes. The minimum charge reserves the request, and the final amount reflects the actual output duration.
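The two-part charge can be checked with simple arithmetic. This sketch assumes, based only on the single 2-minute example above, that the total scales linearly at about 15 credits per minute and rounds up to whole credits. The real rate and rounding may differ, so treat `credits_per_minute` as a placeholder; `estimate_total` and `split_charge` are hypothetical helpers.

```python
import math
from typing import Tuple

UPFRONT_CREDITS = 5  # charged immediately when you click Generate


def estimate_total(duration_s: float, credits_per_minute: float = 15.0) -> int:
    """Estimate total credits for an output of duration_s seconds.

    ASSUMPTION: linear pricing with round-up, inferred from "2 minutes = 30 credits".
    """
    total = math.ceil(duration_s / 60 * credits_per_minute)
    return max(total, UPFRONT_CREDITS)  # never below the 5-credit starting charge


def split_charge(total_credits: int) -> Tuple[int, int]:
    """Return (charged at start, charged on completion)."""
    if total_credits < UPFRONT_CREDITS:
        raise ValueError("total cannot be below the 5-credit starting charge")
    return UPFRONT_CREDITS, total_credits - UPFRONT_CREDITS


total = estimate_total(120)        # 2-minute generation
print(total, split_charge(total))  # 30 (5, 25)
```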
Next steps
You now have a generated audio file ready to download or pull into your next project. From here, you might want to:
- Use the audio as a reference in Seedance 2.0 when animating an image.
Specs
| Spec | Details |
|---|---|
| Supported inputs | Text + up to 3 audio references |
| Audio requirements | Under 30 seconds of reference audio in total |
| Output formats | MP3, WAV, OGG |
| Sample rates | 8 kHz, 16 kHz, 24 kHz, 32 kHz, 44.1 kHz (default), 48 kHz |
| Maximum duration | 2 minutes (actual length depends on the prompt) |
| Credit cost | 5 credits charged to start; the remaining cost is charged when the generation completes, based on the final duration |
| Availability | Custom (tool mode) only |
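Before uploading, you can check a planned request against the limits in the specs table. The sketch below is hypothetical (`check_request` is not a Seed Audio 1.0 API) and encodes only the documented limits: at most 3 references, under 30 seconds of reference audio in total, and a 2-minute maximum output. It assumes the 30-second limit applies to the combined length of the reference files, and that a non-empty text prompt is required, since every workflow in this guide includes one.

```python
from typing import List, Optional

MAX_REFERENCES = 3
MAX_REFERENCE_TOTAL_S = 30.0  # references must total less than 30 s
MAX_OUTPUT_S = 120.0          # 2-minute maximum output


def check_request(prompt: str, reference_durations_s: List[float],
                  target_duration_s: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems = []
    if not prompt.strip():
        problems.append("a text prompt is required")
    if len(reference_durations_s) > MAX_REFERENCES:
        problems.append(f"at most {MAX_REFERENCES} audio references are supported")
    if sum(reference_durations_s) >= MAX_REFERENCE_TOTAL_S:
        problems.append("reference audio must total less than 30 seconds")
    if target_duration_s is not None and target_duration_s > MAX_OUTPUT_S:
        problems.append("output duration cannot exceed 2 minutes")
    return problems


print(check_request("The voice from @Audio 1 speaks the following script: Listen, I always have a plan.", [12.5]))  # []
```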
FAQ
Why did my Seed Audio 1.0 generation fail due to moderation?
Moderation for Seed Audio 1.0 is handled by the model provider. A generation may be blocked if a reference audio clip sounds like a known voice, or if the script prompt requests sensitive subject matter.