Seed Audio 1.0 is a text-to-audio and audio-to-audio model for turning a written script into rich, finished audio with speech, sound effects, and music.

Even a simple script produces natural, expressive speech. What sets this model apart is its flexibility, precise control, and emotional delivery.

This tutorial walks through generating audio with Seed Audio 1.0 and explores its main use cases.

Step 1 – Opening the Audio tool

Seed Audio 1.0 lives inside Custom mode. You'll start there before adding any inputs:

From the left sidebar, click Custom.
At the top of the input panel, select the Audio tab.
Confirm that Seed Audio 1.0 is selected in the model dropdown next to the Generate button.

Step 2 – Adding your prompt and inputs

You can generate from a text prompt alone, or pair text with reference media to guide the voice, tone, or style.

Text only

Type your prompt directly into the field labeled Describe your audio. This is the simplest path and works well when you don't need to match a specific voice or sound.

Text + audio references

Use audio references to influence vocal qualities like timbre, accent, or delivery.

Click one of the three Reference slots at the top of the panel, or click add audio references in the prompt field.
Upload up to 3 audio files.
Write your text prompt below.

The duration adjusts automatically to the length of the script. Alternatively, you can state a target length in your prompt to guide the total duration.

Prompt examples

The table below pairs common use cases with example prompts.

Use case	Prompt example
Create audio with multiple speakers in a single clip	Person 1 has a deep orc-like speaking style. Person 2 is an elegant elven princess. Person 1: My queen, I am proud to fight alongside you. Person 2: Thank you, but I'm a princess. Person 1: Oh, my sincere apologies.
Quickly clone a character voice with an audio snippet	The voice from @Audio 1 speaks the following script: Listen, I always have a plan. It's whether the plan goes according to plan that's the problem.
Generate immersive audio scenes with speech, background effects, and music	A 30 second full cinematic scene of a hero knight giving an empowered speech on the battlefield to his fellow soldiers.
Explore script ideas with general topic directions	An American podcast host reviewing a book she recently read and enjoyed.
Precisely control delivery with performance tags	Person 1 has a deep orc-like speaking style. Person 2 is an elegant elven princess. Person 1: [excited] My queen, [heroic, speaks slowly] I am proud to fight alongside you. Person 2: Thank you, [giggles] [sarcastic] BUT I'm a princess. Person 1: Oh... [heavy exhale] my sincere apologies.
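
If you assemble multi-speaker scripts outside the tool before pasting them into the prompt field, a small helper can keep speaker descriptions and tagged lines consistent. The sketch below is only an illustration: build_prompt and its argument shapes are hypothetical and are not part of any Seed Audio 1.0 interface. It reproduces the performance-tag example from the table.

```python
def build_prompt(speakers, lines):
    """Compose a multi-speaker prompt string (hypothetical helper).

    speakers: dict mapping a speaker label to a short style description.
    lines: list of (speaker label, text) tuples; text may contain
           performance tags such as [excited] or [giggles].
    """
    unknown = {label for label, _ in lines if label not in speakers}
    if unknown:
        raise ValueError(f"Lines reference undefined speakers: {sorted(unknown)}")

    # Speaker descriptions come first, as in the table's examples.
    intro = " ".join(f"{label} {style}." for label, style in speakers.items())
    # Then each line of dialogue, prefixed with its speaker label.
    script = " ".join(f"{label}: {text}" for label, text in lines)
    return f"{intro} {script}"


prompt = build_prompt(
    {
        "Person 1": "has a deep orc-like speaking style",
        "Person 2": "is an elegant elven princess",
    },
    [
        ("Person 1", "[excited] My queen, [heroic, speaks slowly] I am proud to fight alongside you."),
        ("Person 2", "Thank you, [giggles] [sarcastic] BUT I'm a princess."),
        ("Person 1", "Oh... [heavy exhale] my sincere apologies."),
    ],
)
print(prompt)
```

Keeping the tags inside each line's text, rather than as separate fields, means the helper never has to guess where a tag belongs in the sentence.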

Step 3 – Configuring output settings

Open the settings panel using the slider icon at the bottom right of the input area to fine-tune format and delivery:

Format

Choose the file format for your output:

MP3 — compressed, smallest file size
WAV — uncompressed, highest fidelity
OGG — open-format compression

Sample rate

Defaults to 44.1 kHz. You can select from 8 kHz up to 48 kHz. Sample rate does not affect credit cost, but higher rates may slightly increase generation time.

Speech rate, Loudness, and Pitch

Three sliders let you adjust the output:

Speech rate — how fast the audio is delivered
Loudness — overall volume
Pitch — higher or lower tonal range

Each defaults to 0. Drag the slider or type a value to adjust.

Step 4 – Generating and understanding cost

Once your prompt, references, and settings are ready, click Generate.

Audio generations use a two-part billing model:

5 credits are charged immediately to start the generation.
The remaining cost is charged once the generation completes, based on the final duration of the audio.

For example, a 2-minute generation costs 30 credits total:

5 credits on initial request
25 credits when the generation finishes
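
The split can be expressed as simple arithmetic. The sketch below takes the total cost as an input, because this tutorial gives only one duration-to-cost example and does not state a per-second rate. The names UPFRONT_CREDITS and split_charge are hypothetical and chosen for illustration.

```python
UPFRONT_CREDITS = 5  # charged immediately when the generation starts


def split_charge(total_credits):
    """Return (upfront, on_completion) for a generation's total cost."""
    if total_credits < UPFRONT_CREDITS:
        raise ValueError("Total cannot be below the 5-credit minimum")
    # Whatever exceeds the upfront charge is billed when the audio is done.
    return UPFRONT_CREDITS, total_credits - UPFRONT_CREDITS


upfront, on_completion = split_charge(30)  # the 2-minute example above
print(upfront, on_completion)  # prints: 5 25
```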

Why is the cost split this way?

Audio length depends on the prompt and isn't known until generation completes. The minimum charge reserves the request, and the final amount reflects the actual output duration.

Next Steps

You now have a generated audio file ready to download or pull into your next project. From here, you might want to:

Use the audio as a reference in Seedance 2.0 when animating an image.

Specs

Supported inputs	Text + up to 3 audio references
Audio requirements	Under 30 seconds total
Output formats	MP3, WAV, OGG
Sample rates	8 kHz, 16 kHz, 24 kHz, 32 kHz, 44.1 kHz (default), 48 kHz
Maximum duration	2 minutes (depends on the prompt)
Credit cost	Minimum 5 credits to start; final cost charged when the generation completes
Availability	Custom (tool mode) only
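
If you prepare inputs in bulk, it can help to check them against these limits before opening the tool. The following is a minimal sketch, not an official validator: check_request and the constant names are hypothetical, and it assumes that "total" in the audio requirement means the combined length of all reference clips.

```python
ALLOWED_FORMATS = {"MP3", "WAV", "OGG"}
ALLOWED_SAMPLE_RATES_HZ = {8000, 16000, 24000, 32000, 44100, 48000}
MAX_REFERENCES = 3
MAX_REFERENCE_SECONDS = 30  # combined reference length must stay under this (assumed reading)


def check_request(fmt, sample_rate_hz=44100, reference_seconds=()):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"Unsupported format: {fmt}")
    if sample_rate_hz not in ALLOWED_SAMPLE_RATES_HZ:
        problems.append(f"Unsupported sample rate: {sample_rate_hz} Hz")
    if len(reference_seconds) > MAX_REFERENCES:
        problems.append(f"Too many audio references: {len(reference_seconds)}")
    if sum(reference_seconds) >= MAX_REFERENCE_SECONDS:
        problems.append("Audio references must total under 30 seconds")
    return problems


print(check_request("WAV", 44100, [10, 12]))         # no problems expected
print(check_request("FLAC", 22050, [10, 10, 10, 5]))  # four problems expected
```

Returning a list of problems instead of raising on the first one lets you fix every issue in a single pass.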

FAQ

Why did my Seed Audio 1.0 generation fail due to moderation?

Seed Audio 1.0 moderation is handled by the model provider. A reference audio clip that sounds like a known voice may trigger it, as may sensitive subject matter in the script prompt.
