Convert text and documents into natural-sounding speech

Paste your text, upload a document (TXT, MD, PDF, DOC, DOCX, PPTX, or VTT), or extract text from an image with OCR. Choose an AI voice, set the emotion, speed, pitch, and volume, then generate and download a natural-sounding voiceover. Built for long-form scripts and documents.

30 AI voices

80+ languages

200K characters per run

Text to Speech

2 credits per minute.

What this tool is built for

Convert text and documents into natural AI voiceovers.

Highlights

Built for Text to Speech

Voice and emotion control

Choose an AI voice, then apply emotion presets — happy, sad, calm, neutral, and more — and adjust speed, pitch, and volume to match the tone.

Multiple input sources

Paste text, upload TXT, MD, PDF, DOC, DOCX, PPTX, or VTT documents, or drop in an image and extract its text with OCR.

Built for long documents

Process articles, scripts, PDFs, and slide decks in one run — ideal for narration, study material, and business voiceovers.

How it works

How Text to Speech works

Add your content

Paste text or upload a document, VTT file, or image.

Choose voice settings

Pick a voice and fine-tune emotion, speed, pitch, and volume.

Generate and download

Generate the voiceover, preview the result, then download the audio.

Capabilities

What it handles well right now

Convert pasted text or documents to speech with natural AI voices

Import from TXT, MD, PDF, DOC, DOCX, PPTX, and VTT, or extract text from images with OCR

Set emotion presets and fine-tune speed, pitch, and volume

Handle long-form content of up to 200,000 characters per run and download the result as MP3

Common jobs

What people use Text to Speech for

Video and podcast voiceover narration

E-learning and course audio content

Document and article narration

Accessibility audio versions of written content

FAQ

What people usually ask before they run it

What input types are supported?

Three input types are supported: direct text input, document uploads (TXT, MD, PDF, DOC, DOCX, PPTX, VTT), and image uploads with OCR. That lets you convert text from many sources into speech.
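
If you prepare files in bulk, a quick local check of the extension can save a failed upload. The sketch below is only an illustration: the document extensions are the ones listed in the answer above, while the image extensions and the `classify_input` helper are assumptions, since this page does not say which image formats the OCR step accepts.

```python
from pathlib import Path

# Document extensions named in the FAQ answer above.
DOCUMENT_EXTENSIONS = {".txt", ".md", ".pdf", ".doc", ".docx", ".pptx", ".vtt"}

# Assumption: the page does not list accepted image formats; these are placeholders.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def classify_input(filename: str) -> str:
    """Return 'document', 'image', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unsupported"


print(classify_input("Lecture Notes.DOCX"))  # document
print(classify_input("recording.wav"))       # unsupported
```

The check looks at the file name only; it does not confirm that the file contents match the extension.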

Can I control speed, pitch, and volume?

Yes. You can fine-tune speed, pitch, and volume, plus apply emotion presets such as happy, sad, calm, and neutral to match your use case.

Can I convert long documents to speech?

Yes. It handles long-form content such as articles, scripts, PDFs, DOCX files, and slide decks, up to 200,000 characters per run.
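
For scripts longer than the limit, one option is to split the text into several runs before uploading. The following is a minimal sketch, not the tool's own behavior: `split_into_runs` is a hypothetical helper, and it assumes the 200,000-character limit is counted the way Python's `len()` counts characters, which the page does not confirm.

```python
MAX_CHARS_PER_RUN = 200_000  # limit stated on this page


def split_into_runs(text: str, limit: int = MAX_CHARS_PER_RUN) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring paragraph breaks."""
    runs: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is itself longer than the limit.
        pieces = [paragraph[i:i + limit] for i in range(0, len(paragraph), limit)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= limit:
                current = candidate  # still fits in the current run
            else:
                runs.append(current)  # close the current run and start a new one
                current = piece
    if current:
        runs.append(current)
    return runs


script = "Intro paragraph.\n\n" + "x" * 250_000
print([len(run) for run in split_into_runs(script)])
```

Each returned chunk can then be generated as its own run and the audio files joined afterwards in an editor of your choice.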

Should I use text to speech or subtitle to speech?

Use text to speech for plain text, long documents, and general voiceovers. Use subtitle to speech when you need VTT subtitle timing and cue-by-cue control.

How much does it cost?

It costs 2 credits per minute of generated audio.
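
As a rough budgeting aid, the rate above can be turned into a small calculation. This is a sketch under stated assumptions: the page does not say whether partial minutes are billed proportionally or rounded up, so both options are shown, and the 150 words-per-minute speaking rate is a generic estimate rather than a figure from this tool.

```python
import math

CREDITS_PER_MINUTE = 2  # rate stated on this page


def credits_for_audio(duration_seconds: float, round_up_to_minute: bool = False) -> float:
    """Credits for a given audio length; the rounding rule is an assumption."""
    minutes = duration_seconds / 60
    if round_up_to_minute:
        minutes = math.ceil(minutes)
    return CREDITS_PER_MINUTE * minutes


def estimate_credits_from_words(word_count: int, words_per_minute: float = 150) -> float:
    """Estimate credits from script length, assuming a speaking rate (hypothetical default)."""
    duration_seconds = word_count / words_per_minute * 60
    return credits_for_audio(duration_seconds)


print(credits_for_audio(90))                           # 3.0
print(credits_for_audio(90, round_up_to_minute=True))  # 4
print(estimate_credits_from_words(1500))               # 20.0
```

Actual audio length depends on the voice and the speed setting, so treat the word-count estimate as a ballpark only.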