# coqui-docker

A Dockerized Coqui TTS server with multilingual XTTS, multi-speaker support, voice caching, and automatic audio conversion.
## Features

- Multilingual XTTS with multi-speaker support
- Voice caching with persistent embeddings
- Automatic `.mp3` → `.wav` conversion
- Adjustable speed, pitch, and language
- `/tts` and `/api/tts` endpoints

## Project Structure

```
coqui-docker/
├── Dockerfile
├── build.sh
├── run.sh
├── README.md
├── tts_server.py            # Production server with caching and fallback
├── tts_server_simple.py     # Simple version
├── tts_server_noncaching.py # Legacy non-caching version
├── models/                  # XTTS models (host mount recommended)
├── voices/                  # User voices (.wav or .mp3)
└── cache/                   # Persistent embeddings
```
## Quick Start

```bash
./build.sh
./run.sh
```

The scripts handle mounting `/models`, `/voices`, and `/cache`.

## API

### `/tts` or `/api/tts`

Synthesize speech and return the full audio file.
Method: GET
| Parameter | Default | Description |
|---|---|---|
| `text` | required | Text to synthesize |
| `voice` | `default` | Voice name in `/voices` |
| `lang` | `en` | Language code |
Returns: `audio/wav`

```bash
curl "http://localhost:5002/tts?text=Hello%20world&voice=trump" --output hello.wav
```
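For programmatic use, the same request can be built with Python's standard library. This is a minimal client sketch; `build_tts_url` is a helper defined here for illustration, not part of the server, and the base URL assumes the default port from the example above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5002"  # assumed default port from the curl example

def build_tts_url(text, voice="default", lang="en", base=BASE_URL):
    """Build a /tts request URL with properly percent-encoded parameters."""
    return f"{base}/tts?" + urlencode({"text": text, "voice": voice, "lang": lang})

if __name__ == "__main__":
    # Fetch the full WAV file and save it (requires a running server).
    url = build_tts_url("Hello world", voice="trump")
    with urlopen(url) as resp, open("hello.wav", "wb") as out:
        out.write(resp.read())
```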
### `/tts_stream` or `/api/tts_stream`

Streams generated audio while synthesis is happening. This is useful for starting playback before the full synthesis completes, especially with long inputs.
Method: GET
Parameters: same as `/tts`.
```bash
curl "http://localhost:5002/tts_stream?text=Hello%20world&voice=trump" --output hello.wav
```
Streaming works best with audio players capable of handling progressive WAV streams.
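Consuming the stream from Python amounts to copying the response body chunk by chunk instead of buffering it whole. A sketch, assuming the server URL from the curl example above; `stream_to_file` is an illustrative helper, not part of the server:

```python
import io
from urllib.request import urlopen

def stream_to_file(src, dst, chunk_size=8192):
    """Copy a progressive audio stream chunk by chunk; returns bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

if __name__ == "__main__":
    # Requires a running server; URL matches the curl example above.
    url = "http://localhost:5002/tts_stream?text=Hello%20world&voice=trump"
    with urlopen(url) as resp, open("hello.wav", "wb") as out:
        stream_to_file(resp, out)
```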
### `/voices`

Returns available voices.

Method: GET

```json
{
  "voices": ["trump", "narrator", "alice"]
}
```
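Fetching and unpacking this response from Python is a one-liner around `json.loads`. A sketch, assuming the default port used in the other examples; `parse_voices` is an illustrative helper:

```python
import json
from urllib.request import urlopen

def parse_voices(payload):
    """Extract the voice list from a /voices JSON response body."""
    return json.loads(payload)["voices"]

if __name__ == "__main__":
    # Requires a running server.
    with urlopen("http://localhost:5002/voices") as resp:
        print(parse_voices(resp.read()))
```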
## Voice Handling

- `.wav` is the canonical internal format
- `.mp3` files are converted automatically when needed
- If the `.mp3` is newer than the `.wav`, reconversion is triggered
- Speaker embeddings are cached in `/cache` for faster synthesis

## Chunking

Long inputs are automatically split into smaller chunks before synthesis. This keeps peak memory usage bounded and makes synthesis stable for long texts. Chunked outputs are automatically concatenated into a single audio stream.
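The splitting step can be sketched as a sentence-based packer. This is illustrative only; the actual splitting logic in `tts_server.py` may differ, and the 250-character limit here is an assumption:

```python
import re

def split_into_chunks(text, max_chars=250):
    """Split text on sentence boundaries, packing sentences into chunks no
    longer than max_chars (an oversized sentence becomes its own chunk)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized independently and the resulting audio segments are concatenated in order.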
## GPU Requirements

The server is designed to work even on small GPUs (~4 GB VRAM). If the GPU runs out of memory, the server falls back gracefully, allowing stable operation even with long text inputs.
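One common pattern for this kind of recovery is to retry with smaller chunks when memory is exhausted. A sketch of that pattern, not the actual fallback in `tts_server.py` (in a real GPU setup the caught exception would be the framework's OOM error, e.g. `torch.cuda.OutOfMemoryError`; `MemoryError` stands in here so the sketch stays framework-free):

```python
def synthesize_with_fallback(synth_fn, text, max_chars=250, min_chars=50):
    """Try synthesis at the current chunk size; on an out-of-memory error,
    halve the chunk size and retry until a floor is reached."""
    size = max_chars
    while size >= min_chars:
        try:
            return synth_fn(text, max_chars=size)
        except MemoryError:
            size //= 2  # smaller chunks -> lower peak VRAM usage
    raise RuntimeError("Synthesis failed even at the smallest chunk size")
```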
## Notes

- `/models`, `/voices`, and `/cache` should be mounted as Docker volumes
- The `/tts` endpoint is backward-compatible with `/api/tts`
- `DEFAULT_VOICE = "default"` in `tts_server.py` is used when the `voice` parameter is missing
- Licensing inquiries: licensing@coqui.ai