Whisper Transcription HTTP Server

A minimal HTTP API for speech-to-text transcription built with:

  • Flask
  • whisper.cpp
  • ffmpeg

The service accepts audio uploads and returns a single transcription result using the Whisper model.

It is designed to be:

  • simple
  • reliable
  • easy to containerize
  • suitable for internal automation pipelines

Typical use cases include:

  • voice message transcription
  • voice assistant pipelines
  • transcription preprocessing
  • automation workflows (Node-RED, etc.)

Architecture

The server works in four stages:

  1. Upload audio

A client sends a multipart/form-data POST request containing an audio file.

  2. Convert audio

The server converts the audio to 16 kHz mono WAV using ffmpeg. This ensures compatible, stable input for Whisper.

  3. Transcribe

The server calls the whisper.cpp CLI (whisper-cli) with the configured model.

  4. Return text

The transcription result is returned as JSON.

If the transcription result is empty, the server returns diagnostic information to aid debugging.
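The convert and transcribe stages boil down to two shell commands. A minimal sketch of how the command lines might be assembled (the function names are illustrative, and the whisper-cli flags shown, such as `-nt` to suppress timestamps, are assumptions based on common whisper.cpp builds — check `whisper-cli --help` for your build):

```python
def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # Convert any input audio to 16 kHz mono WAV, overwriting dst if present.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def build_whisper_cmd(wav: str, model: str, binary: str) -> list[str]:
    # Run whisper-cli on the prepared WAV; -nt drops timestamps from the output.
    return [binary, "-m", model, "-f", wav, "-nt"]
```

The server would run each command with subprocess.run, capturing stdout from whisper-cli as the transcript.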


API

Health Check

GET /health

Returns server status and verifies:

  • whisper binary exists
  • model file exists
  • ffmpeg is available

Example response:

{
  "ok": true,
  "problems": []
}
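The three checks above can be implemented with a few lines of stdlib Python. A sketch of the likely logic (function names are illustrative, not the actual server.py code):

```python
import os
import shutil

def collect_problems(whisper_bin: str, model_path: str) -> list[str]:
    # Return human-readable problems; an empty list means the server is healthy.
    problems = []
    if not os.path.isfile(whisper_bin):
        problems.append(f"whisper binary not found: {whisper_bin}")
    if not os.path.isfile(model_path):
        problems.append(f"model file not found: {model_path}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

def health(whisper_bin: str, model_path: str) -> dict:
    # Shape the /health JSON body: ok is true only when no problems were found.
    problems = collect_problems(whisper_bin, model_path)
    return {"ok": not problems, "problems": problems}
```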

Transcribe Audio

POST /transcribe

Request

multipart/form-data

Field name:

file

Example:

file=@audio.wav

Supported formats (handled by ffmpeg):

  • wav
  • mp3
  • ogg
  • m4a
  • most other common audio formats

Response

Success:

{
  "text": "Hello this is a transcription."
}

If the transcription is empty, the server returns diagnostics:

{
  "text": "",
  "note": "empty transcript; returning diagnostics",
  "stdout": "...",
  "stderr": "...",
  "cmd": [...]
}
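The success/diagnostics split could be shaped as a single helper (an illustrative sketch, not the verbatim server.py logic):

```python
def shape_response(text: str, stdout: str, stderr: str, cmd: list[str]) -> dict:
    # On a non-empty transcript return just the text;
    # on an empty one attach the raw process output for debugging.
    if text.strip():
        return {"text": text.strip()}
    return {
        "text": "",
        "note": "empty transcript; returning diagnostics",
        "stdout": stdout,
        "stderr": stderr,
        "cmd": cmd,
    }
```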

Project Structure

.
├── server.py
├── requirements.txt
├── Dockerfile
└── README.md

At runtime the container will also contain:

/app/whisper.cpp
/app/whisper.cpp/build/bin/whisper-cli
/app/whisper.cpp/models/ggml-small.bin

Dependencies

Runtime components:

  • Python 3.11
  • Flask
  • Gunicorn
  • ffmpeg
  • whisper.cpp
  • Whisper model (ggml-small.bin)

The Docker image builds whisper.cpp automatically.


Configuration

The server reads these environment variables:

WHISPER_BIN
MODEL_PATH

Defaults inside the container:

WHISPER_BIN=/app/whisper.cpp/build/bin/whisper-cli
MODEL_PATH=/app/whisper.cpp/models/ggml-small.bin
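Reading each variable with its fallback is one line per setting. A sketch of the likely pattern (the `env_path` helper is hypothetical):

```python
import os

def env_path(name: str, default: str) -> str:
    # Read a path-valued setting from the environment, falling back to default.
    return os.environ.get(name, default)

WHISPER_BIN = env_path("WHISPER_BIN", "/app/whisper.cpp/build/bin/whisper-cli")
MODEL_PATH = env_path("MODEL_PATH", "/app/whisper.cpp/models/ggml-small.bin")
```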

Build Docker Image

From the project directory:

docker build -t whisper-api .

Run Container

docker run -d -p 5005:5005 --name whisper-server whisper-api

The API will be available at:

http://localhost:5005

Test the Server

Health Check

curl http://localhost:5005/health

Transcribe Audio

curl -X POST \
  -F "file=@test.wav" \
  http://localhost:5005/transcribe

Example response:

{"text":"hello this is a test"}

Development (Without Docker)

Create a Python virtual environment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run server:

python server.py

Model Choice

The default Docker build downloads:

ggml-small.bin

You can switch models by modifying the Dockerfile:

Model    Speed      Accuracy
tiny     very fast  low
base     fast       moderate
small    balanced   good
medium   slow       high
large    very slow  best

Performance Notes

  • whisper.cpp runs fully on CPU
  • transcription speed depends on:

    • CPU cores
    • CPU vector extensions (AVX/AVX2)
    • model size

Typical small-model performance on modern CPUs:

~0.5x – 2x realtime

Security Notes

This server:

  • accepts arbitrary audio uploads
  • runs ffmpeg on them

For production deployments consider:

  • reverse proxy (nginx / traefik)
  • request size limits
  • authentication
  • rate limiting

License

This project uses:

  • whisper.cpp — MIT License
  • Flask — BSD License

See their respective repositories for details.