Whisper Transcription HTTP Server

A minimal HTTP API for speech-to-text transcription built with:

  • Flask
  • whisper.cpp
  • ffmpeg

The service accepts audio uploads and returns a single transcription result using the Whisper model.

It is designed to be:

  • simple
  • reliable
  • easy to containerize
  • suitable for internal automation pipelines

Typical use cases include:

  • voice message transcription
  • voice assistant pipelines
  • transcription preprocessing
  • automation workflows (Node-RED, etc.)

Architecture

The server works in four stages:

  1. Upload audio

A client sends a multipart/form-data POST request containing an audio file.

  2. Convert audio

The server converts the audio to 16 kHz mono WAV using ffmpeg. This ensures compatible, stable input for Whisper.

  3. Transcribe

The server calls the whisper.cpp CLI (whisper-cli) with the configured model.

  4. Return text

The transcription result is returned as JSON.

If the transcription result is empty, the server returns diagnostic information to aid debugging.
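The convert and transcribe stages boil down to two shell commands. A minimal sketch of how the command lines might be assembled (the function names are illustrative, and the whisper-cli flags shown, such as `-nt` to suppress timestamps, are assumptions based on common whisper.cpp builds — check `whisper-cli --help` for your build):

```python
def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # Convert any input audio to 16 kHz mono WAV, overwriting dst if present.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def build_whisper_cmd(wav: str, model: str, binary: str) -> list[str]:
    # Run whisper-cli on the prepared WAV; -nt drops timestamps from the output.
    return [binary, "-m", model, "-f", wav, "-nt"]
```

The server would run each command with subprocess.run, capturing stdout from whisper-cli as the transcript.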


API

Health Check

GET /health

Returns server status and verifies:

  • whisper binary exists
  • model file exists
  • ffmpeg is available

Example response:

{
  "ok": true,
  "problems": []
}
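The three checks above can be implemented with a few lines of stdlib Python. A sketch of the likely logic (function names are illustrative, not the actual server.py code):

```python
import os
import shutil

def collect_problems(whisper_bin: str, model_path: str) -> list[str]:
    # Return human-readable problems; an empty list means the server is healthy.
    problems = []
    if not os.path.isfile(whisper_bin):
        problems.append(f"whisper binary not found: {whisper_bin}")
    if not os.path.isfile(model_path):
        problems.append(f"model file not found: {model_path}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

def health(whisper_bin: str, model_path: str) -> dict:
    # Shape the /health JSON body: ok is true only when no problems were found.
    problems = collect_problems(whisper_bin, model_path)
    return {"ok": not problems, "problems": problems}
```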

Transcribe Audio

POST /transcribe

Request

multipart/form-data

Field name:

file

Example:

file=@audio.wav

Supported formats (handled by ffmpeg):

  • wav
  • mp3
  • ogg
  • m4a
  • most other common audio formats

Response

Success:

{
  "text": "Hello this is a transcription."
}

If the transcription is empty, the server returns diagnostics:

{
  "text": "",
  "note": "empty transcript; returning diagnostics",
  "stdout": "...",
  "stderr": "...",
  "cmd": [...]
}
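The success/diagnostics split could be shaped as a single helper (an illustrative sketch, not the verbatim server.py logic):

```python
def shape_response(text: str, stdout: str, stderr: str, cmd: list[str]) -> dict:
    # On a non-empty transcript return just the text;
    # on an empty one attach the raw process output for debugging.
    if text.strip():
        return {"text": text.strip()}
    return {
        "text": "",
        "note": "empty transcript; returning diagnostics",
        "stdout": stdout,
        "stderr": stderr,
        "cmd": cmd,
    }
```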

Project Structure

.
├── server.py
├── requirements.txt
├── Dockerfile
└── README.md

At runtime the container will also contain:

/app/whisper.cpp
/app/whisper.cpp/build/bin/whisper-cli
/app/whisper.cpp/models/ggml-small.bin

Dependencies

Runtime components:

  • Python 3.11
  • Flask
  • Gunicorn
  • ffmpeg
  • whisper.cpp
  • Whisper model (ggml-small.bin)

The Docker image builds whisper.cpp automatically.


Configuration

The server reads these environment variables:

WHISPER_BIN
MODEL_PATH

Defaults inside the container:

WHISPER_BIN=/app/whisper.cpp/build/bin/whisper-cli
MODEL_PATH=/app/whisper.cpp/models/ggml-small.bin
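Reading each variable with its fallback is one line per setting. A sketch of the likely pattern (the `env_path` helper is hypothetical):

```python
import os

def env_path(name: str, default: str) -> str:
    # Read a path-valued setting from the environment, falling back to default.
    return os.environ.get(name, default)

WHISPER_BIN = env_path("WHISPER_BIN", "/app/whisper.cpp/build/bin/whisper-cli")
MODEL_PATH = env_path("MODEL_PATH", "/app/whisper.cpp/models/ggml-small.bin")
```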

Build Docker Image

From the project directory:

docker build -t whisper-api .

Run Container

docker run -d -p 5005:5005 --name whisper-server whisper-api

The API will be available at:

http://localhost:5005

Test the Server

Health Check

curl http://localhost:5005/health

Transcribe Audio

curl -X POST \
  -F "file=@test.wav" \
  http://localhost:5005/transcribe

Example response:

{"text":"hello this is a test"}

Development (Without Docker)

Create a Python virtual environment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run server:

python server.py

Model Choice

The default Docker build downloads:

ggml-small.bin

You can switch models by modifying the Dockerfile:

Model    Speed      Accuracy
tiny     very fast  low
base     fast       moderate
small    balanced   good
medium   slow       high
large    very slow  best

Performance Notes

  • whisper.cpp runs fully on CPU
  • transcription speed depends on:

    • CPU cores
    • CPU vector extensions (AVX/AVX2)
    • model size

Typical small-model performance on modern CPUs:

~0.5x – 2x realtime

Security Notes

This server:

  • accepts arbitrary audio uploads
  • runs ffmpeg on them

For production deployments consider:

  • reverse proxy (nginx / traefik)
  • request size limits
  • authentication
  • rate limiting

License

This project uses:

  • whisper.cpp — MIT License
  • Flask — BSD License

See their respective repositories for details.