|
|
@@ -1,3 +1,319 @@
|
|
|
-# Whisper STT Server
|
|
|
+# Whisper Transcription HTTP Server
|
|
|
|
|
|
-this first flask version works for me.
|
|
|
+A minimal HTTP API for **speech-to-text transcription** built with:
|
|
|
+
|
|
|
+* **Flask**
|
|
|
+* **whisper.cpp**
|
|
|
+* **ffmpeg**
|
|
|
+
|
|
|
+The service accepts audio uploads and returns a **single transcription result** using the Whisper model.
|
|
|
+
|
|
|
+It is designed to be:
|
|
|
+
|
|
|
+* simple
|
|
|
+* reliable
|
|
|
+* easy to containerize
|
|
|
+* suitable for internal automation pipelines
|
|
|
+
|
|
|
+Typical use cases include:
|
|
|
+
|
|
|
+* voice message transcription
|
|
|
+* voice assistant pipelines
|
|
|
+* transcription preprocessing
|
|
|
+* automation workflows (Node-RED, etc.)
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Architecture
|
|
|
+
|
|
|
+The server works in four stages:
|
|
|
+
|
|
|
+1. **Upload audio**
|
|
|
+
|
|
|
+ A client sends a `multipart/form-data` POST request containing an audio file.
|
|
|
+
|
|
|
+2. **Convert audio**
|
|
|
+
|
|
|
+ The server converts the audio into **16 kHz mono WAV** using `ffmpeg`.
|
|
|
+ This ensures compatibility and stable input for Whisper.
|
|
|
+
|
|
|
+3. **Transcribe**
|
|
|
+
|
|
|
+ The server calls the **whisper.cpp CLI** (`whisper-cli`) with a specified model.
|
|
|
+
|
|
|
+4. **Return text**
|
|
|
+
|
|
|
+ The transcription result is returned as JSON.
|
|
|
+
|
|
|
+If the transcription result is empty, the server returns **diagnostic information** to help with debugging.
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# API
|
|
|
+
|
|
|
+## Health Check
|
|
|
+
|
|
|
+```
|
|
|
+GET /health
|
|
|
+```
|
|
|
+
|
|
|
+Returns server status and verifies:
|
|
|
+
|
|
|
+* whisper binary exists
|
|
|
+* model file exists
|
|
|
+* ffmpeg is available
|
|
|
+
|
|
|
+Example response:
|
|
|
+
|
|
|
+```json
|
|
|
+{
|
|
|
+ "ok": true,
|
|
|
+ "problems": []
|
|
|
+}
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Transcribe Audio
|
|
|
+
|
|
|
+```
|
|
|
+POST /transcribe
|
|
|
+```
|
|
|
+
|
|
|
+### Request
|
|
|
+
|
|
|
+`multipart/form-data`
|
|
|
+
|
|
|
+Field name:
|
|
|
+
|
|
|
+```
|
|
|
+file
|
|
|
+```
|
|
|
+
|
|
|
+Example:
|
|
|
+
|
|
|
+```
|
|
|
+file=@audio.wav
|
|
|
+```
|
|
|
+
|
|
|
+Supported formats (handled by ffmpeg):
|
|
|
+
|
|
|
+* wav
|
|
|
+* mp3
|
|
|
+* ogg
|
|
|
+* m4a
|
|
|
+* most other common audio formats
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+### Response
|
|
|
+
|
|
|
+Success:
|
|
|
+
|
|
|
+```json
|
|
|
+{
|
|
|
+ "text": "Hello this is a transcription."
|
|
|
+}
|
|
|
+```
|
|
|
+
|
|
|
+If the transcription is empty, the server returns diagnostics:
|
|
|
+
|
|
|
+```json
|
|
|
+{
|
|
|
+ "text": "",
|
|
|
+ "note": "empty transcript; returning diagnostics",
|
|
|
+ "stdout": "...",
|
|
|
+ "stderr": "...",
|
|
|
+ "cmd": [...]
|
|
|
+}
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Project Structure
|
|
|
+
|
|
|
+```
|
|
|
+.
|
|
|
+├── server.py
|
|
|
+├── requirements.txt
|
|
|
+├── Dockerfile
|
|
|
+└── README.md
|
|
|
+```
|
|
|
+
|
|
|
+At runtime the container will also contain:
|
|
|
+
|
|
|
+```
|
|
|
+/app/whisper.cpp
|
|
|
+/app/whisper.cpp/build/bin/whisper-cli
|
|
|
+/app/whisper.cpp/models/ggml-small.bin
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Dependencies
|
|
|
+
|
|
|
+Runtime components:
|
|
|
+
|
|
|
+* Python 3.11
|
|
|
+* Flask
|
|
|
+* Gunicorn
|
|
|
+* ffmpeg
|
|
|
+* whisper.cpp
|
|
|
+* Whisper model (`ggml-small.bin`)
|
|
|
+
|
|
|
+The Docker image builds whisper.cpp automatically.
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Configuration
|
|
|
+
|
|
|
+The server reads these environment variables:
|
|
|
+
|
|
|
+```
|
|
|
+WHISPER_BIN
|
|
|
+MODEL_PATH
|
|
|
+```
|
|
|
+
|
|
|
+Defaults inside the container:
|
|
|
+
|
|
|
+```
|
|
|
+WHISPER_BIN=/app/whisper.cpp/build/bin/whisper-cli
|
|
|
+MODEL_PATH=/app/whisper.cpp/models/ggml-small.bin
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Build Docker Image
|
|
|
+
|
|
|
+From the project directory:
|
|
|
+
|
|
|
+```bash
|
|
|
+docker build -t whisper-api .
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Run Container
|
|
|
+
|
|
|
+```bash
|
|
|
+docker run -d -it -p 5005:5005 --name whisper-server whisper-api
|
|
|
+```
|
|
|
+
|
|
|
+The API will be available at:
|
|
|
+
|
|
|
+```
|
|
|
+http://localhost:5005
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Test the Server
|
|
|
+
|
|
|
+## Health Check
|
|
|
+
|
|
|
+```bash
|
|
|
+curl http://localhost:5005/health
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+## Transcribe Audio
|
|
|
+
|
|
|
+```bash
|
|
|
+curl -X POST \
|
|
|
+ -F "file=@test.wav" \
|
|
|
+ http://localhost:5005/transcribe
|
|
|
+```
|
|
|
+
|
|
|
+Example response:
|
|
|
+
|
|
|
+```json
|
|
|
+{"text":"hello this is a test"}
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Development (Without Docker)
|
|
|
+
|
|
|
+Create a Python virtual environment:
|
|
|
+
|
|
|
+```bash
|
|
|
+python -m venv venv
|
|
|
+source venv/bin/activate
|
|
|
+```
|
|
|
+
|
|
|
+Install dependencies:
|
|
|
+
|
|
|
+```bash
|
|
|
+pip install -r requirements.txt
|
|
|
+```
|
|
|
+
|
|
|
+Run server:
|
|
|
+
|
|
|
+```bash
|
|
|
+python server.py
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Model Choice
|
|
|
+
|
|
|
+The default Docker build downloads:
|
|
|
+
|
|
|
+```
|
|
|
+ggml-small.bin
|
|
|
+```
|
|
|
+
|
|
|
+You can switch models by modifying the Dockerfile and pointing `MODEL_PATH` at the new model file. Larger models trade speed for accuracy:
|
|
|
+
|
|
|
+| Model | Speed | Accuracy |
|
|
|
+| ------ | --------- | -------- |
|
|
|
+| tiny | very fast | low |
|
|
|
+| base | fast | moderate |
|
|
|
+| small | balanced | good |
|
|
|
+| medium | slow | high |
|
|
|
+| large | very slow | best |
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Performance Notes
|
|
|
+
|
|
|
+* in this setup, whisper.cpp runs **fully on CPU**
|
|
|
+* transcription speed depends on:
|
|
|
+
|
|
|
+ * CPU cores
|
|
|
+ * CPU vector extensions (AVX/AVX2)
|
|
|
+ * model size
|
|
|
+
|
|
|
+Typical small-model performance on modern CPUs:
|
|
|
+
|
|
|
+```
|
|
|
+~0.5x – 2x realtime
|
|
|
+```
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# Security Notes
|
|
|
+
|
|
|
+This server:
|
|
|
+
|
|
|
+* accepts arbitrary audio uploads
|
|
|
+* runs ffmpeg on them
|
|
|
+
|
|
|
+For production deployments consider:
|
|
|
+
|
|
|
+* reverse proxy (nginx / traefik)
|
|
|
+* request size limits
|
|
|
+* authentication
|
|
|
+* rate limiting
|
|
|
+
|
|
|
+---
|
|
|
+
|
|
|
+# License
|
|
|
+
|
|
|
+This project uses:
|
|
|
+
|
|
|
+* **whisper.cpp** — MIT License
|
|
|
+* **Flask** — BSD License
|
|
|
+
|
|
|
+See their respective repositories for details.
|