A production-ready REST API for Russian speech recognition using the T-one model
A complete, ready-to-use REST API that provides Russian speech recognition capabilities. Simply clone, install, and run!
- 🎯 Offline Recognition - Transcribe complete audio files with timestamps
- Streaming Recognition - Real-time speech recognition with low latency
- ⚡ Parallel Processing - Multiple workers for concurrent request handling
- Auto Documentation - Interactive Swagger UI and ReDoc
- Easy Setup - Automated installation with Makefile
- Production Ready - Clean codebase with comprehensive error handling
- RESTful API - Standard HTTP endpoints for easy integration
- 🔴 Redis Support - Optional Redis storage for multi-instance deployments
- 🐳 Docker Support - Ready-to-use Docker images and docker-compose setup
Choose your preferred deployment method:
The easiest way to get started is using Docker Compose. No need to install dependencies manually!
With Memory Storage:

```bash
# Clone repository
git clone https://github.com/masasibata/t-one-rest-api.git
cd t-one-rest-api

# Start API (builds automatically on first run)
docker compose up -d api

# View logs
docker compose logs -f api
```

With Redis Storage (for production):

```bash
# Start API with Redis
docker compose up -d api-redis redis

# View logs
docker compose logs -f api-redis
```

Access the API:

- API: http://localhost:8000 (memory) or http://localhost:8001 (Redis)
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
💡 Tip: Docker automatically handles all dependencies, model downloads, and configuration. Perfect for quick testing and production deployment!

⚡ Performance: For better throughput with multiple concurrent requests, see the Performance Tuning section to configure multiple workers.
For development or if you prefer local installation:
Prerequisites:
| Requirement | Version | Installation |
|---|---|---|
| Python | 3.9+ | python.org |
| Poetry | 2.1+ | Auto-installed by Makefile |
| Git | Latest | git-scm.com |
| cmake | 3.10+ | See below |
| Build Tools | - | See below |
Install cmake and build tools:
```bash
# Ubuntu/Debian
sudo apt-get install cmake build-essential

# macOS
brew install cmake
xcode-select --install

# Windows
# Download from https://cmake.org/download/
```

Installation Steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/masasibata/t-one-rest-api.git
   cd t-one-rest-api
   ```

2. Install dependencies:

   ```bash
   make install
   ```

   This command will:

   - ✅ Check for required system dependencies (cmake)
   - ✅ Clone the T-one repository automatically
   - ✅ Install Poetry (if not already installed)
   - ✅ Install all Python dependencies, including T-one

   ⏱️ Note: Installation may take several minutes while building the `kenlm` package. Be patient!

3. (Optional) Install with Redis support:

   For production deployments with multiple API instances:

   ```bash
   make install-redis
   ```

   💡 Note: For single-instance deployments, the default memory storage is sufficient.

4. Start the server:

   ```bash
   make run
   ```

5. Access the API:

   - API: http://localhost:8000
   - Swagger UI: http://localhost:8000/docs
   - ReDoc: http://localhost:8000/redoc
⚡ Performance: For production use with multiple concurrent requests, see the Performance Tuning section to configure multiple workers.
Once the server is running, you can explore the API using:
- Swagger UI (`/docs`) - Interactive API explorer with a "Try it out" feature (protected with the `X-API-Key` header if `API_KEY` is set)
- ReDoc (`/redoc`) - Clean, responsive API documentation (protected with the `X-API-Key` header if `API_KEY` is set)
The API supports optional API key authentication. If the `API_KEY` environment variable is set, all endpoints except `/health` and `/` require the key to be provided in the `X-API-Key` header.
Configuration:

```bash
# Set the API key via environment variable
export API_KEY=your-secret-key-here

# Or in a .env file
echo "API_KEY=your-secret-key-here" >> .env
```

Usage:

When an API key is configured, include it in the `X-API-Key` header:

```bash
# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe" -F "file=@audio.wav"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe" \
  -H "X-API-Key: your-secret-key-here" \
  -F "file=@audio.wav"
```

Protected Endpoints:

- `POST /transcribe` - Requires API key if configured
- `POST /transcribe/streaming` - Requires API key if configured
- `POST /transcribe/streaming/chunk` - Requires API key if configured
- `POST /transcribe/streaming/finalize` - Requires API key if configured

Public Endpoints (always accessible):

- `GET /` - API information
- `GET /health` - Health check

Protected Documentation (if `API_KEY` is set):

- `GET /docs` - Swagger UI (requires `X-API-Key` header)
- `GET /redoc` - ReDoc documentation (requires `X-API-Key` header)

💡 Note: If `API_KEY` is not set, the API works without authentication. This is useful for development or internal networks.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API information and available endpoints |
| GET | `/health` | Health check and model status |
| POST | `/transcribe` | Transcribe complete audio file (offline) |
| POST | `/transcribe/streaming` | Start streaming recognition session |
| POST | `/transcribe/streaming/chunk` | Process audio chunk in streaming mode |
| POST | `/transcribe/streaming/finalize` | Finalize streaming session |
`GET /`

Get information about the API and available endpoints.

Response:

```json
{
  "name": "T-one ASR API",
  "version": "1.0.0",
  "description": "REST API for Russian speech recognition",
  "endpoints": {
    "POST /transcribe": "Transcribe speech from audio file (offline)",
    "POST /transcribe/streaming": "Start streaming recognition",
    "POST /transcribe/streaming/chunk": "Send audio chunk for streaming",
    "POST /transcribe/streaming/finalize": "Finalize streaming"
  }
}
```

`GET /health`

Check API status and verify the model is loaded.

Response:

```json
{
  "status": "healthy",
  "model_loaded": true
}
```

`POST /transcribe`

Transcribe speech from a complete audio file.

Parameters:

- `file` (file, required) - Audio file (WAV, FLAC, MP3, OGG, etc.)
- `return_timestamps` (bool, optional) - Return timestamps with phrases (default: `true`)
Example Request:
```bash
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.wav" \
  -F "return_timestamps=true"
```

Example Response:

```json
{
  "phrases": [
    {
      "text": "привет",
      "start_time": 1.79,
      "end_time": 2.04
    },
    {
      "text": "это тест",
      "start_time": 3.72,
      "end_time": 4.26
    }
  ],
  "full_text": "привет это тест",
  "duration": 4.26,
  "processing_time": 0.85
}
```

`POST /transcribe/streaming`

Create a new streaming recognition session.
Headers:

- `X-API-Key` (string, optional) - API key for authentication (required if the `API_KEY` env var is set)
Example Request:
```bash
# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming" \
  -H "X-API-Key: your-secret-key"
```

Response:

```json
{
  "phrases": [],
  "state_id": "550e8400-e29b-41d4-a716-446655440000",
  "is_final": false
}
```

`POST /transcribe/streaming/chunk`

Process an audio chunk in streaming mode.

Parameters:

- `state_id` (string, required) - State ID from `/transcribe/streaming`
- `file` (file, required) - Audio chunk

Headers:

- `X-API-Key` (string, optional) - API key for authentication (required if the `API_KEY` env var is set)
Example Request:
```bash
# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming/chunk" \
  -F "state_id=550e8400-e29b-41d4-a716-446655440000" \
  -F "file=@chunk.wav"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming/chunk" \
  -H "X-API-Key: your-secret-key" \
  -F "state_id=550e8400-e29b-41d4-a716-446655440000" \
  -F "file=@chunk.wav"
```

`POST /transcribe/streaming/finalize`

Finalize the streaming session and get the final results.

Parameters:

- `state_id` (string, required) - State ID from the streaming session

Headers:

- `X-API-Key` (string, optional) - API key for authentication (required if the `API_KEY` env var is set)
Example Request:
```bash
# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming/finalize" \
  -F "state_id=550e8400-e29b-41d4-a716-446655440000"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming/finalize" \
  -H "X-API-Key: your-secret-key" \
  -F "state_id=550e8400-e29b-41d4-a716-446655440000"
```

```python
import requests

# Transcribe audio file
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": f},
        data={"return_timestamps": True},
    )

result = response.json()
print(f"Full text: {result['full_text']}")
print(f"Processing time: {result['processing_time']:.2f} sec")

for phrase in result["phrases"]:
    print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")
```

```python
import os

import requests

# Optional: get the API key from the environment
api_key = os.getenv("API_KEY")
headers = {"X-API-Key": api_key} if api_key else {}

# 1. Start a streaming session
response = requests.post(
    "http://localhost:8000/transcribe/streaming",
    headers=headers,
)
state_id = response.json()["state_id"]

# 2. Process audio chunks
chunk_files = ["chunk1.wav", "chunk2.wav", "chunk3.wav"]
for chunk_file in chunk_files:
    with open(chunk_file, "rb") as f:
        response = requests.post(
            "http://localhost:8000/transcribe/streaming/chunk",
            files={"file": f},
            data={"state_id": state_id},
            headers=headers,
        )
    result = response.json()

    # Print recognized phrases
    for phrase in result["phrases"]:
        print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")

# 3. Finalize the session
response = requests.post(
    "http://localhost:8000/transcribe/streaming/finalize",
    data={"state_id": state_id},
    headers=headers,
)
final_result = response.json()

print("\nFinal phrases:")
for phrase in final_result["phrases"]:
    print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")
```

See `asr_api/example_client.py` for a complete working example.
| Command | Description |
|---|---|
| `make install` | Clone T-one and install dependencies (memory) |
| `make install-redis` | Install with Redis support for distributed storage |
| `make run` | Start the ASR API server (development, single worker) |
| `make run-prod` | Start the ASR API server (production, multiple workers) |
| `make docker-build` | Build Docker images |
| `make docker-up` | Start services with docker-compose |
| `make docker-down` | Stop docker-compose services |
| `make docker-logs` | View docker-compose logs |
| `make clean` | Remove the T-one clone and cache files |
| `make help` | Show available commands |
The `docker-compose.yml` includes three services:
| Service | Description | Port | Storage |
|---|---|---|---|
| `api` | API with memory storage | 8000 | In-memory |
| `api-redis` | API with Redis storage | 8001 | Redis |
| `redis` | Redis server | 6379 | - |
```bash
# Start API with memory storage
docker compose up -d api

# Start API with Redis
docker compose up -d api-redis redis

# View logs
docker compose logs -f api

# Stop all services
docker compose down

# Rebuild images
docker compose build
```

If you prefer to build and run manually:

Build images:

```bash
# Memory storage version
docker build -t t-one-rest-api .

# Redis version
docker build -f Dockerfile.redis -t t-one-rest-api:redis .
```

Run containers:

```bash
# Memory storage
docker run -d \
  --name t-one-api \
  -p 8000:8000 \
  -e LOG_LEVEL=INFO \
  t-one-rest-api

# Redis storage (requires Redis running)
docker run -d \
  --name t-one-api-redis \
  -p 8000:8000 \
  -e STORAGE_TYPE=redis \
  -e REDIS_URL=redis://host.docker.internal:6379/0 \
  t-one-rest-api:redis
```

Environment Variables:
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `WORKERS` | `1` | Number of worker processes for parallel processing |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `STORAGE_TYPE` | `memory` | Storage type: `memory` or `redis` |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection URL |
| `REDIS_KEY_PREFIX` | `asr:session:` | Redis key prefix |
| `SESSION_TIMEOUT_SECONDS` | `3600` | Session timeout in seconds |
| `MAX_FILE_SIZE_MB` | `100` | Maximum file size in MB |
| `API_KEY` | None (optional) | Optional API key for authentication |
Volumes:

- `model-cache` - Caches models downloaded from HuggingFace (persists between restarts)
- `model-storage` - Persistent storage for downloaded models (avoids re-downloading on rebuild)
- `redis-data` - Persistent Redis data storage

Health Checks:

All services include automatic health checks:

- API: `GET /health` endpoint
- Redis: `redis-cli ping` command
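Deployment scripts can use the same health check to block until the model has finished loading. A stdlib-only sketch against the `GET /health` endpoint documented above (the `wait_until_ready` helper and the retry defaults are illustrative, not part of the project):

```python
import json
import time
from urllib.request import urlopen

def wait_until_ready(base_url: str, retries: int = 30, delay: float = 2.0) -> bool:
    """Poll GET /health until the model is loaded, or give up after `retries` attempts."""
    for attempt in range(retries):
        try:
            with urlopen(f"{base_url}/health", timeout=5) as resp:
                health = json.load(resp)
            if health.get("status") == "healthy" and health.get("model_loaded"):
                return True
        except (OSError, ValueError):
            pass  # API not up yet, or response is not JSON
        if attempt < retries - 1:
            time.sleep(delay)
    return False

# Example: wait_until_ready("http://localhost:8000")
```

This avoids racing the first `/transcribe` request against model download on a fresh container.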
For production deployments with multiple API instances or when you need persistent session storage, you can use Redis instead of in-memory storage.
```bash
# Install with Redis support
make install-redis
```

Or manually:

```bash
poetry install --extras redis
```

Set environment variables to use Redis:

```bash
export STORAGE_TYPE=redis
export REDIS_URL=redis://localhost:6379/0
export REDIS_KEY_PREFIX=asr:session:
```

Or create a `.env` file:

```bash
STORAGE_TYPE=redis
REDIS_URL=redis://localhost:6379/0
REDIS_KEY_PREFIX=asr:session:
```

Docker (recommended):

```bash
docker run -d -p 6379:6379 redis:7-alpine
```

Local installation:

```bash
# Ubuntu/Debian
sudo apt-get install redis-server
sudo systemctl start redis

# macOS
brew install redis
brew services start redis
```

- ✅ Multi-instance deployments - Share sessions across multiple API servers
- ✅ High availability - Sessions survive server restarts
- ✅ Horizontal scaling - Load-balance requests across instances
- ✅ Production environments - Better reliability and monitoring

- ❌ Single instance - Memory storage is simpler and faster
- ❌ Development/testing - No need for additional infrastructure
- ❌ Low traffic - Memory storage is sufficient
| Feature | Memory Storage | Redis Storage |
|---|---|---|
| Setup | No additional setup | Requires Redis server |
| Performance | Fastest (in-process) | Fast (network overhead) |
| Persistence | Lost on restart | Survives restarts |
| Multi-instance | ❌ Not supported | ✅ Supported |
| Scalability | Single server only | Horizontal scaling |
| Production Ready | Limited | ✅ Recommended |
The API supports parallel processing of multiple audio files simultaneously through multiple worker processes. This is especially useful when you need to process many files concurrently.
- Automatic ONNX Runtime Configuration: The API automatically patches ONNX Runtime to support parallel workers without modifying the T-one package. Each worker can process requests independently.
- Worker Processes: Each worker runs in a separate process with its own model instance, allowing true parallel processing.
- Load Distribution: Gunicorn automatically distributes incoming requests across available workers.
Docker (Recommended):

1. Using a `.env` file (easiest):

   ```bash
   # Create or edit .env file
   echo "WORKERS=4" >> .env

   # Start API
   docker compose up -d api
   ```

2. Edit `docker-compose.yml`:

   ```yaml
   services:
     api:
       environment:
         WORKERS: "4"  # Adjust based on your CPU cores
   ```

   Then run:

   ```bash
   docker compose up -d api
   ```

3. Environment variable (docker-compose v2.20+):

   ```bash
   WORKERS=4 docker compose up -d api
   ```

Local Installation:

```bash
# Production mode with 4 workers
make run-prod WORKERS=4

# Or directly with gunicorn:
poetry run gunicorn asr_api.main:app \
  -w 4 \
  -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 300
```

Development Mode:

```bash
# Single worker with auto-reload (for development)
make run
```

| CPU Cores | Recommended Workers | Notes |
|---|---|---|
| 2-4 | `WORKERS=2-4` | Use all cores |
| 4-8 | `WORKERS=4-7` | Leave 1 core for the system |
| 8-16 | `WORKERS=8-15` | Leave 1-2 cores for the system |
| 16+ | `WORKERS=15-31` | Leave 1-2 cores for the system |
Resource Considerations:

- Memory: Each worker loads its own copy of the model (~500 MB-1 GB RAM per worker)
- CPU: More workers means better throughput, with diminishing returns beyond the number of CPU cores
- I/O: Multiple workers help when processing many files simultaneously

Recommendations:

- Start with the CPU core count: Set `WORKERS` equal to your number of CPU cores
- Monitor resource usage: Watch CPU and memory usage, and adjust if needed
- Use Redis for multi-instance: When scaling across multiple servers, use Redis storage
- Test with your workload: Different audio file sizes may benefit from different worker counts
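The sizing rule of thumb above can be scripted. A small sketch (the `recommended_workers` helper is illustrative, not part of the project) that mirrors the table: use all cores on small machines, otherwise leave one for the system:

```python
import os

def recommended_workers(cores: int) -> int:
    """Use all cores up to 2; above that, leave one core for the system."""
    return cores - 1 if cores > 2 else cores

cores = os.cpu_count() or 1
print(f"WORKERS={recommended_workers(cores)}")
```

The printed line can be appended to `.env` or exported before `make run-prod`; treat it as a starting point and tune against real traffic.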
After starting with multiple workers, check the logs to verify:

```bash
# Docker
docker compose logs api | grep "Booting worker"

# Should show multiple workers:
# [INFO] Booting worker with pid: 12
# [INFO] Booting worker with pid: 13
# [INFO] Booting worker with pid: 14
# [INFO] Booting worker with pid: 15
```

You should also see:

```text
INFO: ONNX Runtime patch applied successfully for parallel workers
```

This confirms that parallel processing is enabled.
With 4 workers, you can process 4 audio files simultaneously:

```python
import asyncio

import aiohttp

async def transcribe_file(session, url, file_path):
    with open(file_path, "rb") as f:
        data = aiohttp.FormData()
        data.add_field("file", f, filename="audio.wav")
        async with session.post(url, data=data) as resp:
            return await resp.json()

async def main():
    url = "http://localhost:8000/transcribe"
    files = ["file1.wav", "file2.wav", "file3.wav", "file4.wav"]

    async with aiohttp.ClientSession() as session:
        # Process all files in parallel
        results = await asyncio.gather(*[
            transcribe_file(session, url, f) for f in files
        ])

    for result in results:
        print(result["full_text"])

asyncio.run(main())
```

With 4 workers, all 4 files are processed simultaneously, significantly reducing total processing time.
If you see `TimeoutError` or `Read timed out` during installation:

```bash
# Simply retry the installation
make install
```

If kenlm fails to install repeatedly:

```bash
# Install kenlm separately
poetry run pip install kenlm
poetry install
```

If you get compilation errors:

```bash
# Ubuntu/Debian
sudo apt-get install build-essential cmake

# macOS
xcode-select --install
brew install cmake
```

If the model fails to load on startup:

- Check your internet connection
- Verify DNS resolution: `ping huggingface.co`
- Check firewall settings
- Review the server logs for detailed error messages
| Property | Value |
|---|---|
| Model | T-one (71M parameters) |
| Architecture | Conformer (CTC-based) |
| Language | Russian |
| Domain | Telephony (call center optimized) |
| Quality (WER) | 8.63% on call-center data |
| Model Size | ~71MB (auto-downloaded) |
| Latency | Low (optimized for real-time) |
The API supports all audio formats that can be read by librosa/soundfile:

- ✅ WAV
- ✅ FLAC
- ✅ MP3
- ✅ OGG
- ✅ And more...

Audio is automatically converted to mono 8 kHz format for processing.
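A simplified, client-side sketch of that conversion (the server relies on librosa/soundfile internally; this NumPy version is illustrative only): downmix to mono by averaging channels, then resample to 8 kHz by linear interpolation.

```python
import numpy as np

def to_mono_8k(samples: np.ndarray, sr: int, target_sr: int = 8000) -> np.ndarray:
    """Downmix (frames, channels) audio to mono and resample to target_sr."""
    if samples.ndim == 2:
        samples = samples.mean(axis=1)  # average channels -> mono
    duration = len(samples) / sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, samples)

# One second of 44.1 kHz stereo noise becomes 8000 mono samples
mono_8k = to_mono_8k(np.random.randn(44100, 2), sr=44100)
print(mono_8k.shape)  # (8000,)
```

Because the server does this for you, there is no need to resample before uploading; the sketch is mainly useful for understanding why timestamps are reported on an 8 kHz timeline.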
```text
t-one-rest-api/
├── asr_api/                # Main API package
│   ├── __init__.py
│   ├── main.py             # FastAPI application
│   ├── config.py           # Configuration management
│   ├── models.py           # Pydantic data models
│   ├── audio_processor.py  # Audio processing & T-one integration
│   ├── onnx_patch.py       # ONNX Runtime patch for parallel workers
│   ├── routers/            # API route handlers
│   ├── services/           # Business logic services
│   ├── storage/            # Storage backends (memory/redis)
│   └── utils/              # Utility functions
├── T-one/                  # T-one repository (auto-cloned)
├── pyproject.toml          # Poetry configuration
├── Makefile                # Build automation
├── Dockerfile              # Docker image for memory storage
├── Dockerfile.redis        # Docker image for Redis storage
├── docker-compose.yml      # Docker Compose configuration
└── README.md               # This file
```
The project uses Poetry for dependency management:

```bash
# Add a new dependency
poetry add <package-name>

# Add a development dependency
poetry add --group dev <package-name>
```

The server runs with auto-reload enabled by default:

```bash
make run
# or
poetry run uvicorn asr_api.main:app --host 0.0.0.0 --port 8000 --reload
```
- State Storage - Use Redis storage for multi-instance deployments:

  ```bash
  # Install with Redis
  make install-redis

  # Configure
  export STORAGE_TYPE=redis
  export REDIS_URL=redis://your-redis-host:6379/0
  ```

- CORS - Configure specific domains instead of `"*"` in `CORSMiddleware`
- Authentication - Add JWT tokens or API keys to protect endpoints
- Rate Limiting - Implement request rate limiting per client
- Logging - Set up centralized logging (ELK, Loki, etc.)
- Monitoring - Add metrics (Prometheus/Grafana)
- Reverse Proxy - Use nginx for load balancing and SSL
- File Size Limits - Configure maximum file size restrictions (default: 100 MB)
This project is licensed under the MIT License.
The T-one model is licensed under the Apache 2.0 License.
- T-one Model on HuggingFace
- T-one GitHub Repository
- FastAPI Documentation
- Poetry Documentation
Contributions are welcome! Please feel free to submit a Pull Request.
Made with ❤️ for Russian speech recognition