
🎤 T-one REST API

A production-ready REST API for Russian speech recognition using the T-one model

Python FastAPI Poetry

A complete, ready-to-use REST API that provides Russian speech recognition capabilities. Simply clone, install, and run!


✨ Features

  • 🎯 Offline Recognition - Transcribe complete audio files with timestamps
  • 🔄 Streaming Recognition - Real-time speech recognition with low latency
  • ⚡ Parallel Processing - Multiple workers for concurrent request handling
  • 📚 Auto Documentation - Interactive Swagger UI and ReDoc
  • 🚀 Easy Setup - Automated installation with Makefile
  • 🏭 Production Ready - Clean codebase with comprehensive error handling
  • 🌐 RESTful API - Standard HTTP endpoints for easy integration
  • 🔴 Redis Support - Optional Redis storage for multi-instance deployments
  • 🐳 Docker Support - Ready-to-use Docker images and docker-compose setup

🚀 Quick Start

Choose your preferred deployment method:

🐳 Option 1: Docker (Recommended - Fastest)

The easiest way to get started is using Docker Compose. No need to install dependencies manually!

With Memory Storage:

# Clone repository
git clone https://github.com/masasibata/t-one-rest-api.git
cd t-one-rest-api

# Start API (builds automatically on first run)
docker compose up -d api

# View logs
docker compose logs -f api

With Redis Storage (for production):

# Start API with Redis
docker compose up -d api-redis redis

# View logs
docker compose logs -f api-redis

Access the API:

  • 🌐 API: http://localhost:8000 (memory) or http://localhost:8001 (Redis)
  • 📖 Swagger UI: http://localhost:8000/docs
  • 📘 ReDoc: http://localhost:8000/redoc

💡 Tip: Docker automatically handles all dependencies, model downloads, and configuration. Perfect for quick testing and production deployment!

⚑ Performance: For better throughput with multiple concurrent requests, see Performance Tuning section to configure multiple workers.

🔧 Option 2: Local Installation

For development or if you prefer local installation:

Prerequisites:

Requirement   Version   Installation
Python        3.9+      python.org
Poetry        2.1+      Auto-installed by Makefile
Git           Latest    git-scm.com
cmake         3.10+     See below
Build Tools   -         See below

Install cmake and build tools:

# Ubuntu/Debian
sudo apt-get install cmake build-essential

# macOS
brew install cmake
xcode-select --install

# Windows
# Download from https://cmake.org/download/

Installation Steps:

  1. Clone the repository:
git clone https://github.com/masasibata/t-one-rest-api.git
cd t-one-rest-api
  2. Install dependencies:
make install

This command will:

  • ✅ Check for required system dependencies (cmake)
  • ✅ Clone the T-one repository automatically
  • ✅ Install Poetry (if not already installed)
  • ✅ Install all Python dependencies, including T-one

⏱️ Note: Installation may take several minutes when building the kenlm package. Be patient!

  3. (Optional) Install with Redis support:

For production deployments with multiple API instances:

make install-redis

💡 Note: For single-instance deployments, the default memory storage is sufficient.

  4. Start the server:
make run
  5. Access the API:
  • 🌐 API: http://localhost:8000
  • 📖 Swagger UI: http://localhost:8000/docs
  • 📘 ReDoc: http://localhost:8000/redoc

⚑ Performance: For production use with multiple concurrent requests, see Performance Tuning section to configure multiple workers.


📖 API Documentation

Interactive Documentation

Once the server is running, you can explore the API using:

  • Swagger UI (/docs) - Interactive API explorer with "Try it out" feature (protected with X-API-Key header if API_KEY is set)
  • ReDoc (/redoc) - Beautiful, responsive API documentation (protected with X-API-Key header if API_KEY is set)

API Authentication (Optional)

The API supports optional API key authentication. If the API_KEY environment variable is set, all endpoints except /health and / require the key to be provided in the X-API-Key header.

Configuration:

# Set API key via environment variable
export API_KEY=your-secret-key-here

# Or in .env file
echo "API_KEY=your-secret-key-here" >> .env

Usage:

When an API key is configured, include it in the X-API-Key header:

# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe" -F "file=@audio.wav"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe" \
     -H "X-API-Key: your-secret-key-here" \
     -F "file=@audio.wav"

Protected Endpoints:

  • POST /transcribe - Requires API key if configured
  • POST /transcribe/streaming - Requires API key if configured
  • POST /transcribe/streaming/chunk - Requires API key if configured
  • POST /transcribe/streaming/finalize - Requires API key if configured

Public Endpoints (always accessible):

  • GET / - API information
  • GET /health - Health check

Protected Documentation (if API_KEY is set):

  • GET /docs - Swagger UI (requires X-API-Key header)
  • GET /redoc - ReDoc documentation (requires X-API-Key header)

💡 Note: If API_KEY is not set, the API works without authentication. This is useful for development or internal networks.

Endpoints Overview

Method   Endpoint                          Description
GET      /                                 API information and available endpoints
GET      /health                           Health check and model status
POST     /transcribe                       Transcribe complete audio file (offline)
POST     /transcribe/streaming             Start streaming recognition session
POST     /transcribe/streaming/chunk       Process audio chunk in streaming mode
POST     /transcribe/streaming/finalize    Finalize streaming session

🔌 API Endpoints

GET / - API Information

Get information about the API and available endpoints.

Response:

{
  "name": "T-one ASR API",
  "version": "1.0.0",
  "description": "REST API for Russian speech recognition",
  "endpoints": {
    "POST /transcribe": "Transcribe speech from audio file (offline)",
    "POST /transcribe/streaming": "Start streaming recognition",
    "POST /transcribe/streaming/chunk": "Send audio chunk for streaming",
    "POST /transcribe/streaming/finalize": "Finalize streaming"
  }
}

GET /health - Health Check

Check API status and verify model is loaded.

Response:

{
  "status": "healthy",
  "model_loaded": true
}
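In scripts and CI pipelines it helps to block until the model has finished loading before sending work. A small polling helper against the /health contract above (the wait_for_healthy name and the stub server are our own additions so the snippet runs standalone; point it at your real base URL instead):

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def wait_for_healthy(base_url, timeout=30.0, interval=0.5):
    """Poll GET /health until the model reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                body = json.loads(resp.read())
                if body.get("status") == "healthy" and body.get("model_loaded"):
                    return True
        except OSError:
            pass  # server not accepting connections yet
        time.sleep(interval)
    return False


# Tiny stand-in server so the snippet runs without the real API.
class _StubHealth(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = json.dumps({"status": "healthy", "model_loaded": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *_):  # keep output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), _StubHealth)
threading.Thread(target=server.serve_forever, daemon=True).start()
ready = wait_for_healthy(f"http://127.0.0.1:{server.server_port}", timeout=5)
server.shutdown()
print(ready)
```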

POST /transcribe - Offline Recognition

Transcribe speech from a complete audio file.

Parameters:

  • file (file, required) - Audio file (WAV, FLAC, MP3, OGG, etc.)
  • return_timestamps (bool, optional) - Return timestamps with phrases (default: true)

Example Request:

curl -X POST "http://localhost:8000/transcribe" \
     -F "file=@audio.wav" \
     -F "return_timestamps=true"

Example Response:

{
  "phrases": [
    {
      "text": "ΠΏΡ€ΠΈΠ²Π΅Ρ‚",
      "start_time": 1.79,
      "end_time": 2.04
    },
    {
      "text": "это тСст",
      "start_time": 3.72,
      "end_time": 4.26
    }
  ],
  "full_text": "ΠΏΡ€ΠΈΠ²Π΅Ρ‚ это тСст",
  "duration": 4.26,
  "processing_time": 0.85
}
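Uploads above MAX_FILE_SIZE_MB (100 MB by default, see Configuration) are rejected by the server, so a client-side pre-check saves bandwidth on files that would be refused. The helper below is illustrative, not part of the API:

```python
import os
import tempfile

MAX_FILE_SIZE_MB = 100  # mirrors the API's documented default limit


def fits_upload_limit(path, limit_mb=MAX_FILE_SIZE_MB):
    """Return True if the file is small enough to upload to /transcribe."""
    return os.path.getsize(path) <= limit_mb * 1024 * 1024


# Create a 1 KiB dummy file and check it against the limit.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    tmp_path = tmp.name

ok = fits_upload_limit(tmp_path)
os.remove(tmp_path)
print(ok)
```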

POST /transcribe/streaming - Start Streaming

Create a new streaming recognition session.

Headers:

  • X-API-Key (string, optional) - API key for authentication (required if API_KEY env var is set)

Example Request:

# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming" \
     -H "X-API-Key: your-secret-key"

Response:

{
  "phrases": [],
  "state_id": "550e8400-e29b-41d4-a716-446655440000",
  "is_final": false
}

POST /transcribe/streaming/chunk - Process Chunk

Process an audio chunk in streaming mode.

Parameters:

  • state_id (string, required) - State ID from /transcribe/streaming
  • file (file, required) - Audio chunk

Headers:

  • X-API-Key (string, optional) - API key for authentication (required if API_KEY env var is set)

Example Request:

# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming/chunk" \
     -F "state_id=550e8400-e29b-41d4-a716-446655440000" \
     -F "file=@chunk.wav"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming/chunk" \
     -H "X-API-Key: your-secret-key" \
     -F "state_id=550e8400-e29b-41d4-a716-446655440000" \
     -F "file=@chunk.wav"

POST /transcribe/streaming/finalize - Finalize Streaming

Finalize streaming session and get final results.

Parameters:

  • state_id (string, required) - State ID from streaming session

Headers:

  • X-API-Key (string, optional) - API key for authentication (required if API_KEY env var is set)

Example Request:

# Without API key (if not configured)
curl -X POST "http://localhost:8000/transcribe/streaming/finalize" \
     -F "state_id=550e8400-e29b-41d4-a716-446655440000"

# With API key (if configured)
curl -X POST "http://localhost:8000/transcribe/streaming/finalize" \
     -H "X-API-Key: your-secret-key" \
     -F "state_id=550e8400-e29b-41d4-a716-446655440000"

💻 Usage Examples

Python Example - Offline Recognition

import requests

# Transcribe audio file
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": f},
        data={"return_timestamps": True}
    )
    result = response.json()

    print(f"Full text: {result['full_text']}")
    print(f"Processing time: {result['processing_time']:.2f} sec")

    for phrase in result["phrases"]:
        print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")

Python Example - Streaming Recognition

import requests
import os

# Optional: Get API key from environment
api_key = os.getenv("API_KEY")
headers = {"X-API-Key": api_key} if api_key else {}

# 1. Start streaming session
response = requests.post(
    "http://localhost:8000/transcribe/streaming",
    headers=headers
)
state_id = response.json()["state_id"]

# 2. Process audio chunks
chunk_files = ["chunk1.wav", "chunk2.wav", "chunk3.wav"]

for chunk_file in chunk_files:
    with open(chunk_file, "rb") as f:
        response = requests.post(
            "http://localhost:8000/transcribe/streaming/chunk",
            files={"file": f},
            data={"state_id": state_id},
            headers=headers
        )
        result = response.json()

        # Print recognized phrases
        for phrase in result["phrases"]:
            print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")

# 3. Finalize session
response = requests.post(
    "http://localhost:8000/transcribe/streaming/finalize",
    data={"state_id": state_id},
    headers=headers
)
final_result = response.json()
print("\nFinal phrases:")
for phrase in final_result["phrases"]:
    print(f"{phrase['start_time']:.2f}s - {phrase['end_time']:.2f}s: {phrase['text']}")

πŸ“ See asr_api/example_client.py for a complete working example.


🛠️ Makefile Commands

Command              Description
make install         Clone T-one and install dependencies (memory storage)
make install-redis   Install with Redis support for distributed storage
make run             Start the ASR API server (development, single worker)
make run-prod        Start the ASR API server (production, multiple workers)
make docker-build    Build Docker images
make docker-up       Start services with docker-compose
make docker-down     Stop docker-compose services
make docker-logs     View docker-compose logs
make clean           Remove T-one clone and cache files
make help            Show available commands

🐳 Docker Deployment Details

Docker Compose Services

The docker-compose.yml includes three services:

Service     Description               Port   Storage
api         API with memory storage   8000   In-memory
api-redis   API with Redis storage    8001   Redis
redis       Redis server              6379   -

Quick Commands

# Start API with memory storage
docker compose up -d api

# Start API with Redis
docker compose up -d api-redis redis

# View logs
docker compose logs -f api

# Stop all services
docker compose down

# Rebuild images
docker compose build

Manual Docker Build

If you prefer to build and run manually:

Build images:

# Memory storage version
docker build -t t-one-rest-api .

# Redis version
docker build -f Dockerfile.redis -t t-one-rest-api:redis .

Run containers:

# Memory storage
docker run -d \
  --name t-one-api \
  -p 8000:8000 \
  -e LOG_LEVEL=INFO \
  t-one-rest-api

# Redis storage (requires Redis running)
docker run -d \
  --name t-one-api-redis \
  -p 8000:8000 \
  -e STORAGE_TYPE=redis \
  -e REDIS_URL=redis://host.docker.internal:6379/0 \
  t-one-rest-api:redis

Configuration

Environment Variables:

Variable                  Default                    Description
HOST                      0.0.0.0                    Server host
PORT                      8000                       Server port
WORKERS                   1                          Number of worker processes for parallel processing
LOG_LEVEL                 INFO                       Logging level (DEBUG, INFO, WARNING, ERROR)
STORAGE_TYPE              memory                     Storage type: memory or redis
REDIS_URL                 redis://localhost:6379/0   Redis connection URL
REDIS_KEY_PREFIX          asr:session:               Redis key prefix
SESSION_TIMEOUT_SECONDS   3600                       Session timeout in seconds
MAX_FILE_SIZE_MB          100                        Maximum file size in MB
API_KEY                   (unset)                    Optional API key for authentication
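These variables are read once at startup. As a rough sketch of how such settings can be loaded (field names mirror the table above; the real asr_api/config.py may be organized differently):

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Illustrative settings loader mirroring the documented variables."""
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 1
    storage_type: str = "memory"
    redis_url: str = "redis://localhost:6379/0"
    session_timeout_seconds: int = 3600
    max_file_size_mb: int = 100
    api_key: Optional[str] = None

    @classmethod
    def from_env(cls):
        # Fall back to the dataclass defaults when a variable is unset.
        return cls(
            host=os.getenv("HOST", cls.host),
            port=int(os.getenv("PORT", str(cls.port))),
            workers=int(os.getenv("WORKERS", str(cls.workers))),
            storage_type=os.getenv("STORAGE_TYPE", cls.storage_type),
            redis_url=os.getenv("REDIS_URL", cls.redis_url),
            session_timeout_seconds=int(
                os.getenv("SESSION_TIMEOUT_SECONDS", str(cls.session_timeout_seconds))
            ),
            max_file_size_mb=int(os.getenv("MAX_FILE_SIZE_MB", str(cls.max_file_size_mb))),
            api_key=os.getenv("API_KEY"),
        )


settings = Settings.from_env()
print(settings.storage_type, settings.port)
```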

Volumes:

  • model-cache - Caches downloaded models from HuggingFace (persists between restarts)
  • model-storage - Persistent storage for downloaded models (avoids re-downloading on rebuild)
  • redis-data - Persistent Redis data storage

Health Checks:

All services include automatic health checks:

  • API: GET /health endpoint
  • Redis: redis-cli ping command
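In docker-compose.yml, a health check for the API service typically looks like the stanza below (illustrative values; the intervals in the repository's actual docker-compose.yml may differ):

```yaml
services:
  api:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s  # allow time for the model download on first boot
```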

🔴 Redis Storage (Optional)

For production deployments with multiple API instances or when you need persistent session storage, you can use Redis instead of in-memory storage.

Installation with Redis

# Install with Redis support
make install-redis

Or manually:

poetry install --extras redis

Configuration

Set environment variables to use Redis:

export STORAGE_TYPE=redis
export REDIS_URL=redis://localhost:6379/0
export REDIS_KEY_PREFIX=asr:session:

Or create a .env file:

STORAGE_TYPE=redis
REDIS_URL=redis://localhost:6379/0
REDIS_KEY_PREFIX=asr:session:

Running Redis

Docker (recommended):

docker run -d -p 6379:6379 redis:7-alpine

Local installation:

# Ubuntu/Debian
sudo apt-get install redis-server
sudo systemctl start redis

# macOS
brew install redis
brew services start redis

When to Use Redis

  • ✅ Multi-instance deployments - Share sessions across multiple API servers
  • ✅ High availability - Sessions survive server restarts
  • ✅ Horizontal scaling - Load balance requests across instances
  • ✅ Production environments - Better reliability and monitoring
  • ❌ Single instance - Memory storage is simpler and faster
  • ❌ Development/testing - No need for additional infrastructure
  • ❌ Low traffic - Memory storage is sufficient

Storage Comparison

Feature            Memory Storage         Redis Storage
Setup              No additional setup    Requires Redis server
Performance        Fastest (in-process)   Fast (network overhead)
Persistence        Lost on restart        Survives restarts
Multi-instance     ❌ Not supported       ✅ Supported
Scalability        Single server only     Horizontal scaling
Production Ready   Limited                ✅ Recommended
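Whichever backend is configured, it only has to support saving, loading, and deleting session state with a timeout. A toy in-memory version showing the expiry semantics (illustrative only; see asr_api/storage/ for the real implementations):

```python
import time


class MemorySessionStore:
    """Toy in-memory session store with per-session expiry (SESSION_TIMEOUT_SECONDS)."""

    def __init__(self, timeout_seconds=3600):
        self._timeout = timeout_seconds
        self._sessions = {}  # state_id -> (expires_at, state)

    def save(self, state_id, state):
        self._sessions[state_id] = (time.monotonic() + self._timeout, state)

    def load(self, state_id):
        entry = self._sessions.get(state_id)
        if entry is None:
            return None
        expires_at, state = entry
        if time.monotonic() > expires_at:
            del self._sessions[state_id]  # lazily drop expired sessions
            return None
        return state

    def delete(self, state_id):
        self._sessions.pop(state_id, None)


# Demo with a deliberately short timeout.
store = MemorySessionStore(timeout_seconds=0.1)
store.save("abc", {"frames": 3})
first = store.load("abc")
time.sleep(0.2)
second = store.load("abc")  # expired by now
print(first, second)
```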

⚙️ Performance Tuning

Parallel Processing with Multiple Workers

The API can process several audio files in parallel by running multiple worker processes. This is especially useful when you need to handle many concurrent requests.

How It Works

  • Automatic ONNX Runtime Configuration: The API automatically patches ONNX Runtime to support parallel workers without modifying the T-one package. Each worker can process requests independently.
  • Worker Processes: Each worker runs in a separate process with its own model instance, allowing true parallel processing.
  • Load Distribution: Gunicorn automatically distributes incoming requests across available workers.

Configuring Workers

Docker (Recommended):

  1. Using .env file (easiest):

    # Create or edit .env file
    echo "WORKERS=4" >> .env
    
    # Start API
    docker compose up -d api
  2. Edit docker-compose.yml:

    services:
      api:
        environment:
          WORKERS: "4"  # Adjust based on your CPU cores

    Then run: docker compose up -d api

  3. Environment variable (docker-compose v2.20+):

    WORKERS=4 docker compose up -d api

Local Installation:

# Production mode with 4 workers
make run-prod WORKERS=4

# Or directly with gunicorn:
poetry run gunicorn asr_api.main:app \
  -w 4 \
  -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 300

Development Mode:

# Single worker with auto-reload (for development)
make run

Recommendations

CPU Cores   Recommended Workers   Notes
2-4         WORKERS=2-4           Use all cores
4-8         WORKERS=4-7           Leave 1 core for the system
8-16        WORKERS=8-15          Leave 1-2 cores for the system
16+         WORKERS=15-31         Leave 1-2 cores for the system
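The table's rule of thumb can be written as a tiny helper, e.g. for generating WORKERS in a deploy script (our own heuristic, not shipped with the project):

```python
import os


def recommended_workers(cores=None):
    """Rule of thumb: use all cores up to 4, then reserve 1-2 for the system."""
    cores = cores or os.cpu_count() or 1
    if cores <= 4:
        return cores
    if cores <= 8:
        return cores - 1
    return cores - 2


print(recommended_workers(4), recommended_workers(8), recommended_workers(16))
```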

Resource Considerations:

  • Memory: Each worker loads the model (~500MB-1GB RAM per worker)
  • CPU: More workers = better throughput, but with diminishing returns once the worker count exceeds the number of CPU cores
  • I/O: Multiple workers help when processing many files simultaneously

Performance Tips

  1. Start with CPU cores count: Set WORKERS equal to your CPU core count
  2. Monitor resource usage: Watch CPU and memory usage, adjust if needed
  3. Use Redis for multi-instance: When scaling across multiple servers, use Redis storage
  4. Test with your workload: Different audio file sizes may benefit from different worker counts

Verification

After starting with multiple workers, check logs to verify:

# Docker
docker compose logs api | grep "Booting worker"

# Should show multiple workers:
# [INFO] Booting worker with pid: 12
# [INFO] Booting worker with pid: 13
# [INFO] Booting worker with pid: 14
# [INFO] Booting worker with pid: 15

You should also see:

INFO: ONNX Runtime patch applied successfully for parallel workers

This confirms that parallel processing is enabled.

Example: Processing Multiple Files

With 4 workers, you can process 4 audio files simultaneously:

import asyncio
import aiohttp

async def transcribe_file(session, url, file_path):
    with open(file_path, 'rb') as f:
        data = aiohttp.FormData()
        data.add_field('file', f, filename='audio.wav')
        async with session.post(url, data=data) as resp:
            return await resp.json()

async def main():
    url = "http://localhost:8000/transcribe"
    files = ["file1.wav", "file2.wav", "file3.wav", "file4.wav"]
    
    async with aiohttp.ClientSession() as session:
        # Process all files in parallel
        results = await asyncio.gather(*[
            transcribe_file(session, url, f) for f in files
        ])
    
    for result in results:
        print(result['full_text'])

asyncio.run(main())

With 4 workers, all 4 files will be processed simultaneously, significantly reducing total processing time.


πŸ› Troubleshooting

Network Timeout Errors

If you see TimeoutError or Read timed out during installation:

# Simply retry the installation
make install

kenlm Installation Fails

If kenlm fails to install repeatedly:

# Install kenlm separately
poetry run pip install kenlm
poetry install

Missing Build Tools

If you get compilation errors:

# Ubuntu/Debian
sudo apt-get install build-essential cmake

# macOS
xcode-select --install
brew install cmake

Model Loading Issues

If the model fails to load on startup:

  • Check your internet connection
  • Verify DNS resolution: ping huggingface.co
  • Check firewall settings
  • Review server logs for detailed error messages

📊 Model Information

Property        Value
Model           T-one (71M parameters)
Architecture    Conformer (CTC-based)
Language        Russian
Domain          Telephony (call-center optimized)
Quality (WER)   8.63% on call-center data
Model Size      ~71MB (auto-downloaded)
Latency         Low (optimized for real-time)

🎵 Supported Audio Formats

The API supports all audio formats that can be read by librosa/soundfile:

  • ✅ WAV
  • ✅ FLAC
  • ✅ MP3
  • ✅ OGG
  • ✅ And more...

Audio is automatically converted to mono 8kHz format for processing.
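The server performs this conversion for you, so clients normally send audio as-is. For reference, the transformation is roughly equivalent to the naive NumPy version below (the API itself likely relies on librosa/soundfile; linear interpolation is a simplification, not production-grade resampling):

```python
import numpy as np


def to_mono_8k(samples, sample_rate, target_rate=8000):
    """Downmix to mono and resample by linear interpolation (naive, dependency-free)."""
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 2:  # (frames, channels) -> average the channels
        samples = samples.mean(axis=1)
    duration = len(samples) / sample_rate
    n_out = int(round(duration * target_rate))
    src_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    dst_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(dst_t, src_t, samples)


# One second of 44.1 kHz stereo noise becomes one second at 8 kHz mono.
stereo = np.random.default_rng(0).standard_normal((44100, 2))
mono8k = to_mono_8k(stereo, 44100)
print(mono8k.shape)  # (8000,)
```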


πŸ“ Project Structure

t-one-rest-api/
├── asr_api/                 # Main API package
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration management
│   ├── models.py            # Pydantic data models
│   ├── audio_processor.py   # Audio processing & T-one integration
│   ├── onnx_patch.py        # ONNX Runtime patch for parallel workers
│   ├── routers/             # API route handlers
│   ├── services/            # Business logic services
│   ├── storage/             # Storage backends (memory/redis)
│   └── utils/               # Utility functions
├── T-one/                   # T-one repository (auto-cloned)
├── pyproject.toml           # Poetry configuration
├── Makefile                 # Build automation
├── Dockerfile               # Docker image for memory storage
├── Dockerfile.redis         # Docker image for Redis storage
├── docker-compose.yml       # Docker Compose configuration
└── README.md                # This file

🧑‍💻 Development

Adding Dependencies

The project uses Poetry for dependency management:

# Add a new dependency
poetry add <package-name>

# Add a development dependency
poetry add --group dev <package-name>

Running in Development Mode

The server runs with auto-reload enabled by default:

make run
# or
poetry run uvicorn asr_api.main:app --host 0.0.0.0 --port 8000 --reload

🚀 Production Deployment

Recommendations

  1. State Storage - Use Redis storage for multi-instance deployments:

    # Install with Redis
    make install-redis
    
    # Configure
    export STORAGE_TYPE=redis
    export REDIS_URL=redis://your-redis-host:6379/0
  2. CORS - Configure specific domains instead of "*" in CORSMiddleware

  3. Authentication - Add JWT tokens or API keys to protect endpoints

  4. Rate Limiting - Implement request rate limiting per client

  5. Logging - Set up centralized logging (ELK, Loki, etc.)

  6. Monitoring - Add metrics (Prometheus/Grafana)

  7. Reverse Proxy - Use nginx for load balancing and SSL

  8. File Size Limits - Configure maximum file size restrictions (default: 100MB)


📄 License

This project is licensed under the MIT License.

The T-one model is licensed under the Apache 2.0 License.


🔗 Links


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Made with ❤️ for Russian speech recognition
