A multilingual real-time speech translation system supporting bidirectional translation between 99 languages with context-aware ASR correction.
- 99 Languages: Bidirectional translation supporting 9,801 translation directions
- Context-Aware Correction: LLM-powered ASR error correction using temporal context to fix boundary artifacts
- Voice Activity Detection: Reduces hallucinations by removing silence segments
- Optimized Streaming: 2-second chunks with 0.5-second overlap for continuous processing
- Modular Architecture: Cascaded pipeline allowing component-wise optimization
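The streaming scheme above (2-second chunks with 0.5-second overlap) can be sketched as follows; the sample rate and function name are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed ASR sample rate (typical for Whisper-family models)
CHUNK_S = 2.0          # chunk length from the feature list
OVERLAP_S = 0.5        # overlap between consecutive chunks

def stream_chunks(audio: np.ndarray):
    """Yield overlapping chunks: each new chunk starts 1.5 s after the previous one."""
    chunk = int(CHUNK_S * SAMPLE_RATE)
    hop = int((CHUNK_S - OVERLAP_S) * SAMPLE_RATE)  # 1.5 s hop
    for start in range(0, max(len(audio) - chunk + 1, 1), hop):
        yield audio[start:start + chunk]

# 10 s of audio yields 6 overlapping 2 s chunks starting at 0.0, 1.5, 3.0, ... s
audio = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)
chunks = list(stream_chunks(audio))
```

The 0.5 s overlap is what gives the correction stage temporal context across chunk boundaries.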
The system implements a 5-component pipeline:
- Voice Activity Detection (pyannote/voice-activity-detection)
- Automatic Speech Recognition (deepdml/faster-whisper-large-v3-turbo-ct2)
- Context-Aware ASR Correction (GPT-4o-mini)
- Jaccard Validation
- Machine Translation (vLLM-served cpatonn/Qwen3-4B-Instruct-2507-AWQ-4bit)
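As a rough illustration of the Jaccard Validation stage, the LLM-corrected transcript can be checked against the raw ASR output by word-set overlap, rejecting corrections that drift too far from what was actually transcribed. The tokenization and the 0.5 threshold below are assumptions for illustration, not values taken from the repository:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def validate_correction(raw: str, corrected: str, threshold: float = 0.5) -> str:
    """Keep the LLM correction only if it stays close to the raw ASR text.

    The threshold value is hypothetical; the repository may use a different one.
    """
    return corrected if jaccard_similarity(raw, corrected) >= threshold else raw
```

This kind of guard keeps the correction stage from hallucinating content the ASR never produced: a correction that shares too few words with the raw transcript is discarded in favor of the original.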
Before setting up the project, ensure you have:
- Python 3.11 (required)
- GPU with 6-10 GB VRAM (to run VAD + ASR + MT models)
- OpenAI API Key (required for ASR Correction)
- ~80 GB free storage (for models, dependencies, and Docker images)
Note: This project was developed and tested on Windows. You may encounter issues when running on other operating systems.
- Clone repository
```bash
git clone https://github.com/as4193/TarjamaRTv1.git
cd TarjamaRTv1
```
- Install PyTorch with CUDA support
```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124
```
- Install remaining requirements
```bash
pip install -r requirements.txt
```
- Set OpenAI API key
Windows (PowerShell):
```powershell
$env:OPENAI_API_KEY="your_openai_key_here"
```
Linux/Mac:
```bash
export OPENAI_API_KEY=your_openai_key_here
```
- Login to Hugging Face (for gated models)
```bash
huggingface-cli login
```
Enter your Hugging Face token when prompted. This is required for accessing gated models such as Pyannote.
- Run vLLM with the OpenAI-compatible server
```bash
# Run from the vllm_service folder
docker pull vllm/vllm-openai:latest
docker-compose up -d
```
Note: This step may take 10-20 minutes depending on your internet speed, as the model is downloaded from Hugging Face and then loaded onto the GPU.
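For orientation, a minimal `docker-compose.yml` for this service might look like the sketch below. The actual file shipped in `vllm_service` may differ; the port mapping, GPU reservation, and serving flags here are assumptions:

```yaml
# Hypothetical sketch; consult vllm_service/docker-compose.yml for the real configuration
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: >
      --model cpatonn/Qwen3-4B-Instruct-2507-AWQ-4bit
    ports:
      - "8000:8000"
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```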
- Start the Streamlit app
```bash
streamlit run project_ui.py
```
- Open your browser and navigate to http://localhost:8501