English | 简体中文
A local-first speech emotion recognition toolkit with training, CLI inference, and a FastAPI WebUI.
- RAVDESS dataset preprocessing and training
- Transformer Encoder speech emotion classification model
- Single-file CLI inference
- FastAPI WebUI
- Browser microphone recording, audio uploads, and probability display
```
src/
  transformer_mood/
    __init__.py
    main.py
    speech_emotion_classifier.py
    templates/
      index.html
    static/
      .gitkeep
README.zh.md            # Chinese project README
output/                 # Runtime output directory (ignored except .gitkeep)
data/README.md          # Dataset placement notes
data/README.zh.md       # Chinese dataset placement notes
transformer-md/         # Reference materials
requirements.txt        # Non-PyTorch Python dependencies
requirements-webui.txt  # Minimal extra dependencies for the WebUI
```
```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install torch and torchaudio first, then install the remaining dependencies:
```bash
pip install torch torchaudio
pip install -r requirements.txt
```

Install ffmpeg:
```bash
sudo apt update
sudo apt install -y ffmpeg
```

The recommended project entrypoints are:
```bash
python run.py doctor
python run.py
```

This repository does not include the RAVDESS dataset itself. Download it manually and place it under:
```
data/ravdess/
```

See `data/README.md` for details.
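As a quick sanity check after downloading, you can count the audio files per emotion. This is a hypothetical helper, not part of the project; the emotion-code mapping follows the official RAVDESS filename convention, where the third dash-separated field encodes the emotion:

```python
from pathlib import Path

# Emotion codes from the RAVDESS filename convention
# (third dash-separated field, e.g. 03-01-05-01-01-01-01.wav -> "05" = angry).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def scan_ravdess(root: str = "data/ravdess") -> dict:
    """Count .wav files per emotion under the dataset root."""
    counts: dict[str, int] = {}
    for wav in Path(root).rglob("*.wav"):
        parts = wav.stem.split("-")
        if len(parts) >= 3 and parts[2] in RAVDESS_EMOTIONS:
            label = RAVDESS_EMOTIONS[parts[2]]
            counts[label] = counts.get(label, 0) + 1
    return counts

if __name__ == "__main__":
    print(scan_ravdess())
```

If the counts come back empty, the dataset is probably not placed where `data/README.md` describes.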
```bash
python run.py train
python run.py train -- --dataset tess
```

`--dataset tess` keeps the old CLI name, but now reads training audio from `data/vec/` instead of `data/tess/`.
The vec-backed `tess` mode uses six classes:
- angry
- disgust
- fearful
- happy
- neutral
- sad
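The six classes above can be mapped to model output indices as in the following sketch. The one-subdirectory-per-class layout under `data/vec/` is an assumption; see `data/README.md` for the authoritative placement notes:

```python
from pathlib import Path

# The six classes of the vec-backed "tess" mode, in a fixed order so
# class names map deterministically to model output indices.
CLASSES = ["angry", "disgust", "fearful", "happy", "neutral", "sad"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}

def list_samples(root: str = "data/vec"):
    """Yield (wav_path, class_index) pairs, assuming data/vec/<class>/*.wav."""
    for cls in CLASSES:
        for wav in sorted(Path(root, cls).glob("*.wav")):
            yield wav, LABEL_TO_INDEX[cls]
```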
```bash
python run.py predict --audio path/to/example.wav
```

```bash
python run.py webui --host 127.0.0.1 --port 8000
```

Open:
http://127.0.0.1:8000
The WebUI supports:
- Uploading local audio files
- Recording from the browser microphone
- Displaying predicted emotion, confidence, and full probability distribution
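The probability distribution shown in the WebUI can be derived from raw class scores with a softmax. This is a minimal sketch, assuming the model emits one logit per emotion class; the actual post-processing lives in the project's own code:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(classes: list[str], logits: list[float]) -> tuple[str, float]:
    """Return the predicted emotion and its confidence."""
    probs = softmax(logits)
    i = max(range(len(probs)), key=probs.__getitem__)
    return classes[i], probs[i]
```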
Prediction requires a local checkpoint at `output/best_model.pth`, or an explicit `EMOTION_MODEL_PATH` environment variable.
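The lookup order described above can be sketched as follows (the function name is illustrative; loading the checkpoint itself is left to torch elsewhere):

```python
import os
from pathlib import Path

DEFAULT_CHECKPOINT = Path("output/best_model.pth")

def resolve_checkpoint() -> Path:
    """An explicit EMOTION_MODEL_PATH wins; otherwise fall back to the default."""
    override = os.environ.get("EMOTION_MODEL_PATH")
    path = Path(override) if override else DEFAULT_CHECKPOINT
    if not path.is_file():
        raise FileNotFoundError(
            f"No model checkpoint at {path}; train first or set EMOTION_MODEL_PATH"
        )
    return path
```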
- This project is released under the MIT License. See `LICENSE` for details.
- `data/ravdess/` is in `.gitignore` so the raw dataset will not be committed accidentally.
- Legacy root-level model and image artifacts are ignored; the current expected outputs live in `output/`.
- `output/` is kept as a directory boundary, but generated model files and figures are ignored in the public repository.