
English | 简体中文

Transformer Mood

A local-first speech emotion recognition toolkit with training, CLI inference, and a FastAPI WebUI.


Features

  • RAVDESS dataset preprocessing and training
  • Transformer Encoder speech emotion classification model
  • Single-file CLI inference
  • FastAPI WebUI
  • Browser microphone recording, audio uploads, and probability display

Repository Layout

src/
  transformer_mood/
    __init__.py
    main.py
    speech_emotion_classifier.py
    templates/
      index.html
    static/
      .gitkeep
README.zh.md                    # Chinese project README
output/                        # Runtime output directory (ignored except .gitkeep)
data/README.md                 # Dataset placement notes
data/README.zh.md              # Chinese dataset placement notes
transformer-md/                # Reference materials
requirements.txt               # Non-PyTorch Python dependencies
requirements-webui.txt         # Minimal extra dependencies for the WebUI

Quick Start

python3 -m venv .venv
source .venv/bin/activate

Install torch and torchaudio first, then install the remaining dependencies:

pip install torch torchaudio
pip install -r requirements.txt

Install ffmpeg:

sudo apt update
sudo apt install -y ffmpeg
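ffmpeg must end up on your PATH for audio decoding. A minimal sketch (not part of the project) for verifying this from Python:

```python
import shutil

def ffmpeg_available() -> bool:
    # True if an ffmpeg binary is discoverable on PATH.
    return shutil.which("ffmpeg") is not None

print("ffmpeg on PATH:", ffmpeg_available())
```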

The recommended project entrypoints are:

python run.py doctor
python run.py

Dataset

This repository does not include the RAVDESS dataset itself. Download it manually and place it under:

data/ravdess/

See data/README.md for details.
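RAVDESS encodes each clip's metadata in seven dash-separated filename fields, with the emotion code in the third field. A sketch of extracting the label, assuming the published RAVDESS naming convention (the helper name here is hypothetical, not an API of this project):

```python
# Emotion codes from the RAVDESS filename convention,
# e.g. 03-01-05-01-02-01-12.wav -> emotion code "05" -> angry.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    # The third dash-separated field is the emotion code.
    code = filename.split("-")[2]
    return RAVDESS_EMOTIONS[code]
```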

Training

python run.py train
python run.py train -- --dataset tess

--dataset tess keeps the old CLI name, but now reads training audio from data/vec/ instead of data/tess/.

The vec-backed tess mode uses 6 classes:

  • angry
  • disgust
  • fearful
  • happy
  • neutral
  • sad
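One way to picture the six-class label space, assuming alphabetical ordering (the authoritative mapping lives in the training code):

```python
# Hypothetical label-to-index mapping for the vec-backed tess mode;
# check the training code for the actual ordering used at train time.
CLASSES = ["angry", "disgust", "fearful", "happy", "neutral", "sad"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}
```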

CLI Prediction

python run.py predict --audio path/to/example.wav

WebUI

python run.py
python run.py webui --host 127.0.0.1 --port 8000

Open:

http://127.0.0.1:8000

The WebUI supports:

  • Uploading local audio files
  • Recording from the browser microphone
  • Displaying predicted emotion, confidence, and full probability distribution

Prediction requires a local checkpoint at output/best_model.pth, or an explicit EMOTION_MODEL_PATH environment variable.
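The lookup order described above can be sketched as follows (a sketch of the stated behavior, not the project's actual resolution code):

```python
import os
from pathlib import Path

def resolve_checkpoint() -> Path:
    # EMOTION_MODEL_PATH, if set, overrides the default checkpoint location.
    env = os.environ.get("EMOTION_MODEL_PATH")
    return Path(env) if env else Path("output/best_model.pth")
```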

Notes

  • This project is released under the MIT License. See LICENSE for details.
  • data/ravdess/ is in .gitignore so the raw dataset will not be committed accidentally
  • Legacy root-level model and image artifacts are ignored; the current expected outputs live in output/
  • output/ is kept as a directory boundary, but generated model files and figures are ignored for the public repository
