Emotion recognition is the process of identifying human emotions using AI. This project aims to recognise emotions from speech clips (audio + video). The technology generally performs better when it combines multiple modalities, so we implemented a two-stream model that analyzes facial expressions from the video and tone of voice from the audio. These tasks are called Facial Emotion Recognition (FER) and Speech Emotion Recognition (SER), respectively.
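This README does not specify how the two streams are combined at inference time. One common option is late fusion: take the class probabilities produced by the audio (SER) and video (FER) models, average them, and pick the highest-scoring class. The sketch below assumes both models output probability vectors over the same classes in the same order; `late_fusion` and `audio_weight` are hypothetical names, not code from this repository.

```python
import numpy as np


def late_fusion(p_audio, p_video, audio_weight=0.5):
    """Hypothetical helper: weighted average of the SER and FER class probabilities."""
    p_audio = np.asarray(p_audio, dtype=np.float64)
    p_video = np.asarray(p_video, dtype=np.float64)
    if p_audio.ndim != 1 or p_audio.shape != p_video.shape:
        raise ValueError("expected two 1-D probability vectors over the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    # Convex combination of the two streams; audio_weight=0.5 is a plain average.
    fused = audio_weight * p_audio + (1.0 - audio_weight) * p_video
    return fused, int(np.argmax(fused))


# Usage with three illustrative classes and made-up scores:
labels = ["happy", "sad", "angry"]
fused, idx = late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
print(labels[idx])  # sad
```

Raising `audio_weight` above 0.5 trusts the audio stream more; if used, such a weight would normally be tuned on a validation split rather than fixed by hand.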
We trained the models on the speech clips of the RAVDESS dataset, which contains 8 emotion classes: neutral, calm, happy, sad, angry, fearful, surprise and disgust (only 7 of them were used).
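RAVDESS encodes the label of every clip in its file name: seven two-digit fields separated by hyphens (modality, vocal channel, emotion, intensity, statement, repetition, actor), where the third field is the emotion code (01 neutral, 02 calm, 03 happy, 04 sad, 05 angry, 06 fearful, 07 disgust, 08 surprised). The sketch below decodes a file name under that convention. `parse_ravdess_filename` is a hypothetical helper, not code from this repository, and it keeps all 8 classes because this README does not say which one was left out.

```python
import os

# Emotion codes from the RAVDESS file-naming convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise",
}
# Modality codes (first field).
MODALITIES = {"01": "full-AV", "02": "video-only", "03": "audio-only"}


def parse_ravdess_filename(path):
    """Hypothetical helper: decode a name such as '01-01-03-02-01-01-05.mp4'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path!r}")
    modality, channel, emotion, intensity, _statement, _repetition, actor = fields
    return {
        "modality": MODALITIES[modality],
        "vocal_channel": "speech" if channel == "01" else "song",
        "emotion": RAVDESS_EMOTIONS[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "actor": int(actor),
    }


print(parse_ravdess_filename("03-01-06-01-02-01-12.wav")["emotion"])  # fearful
```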
More details about the processing and the architecture are in `Project_Slides.pdf`. A demonstration video and the deployed model are in the `DEMO` folder.
Emotion-Recognition_SER-FER_RAVDESS
├───Datasets
│ ├───RAVDESS
│ ├───RAVDESS_audio
│ ├───RAVDESS_frames
│ ├───RAVDESS_frames_black
│ └───RAVDESS_frames_face_BW
├───DEMO
│ ├───Examples
│ ├───ER DEMO.mp4
│ └───ER_FullClip_DEMO.ipynb
├───Models
│ ├───Audio Stream
│ └───Video Stream
├───Other
│ └───haarcascade_frontalface_default.xml
├───Plots
├───StreamAudio_1D.ipynb
├───StreamAudio_2D.ipynb
├───FullClip_Test.ipynb
├───StreamVideo_FaceOnly.ipynb
├───StreamVideo_FramesExtraction.ipynb
├───StreamVideo_FullFrame.ipynb
├───StreamVideo_Test.ipynb
├───Project_Slides.pdf
├───README.md
├───LICENSE.md
└───requirements.txt
To classify emotions (using our trained models):
- Copy your clips into `DEMO/Examples`
- Run `ER_FullClip_DEMO.ipynb` in the `DEMO` folder
To replicate this project (training and inference):
- Download the speech clips of the RAVDESS dataset and save them in the `Datasets/RAVDESS` folder
- Train the video and audio models
  - Video Stream: extract frames with `StreamVideo_FramesExtraction.ipynb` (several types of frames are generated; the best are "224x224 only faces BW"), train the model with `StreamVideo_FullFrame.ipynb` or `StreamVideo_FaceOnly.ipynb` (depending on the frames generated) and test the results with `StreamVideo_Test.ipynb`
  - Audio Stream: use `StreamAudio_1D.ipynb` and `StreamAudio_2D.ipynb` to train the models (2D works better)
- Use `FullClip_Test.ipynb` to assess the overall performance
- Use `ER_FullClip_DEMO.ipynb` in the `DEMO` folder to classify videos with the trained models
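The "224x224 only faces BW" frames can be pictured with the following sketch. It is not the code in `StreamVideo_FramesExtraction.ipynb`: it assumes the face bounding box `(x, y, w, h)` has already been found (for instance with OpenCV's `cv2.CascadeClassifier` loaded from `Other/haarcascade_frontalface_default.xml`, whose `detectMultiScale` method returns boxes in that format), that the frame is an RGB array, and it uses a simple nearest-neighbour resize to stay dependency-free. The name `crop_face_bw` is hypothetical.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G, B channels.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def crop_face_bw(frame_rgb, box, size=224):
    """Hypothetical helper: crop a face box, convert to grayscale, resize to size x size.

    frame_rgb: array of shape (height, width, 3), assumed RGB channel order.
    box: (x, y, w, h) in pixels, as returned by OpenCV's detectMultiScale.
    """
    frame = np.asarray(frame_rgb)
    if frame.ndim != 3 or frame.shape[2] < 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    x, y, w, h = (int(v) for v in box)
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > frame.shape[1] or y + h > frame.shape[0]:
        raise ValueError("box must lie inside the frame")

    face = frame[y:y + h, x:x + w, :3].astype(np.float64)
    gray = face @ LUMA_WEIGHTS  # weighted sum over channels -> shape (h, w)

    # Nearest-neighbour resize: map each output pixel back to a source pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray[rows[:, None], cols[None, :]]
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)
```

Frames read with `cv2.VideoCapture` are in BGR order, so they would need `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` before this helper; in a real pipeline `cv2.resize` with an interpolating mode would usually replace the nearest-neighbour step.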

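This README does not say what the "2D" audio input is. A common choice for 2D speech emotion models is a time-frequency image such as a (mel-)spectrogram; assuming that, the following numpy-only sketch computes a log-power spectrogram from a mono signal. `log_spectrogram` and its `n_fft`/`hop` defaults are hypothetical, and there is no mel filterbank here (libraries such as librosa provide one via `librosa.feature.melspectrogram`).

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=128, eps=1e-10):
    """Hypothetical helper: log-power STFT, shape (n_fft // 2 + 1, n_frames), in dB."""
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))  # zero-pad very short clips to one frame

    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice overlapping frames and apply the Hann window to each.
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Decibel scale; eps avoids log10(0) on silent frames.
    return 10.0 * np.log10(power + eps).T
```

At a 16 kHz sample rate, each column covers a 512-sample (32 ms) frame with an 8 ms hop, and the rows are frequency bins spaced 31.25 Hz apart; the resulting 2D array can then be treated like a single-channel image by a CNN.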