SPEECH EMOTION RECOGNITION

A customer in the automotive component manufacturing business wanted to build a multi-modal distress recognition system for the passenger transportation industry.

When a passenger is distressed or feels threatened by the driver's behavior, the gateway device in the car, which captures the continuous audio stream, analyzes the audio, recognizes the stress signal through voice emotion detection, and triggers an event to the server.
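
A minimal sketch of that gateway-side loop is shown below. The capture_audio_chunk() helper, the emotion_model.h5 file, the event endpoint URL, and the window length are all illustrative assumptions, not the production implementation:

    # Gateway-side loop: capture fixed-length audio windows, run the emotion
    # model on-device, and post an event to the server when distress is detected.
    # All paths, URLs, and helper names below are illustrative assumptions.
    import numpy as np
    import librosa
    import requests
    from tensorflow import keras

    SAMPLE_RATE = 16000          # assumed microphone sample rate
    WINDOW_SECONDS = 3           # assumed length of each analysis window
    EVENT_URL = "https://server.example.com/events"   # hypothetical endpoint
    EMOTIONS = ["anger", "happiness", "sadness", "neutral", "fear"]
    DISTRESS = {"anger", "fear"}  # labels treated as stress signals (assumption)

    model = keras.models.load_model("emotion_model.h5")  # hypothetical model file

    def extract_features(audio, sr=SAMPLE_RATE, n_mfcc=40):
        """Mean-pooled MFCCs, assumed to match the features used at training time."""
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    def capture_audio_chunk(seconds=WINDOW_SECONDS, sr=SAMPLE_RATE):
        """Placeholder for the gateway's microphone capture (hypothetical)."""
        raise NotImplementedError("wire this to the device's audio capture API")

    while True:
        audio = capture_audio_chunk()
        features = extract_features(audio).reshape(1, -1)
        probs = model.predict(features, verbose=0)[0]
        label = EMOTIONS[int(np.argmax(probs))]
        if label in DISTRESS:
            # Trigger an event so the server can pull the surrounding video.
            requests.post(EVENT_URL, json={"emotion": label,
                                           "confidence": float(probs.max())})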

Server-side components then analyze the video from 5 seconds before to 5 seconds after the event and, with the help of a human action recognition and classification model, assess the threat level and escalate to security personnel or act according to the rules set in the notification engine.
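
The following is a sketch of that server-side step, assuming the continuous video is archived as a file and that ffmpeg is available for cutting the -5 s / +5 s window; the classify_actions() call stands in for the action recognition model and is purely a placeholder:

    # Server-side handler: cut the -5s/+5s window around the event timestamp
    # with ffmpeg, then pass the clip to the action recognition model.
    # File names and the classify_actions() helper are illustrative assumptions.
    import subprocess

    def extract_clip(source_video, event_time_s, out_path="event_clip.mp4",
                     before_s=5, after_s=5):
        start = max(0.0, event_time_s - before_s)
        duration = before_s + after_s
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start),          # seek to 5 s before the event
            "-i", source_video,
            "-t", str(duration),        # keep a 10 s window
            "-c", "copy",
            out_path,
        ], check=True)
        return out_path

    def classify_actions(clip_path):
        """Placeholder for the human action recognition model (hypothetical)."""
        raise NotImplementedError

    def handle_event(source_video, event_time_s):
        clip = extract_clip(source_video, event_time_s)
        threat_level = classify_actions(clip)
        return threat_level  # forwarded to the notification engine's rules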

In parallel, the server-side component sends a notification to the parent or guardian, prompting them to open a live view and verify whether the passenger is threatened and in real trouble.

Identification of basic emotions such as Anger, Happiness, Sadness, Neutral, and Fear

Triggering of a notification to the server when negative emotions are detected in conversations between the occupants of the transport vehicle

TECHNOLOGIES / TOOLS

Keras, TensorFlow

Librosa

Mel-frequency cepstral coefficients (MFCCs) & Mel spectrograms as audio features (see the feature extraction sketch below)
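
A minimal feature extraction sketch using the Librosa calls listed above; the 16 kHz sample rate, number of coefficients, and mean-pooling over time are assumptions:

    # Extract MFCCs and a log-Mel spectrogram from an audio file with Librosa.
    # The sample rate, coefficient counts, and mean-pooling are assumptions.
    import librosa
    import numpy as np

    def audio_features(path, sr=16000, n_mfcc=40, n_mels=128):
        audio, sr = librosa.load(path, sr=sr, mono=True)

        # Mel-frequency cepstral coefficients: shape (n_mfcc, frames)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)

        # Mel spectrogram converted to dB: shape (n_mels, frames)
        mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel, ref=np.max)

        # Mean-pool MFCCs over time to get a fixed-length vector for the classifier.
        return mfcc.mean(axis=1), log_mel

    mfcc_vec, log_mel = audio_features("sample.wav")  # hypothetical file
    print(mfcc_vec.shape, log_mel.shape)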

MODEL HIGHLIGHTS

Used a hierarchical classifier to classify the emotion (see the sketch after this list)

Evaluated multiple datasets (RAVDESS, SAVEE, IEMOCAP); we finally used RAVDESS only, since model performance when using all three datasets was very poor

Successfully ported the model onto platforms such as the Qualcomm QCS603 & QCS605
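
A minimal Keras sketch of a hierarchical scheme like the one referenced above, assuming a first stage that separates emotional from neutral speech and a second stage that resolves the specific emotion; the staging, layer sizes, and 40-dimensional MFCC input are assumptions, not the production architecture:

    # Two-stage (hierarchical) emotion classifier sketch in Keras.
    # Stage 1 decides emotional vs. neutral; stage 2 resolves the emotion.
    # The staging, layer sizes, and 40-dim MFCC input are assumptions.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    N_FEATURES = 40                      # mean-pooled MFCC vector
    EMOTIONS = ["anger", "happiness", "sadness", "fear"]  # stage-2 classes

    def build_stage(n_out, activation):
        return keras.Sequential([
            layers.Input(shape=(N_FEATURES,)),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.3),
            layers.Dense(64, activation="relu"),
            layers.Dense(n_out, activation=activation),
        ])

    stage1 = build_stage(1, "sigmoid")           # emotional (1) vs. neutral (0)
    stage2 = build_stage(len(EMOTIONS), "softmax")
    stage1.compile(optimizer="adam", loss="binary_crossentropy")
    stage2.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    def predict_emotion(features):
        x = features.reshape(1, -1)
        if stage1.predict(x, verbose=0)[0, 0] < 0.5:
            return "neutral"
        return EMOTIONS[int(np.argmax(stage2.predict(x, verbose=0)[0]))]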