
Deep Learning for Audio Visual Emotion Recognition

  • T. Hussain
  • W. Wang
  • N. Bouaynaya
  • H. Fathallah-Shaykh
  • L. Mihaylova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Human emotions can be presented in data with multiple modalities, e.g. video, audio and text. An automated system for emotion recognition needs to address a number of challenging issues, including feature extraction and dealing with variations and noise in data. Deep learning has been used extensively in recent years, offering excellent performance in emotion recognition. This work presents a new method based on audio and visual modalities, where visual cues facilitate the detection of speech or non-speech frames and of the emotional state of the speaker. Different from previous works, we propose the use of novel speech features, e.g. the Wavegram, which is extracted with a one-dimensional Convolutional Neural Network (CNN) learned directly from time-domain waveforms, and Wavegram-Logmel features, which combine the Wavegram with the log mel spectrogram. The system is then trained in an end-to-end fashion on the SAVEE database, also taking advantage of the correlations among the streams. It is shown that the proposed approach outperforms traditional and state-of-the-art deep-learning-based approaches built separately on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions.

Original language: American English
Title of host publication: 2022 25th International Conference on Information Fusion, FUSION 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781737749721
State: Published - 2022
Event: 25th International Conference on Information Fusion, FUSION 2022 - Linkoping, Sweden
Duration: Jul 4 2022 to Jul 7 2022

Publication series

Name: 2022 25th International Conference on Information Fusion, FUSION 2022

Conference

Conference: 25th International Conference on Information Fusion, FUSION 2022
Country/Territory: Sweden
City: Linkoping
Period: 7/4/22 to 7/7/22

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Information Systems
  • Signal Processing
  • Information Systems and Management
