Engineering Faculty Publications

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Ziping Zhao
Yu Zheng
Zixing Zhang
Haishuai Wang, Fairfield University
Yiqin Zhao
Chao Li

Document Type

Conference Proceeding

Article Version

Publisher's PDF

Publication Date

2018

Abstract

Automatic emotion recognition from speech, an important and challenging task in the field of affective computing, relies heavily on the effectiveness of the speech features used for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features. How to model spatio-temporal dynamics effectively for speech emotion recognition is still under active investigation. In this paper, we propose a method to tackle the problem of emotionally relevant feature extraction from speech by combining attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) with fully convolutional networks (FCNs) in order to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. The experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions than other existing emotion recognition algorithms.
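
As a rough illustration of the idea in the abstract, the sketch below pools a sequence of frame-level vectors into a single utterance-level vector with attention weights and then joins it with a second feature vector before classification. It is not the authors' implementation: the abstract does not specify the attention form or the fusion step, so dot-product scoring against a query vector and simple concatenation are assumptions, and all names and toy values (`attention_pool`, `fuse`, `blstm_outputs`, `fcn_features`) are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Weighted average of frame vectors.

    Each frame is scored by its dot product with `query` (an assumed
    scoring function), and the scores are normalised with softmax.
    """
    scores = [sum(h * q for h, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def fuse(temporal: Sequence[float], spatial: Sequence[float]) -> List[float]:
    """Concatenate two feature vectors (an assumed fusion step)."""
    return list(temporal) + list(spatial)


# Toy stand-ins for the outputs of a recurrent and a convolutional branch.
blstm_outputs = [[0.1, 0.3], [0.8, -0.2], [0.4, 0.4]]  # 3 frames, 2 dims
attention_query = [1.0, 0.5]                           # hypothetical learned vector
fcn_features = [0.2, 0.7, -0.1]                        # hypothetical FCN output

utterance_vector = attention_pool(blstm_outputs, attention_query)
fused = fuse(utterance_vector, fcn_features)
print(len(fused))  # 5
```

In a full system the fused vector would be passed to a classifier such as the DNN mentioned in the abstract; here it only shows how variable-length frame sequences can be reduced to a fixed-size representation.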

Comments

The final publisher PDF has been archived here with permission from the copyright holder.

Publication Title

Interspeech

Repository Citation

Zhao, Ziping; Zheng, Yu; Zhang, Zixing; Wang, Haishuai; Zhao, Yiqin; and Li, Chao, "Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition" (2018). Engineering Faculty Publications. 181.
https://digitalcommons.fairfield.edu/engineering-facultypubs/181

Published Citation

Ziping Zhao, Yu Zheng, Zixing Zhang, Haishuai Wang, Yiqin Zhao and Chao Li. Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition. Interspeech, 272-276, 2018. DOI: 10.21437/Interspeech.2018-1477

DOI

10.21437/Interspeech.2018-1477
