Speech emotion recognition algorithm based on spectrogram feature extraction of deep space attention feature

doi:10.11959/j.issn.1000-0801.2019052

Abstract

Starting from the extraction and classification modeling of speech emotion features, and building on a hybrid convolutional neural network model, the Itti model used for feature extraction was improved: local binary pattern (LBP) features were added, and strongly correlated features were extracted in combination with the sensitivity of human hearing. Feature constraints were then used to extract calibration weights through a constrained squeeze-and-excitation network structure. Finally, a fine-tuned hybrid network combining VGGnet and a long short-term memory (LSTM) network was formed, further enhancing the ability to express emotions. In validation on a natural emotion database and a German database, the model's emotion recognition rate increased significantly, 8.43% higher than the baseline model. The recognition performance of the model on the natural database (FAU-AEC) and the Berlin database (EMO-DB) was also compared. The experimental results show that the model generalizes well.
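The squeeze-and-excitation (SE) structure referred to in the abstract reweights feature-map channels with learned calibration weights. The following is a minimal NumPy sketch of the standard SE block only, not the constrained variant proposed in the paper; the function name `squeeze_excite`, the reduction ratio `r`, the tensor sizes, and the random weights are all illustrative assumptions.

```python
import numpy as np

def squeeze_excite(features, w1, b1, w2, b2):
    """Recalibrate the channels of a (C, H, W) feature map with an SE-style gate."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(0.0, w1 @ z + b1)               # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # shape (C,)
    # Scale: multiply every channel by its calibration weight.
    return features * gate[:, None, None], gate

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 5, 2
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

out, gate = squeeze_excite(features, w1, b1, w2, b2)
print(out.shape, gate.round(3))
```

In a trained network the two weight matrices are learned jointly with the convolutional layers, so the gate emphasizes channels that help discriminate emotions and suppresses the rest.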

Key words: emotion recognition, deep hybrid neural network model, visual attention mechanism

CLC Number: TP18

Jinhua WANG, Na YING, Chendu ZHU, Zhaosen LIU, Zhedong CAI. Speech emotion recognition algorithm based on spectrogram feature extraction of deep space attention feature[J]. Telecommunications Science, 2019, 35(7): 100-108.

Emotion category	CRNN	CRNN_CSEnet	CRNN_ISEnet
Anger	67.89%	68%	64.5%
Emphatic	47.56%	70.54%	80.17%
Neutral	89.16%	84.88%	90.94%
Happy	74.69%	75.79%	74.64%
Other	35.84%	58.06%	58.92%
Average recognition rate	63.02%	71.45%	73.83%
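The averages in the last row of the table above can be checked with a few lines of Python. This sketch assumes that the average recognition rate is the unweighted mean of the five per-class rates, which the printed values are consistent with; the dictionary layout and variable names are illustrative.

```python
# Per-class recognition rates (%) copied from the table above, in row order.
rates = {
    "CRNN":        [67.89, 47.56, 89.16, 74.69, 35.84],
    "CRNN_CSEnet": [68.00, 70.54, 84.88, 75.79, 58.06],
    "CRNN_ISEnet": [64.50, 80.17, 90.94, 74.64, 58.92],
}

# Unweighted mean over classes (assumed definition of the average rate).
means = {name: sum(v) / len(v) for name, v in rates.items()}

# Gain of each model over the CRNN baseline, in percentage points.
gains = {name: means[name] - means["CRNN"] for name in rates}

for name in rates:
    print(f"{name}: mean {means[name]:.3f}%, gain over CRNN {gains[name]:+.3f} points")
```

The unrounded means are about 63.028, 71.454, and 73.834, so the gain of CRNN_CSEnet over CRNN is about 8.43 points, the same figure quoted in the abstract, and the gain of CRNN_ISEnet is about 10.81 points.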

Emotion category	CRNNⅠ	CRNNⅡ	CRNN_ASEnet
Anger	53.63%	67.89%	64.5%
Emphatic	40.56%	47.56%	80.17%
Neutral	76.71%	89.16%	90.94%
Happy	60.08%	74.69%	74.64%
Other	34.84%	35.84%	58.92%
Average recognition rate	53.16%	63.02%	73.83%

Emotion	Neutral	Fear	Disgust	Happy	Boredom	Sadness	Anger
Neutral	75.27%	16.13%	1.08%	1.79%	2.15%	3.23%	0.36%
Fear	0.75%	80.05%	1.13%	1.12%	5.64%	1.88%	9.40%
Disgust	0.62%	7.41%	82.10%	1.23%	5.56%	2.47%	0.62%
Happy	3.39%	1.69%	3.87%	85.71%	0.48%	0%	4.84%
Boredom	7.40%	9.63%	0%	2.22%	78.51%	3.70%	5.18%
Sadness	4.88%	2.93%	4.88%	3.41%	1.46%	80.49%	1.95%
Anger	0.93%	1.40%	6.98%	6.97%	3.72%	2.32%	77.67%
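The confusion matrix above can be summarized programmatically. The sketch below assumes that each row is a true class and each column a predicted class, with rows normalized to percentages; the English labels are translations of the row headers, and the variable names are illustrative.

```python
import numpy as np

# Class labels in row/column order (translated from the table headers).
labels = ["neutral", "fear", "disgust", "happy", "boredom", "sadness", "anger"]

# Confusion matrix in percent, copied from the table above.
cm = np.array([
    [75.27, 16.13,  1.08,  1.79,  2.15,  3.23,  0.36],
    [ 0.75, 80.05,  1.13,  1.12,  5.64,  1.88,  9.40],
    [ 0.62,  7.41, 82.10,  1.23,  5.56,  2.47,  0.62],
    [ 3.39,  1.69,  3.87, 85.71,  0.48,  0.00,  4.84],
    [ 7.40,  9.63,  0.00,  2.22, 78.51,  3.70,  5.18],
    [ 4.88,  2.93,  4.88,  3.41,  1.46, 80.49,  1.95],
    [ 0.93,  1.40,  6.98,  6.97,  3.72,  2.32, 77.67],
])

recall = np.diag(cm)          # per-class recognition rate (diagonal)
uar = recall.mean()           # unweighted average over the seven classes
row_sums = cm.sum(axis=1)     # should be close to 100 if rows are normalized

# Most frequent wrong prediction for each true class.
off_diag = cm.copy()
np.fill_diagonal(off_diag, -1.0)
top_confusion = [labels[j] for j in off_diag.argmax(axis=1)]

print(f"unweighted average recall: {uar:.2f}%")
for name, total, wrong in zip(labels, row_sums, top_confusion):
    print(f"{name}: row sum {total:.2f}%, most often confused with {wrong}")
```

With these values the mean of the diagonal is about 79.97%. The fifth row as printed sums to about 106.64% rather than roughly 100%, so at least one entry in that row may be misprinted in the source.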
