Telecommunications Science (电信科学) ›› 2019, Vol. 35 ›› Issue (7): 100-108. doi: 10.11959/j.issn.1000-0801.2019052

• Research and Development •

Speech emotion recognition algorithm based on deep spatial attention features extracted from spectrograms

Jinhua WANG, Na YING, Chendu ZHU, Zhaosen LIU, Zhedong CAI

  1. Hangzhou Dianzi University, Hangzhou 310018, China
  • Revised: 2019-03-06  Online: 2019-07-20  Published: 2019-07-22
  • About the authors: Jinhua WANG (1992- ), female, is a master's student at Hangzhou Dianzi University; her main research interests are deep learning and speech processing. | Na YING (1978- ), female, Ph.D., is an associate professor and master's supervisor at Hangzhou Dianzi University; her main research interests are signal processing and artificial intelligence. | Chendu ZHU (1995- ), male, is a master's student at Hangzhou Dianzi University; his main research interest is speech processing. | Zhaosen LIU (1995- ), male, is a master's student at Hangzhou Dianzi University; his main research interests are deep learning and image processing. | Zhedong CAI (1994- ), male, is a master's student at Hangzhou Dianzi University; his main research interests are deep learning and image processing.
  • Supported by:
    The National Natural Science Foundation of China (61705055); The Natural Science Foundation of Zhejiang Province of China (LY16F010013)


Abstract:

Starting from the extraction and classification modeling of speech emotion features, and building on a hybrid convolutional neural network model, the Itti model used for feature extraction is improved in two ways: texture features extracted by local binary patterns (LBP) are added, and auditory-sensitivity weights are incorporated to extract features strongly correlated with emotion. A constrained squeeze-and-excitation network structure is then proposed that extracts calibrated channel weights under feature constraints. Finally, a fine-tuned model based on a hybrid of VGGNet and a long short-term memory (LSTM) network is formed, further improving the ability to represent emotion. Validated on a natural emotion database and the Berlin German database, the model shows a clear increase in emotion recognition rate, 8.43% higher than the baseline model. The recognition results of the model on the natural database (FAU-AEC) and the Berlin database (EMO-DB) are also compared, and the experimental results show that the model generalizes well.

Key words: emotion recognition, deep hybrid neural network model, visual attention mechanism
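As a hedged illustration of the LBP texture-feature step described in the abstract (not the authors' code; the 8-neighbour LBP variant and the spectrogram input format are assumptions for this sketch):

```python
import numpy as np

def lbp_codes(spec):
    """8-neighbour local binary pattern over a 2-D spectrogram.

    Each interior time-frequency bin is compared with its 8 neighbours;
    the comparison bits form a 0-255 texture code per bin.
    """
    c = spec[1:-1, 1:-1]                      # centre bins
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = spec[1 + dy:spec.shape[0] - 1 + dy,
                  1 + dx:spec.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def lbp_histogram(spec):
    """Normalized 256-bin histogram of LBP codes: a texture feature vector."""
    h = np.bincount(lbp_codes(spec).ravel(), minlength=256).astype(float)
    return h / h.sum()
```

The histogram can then be concatenated with the other saliency channels of the improved Itti model before classification.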

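A minimal sketch of the channel-reweighting idea behind a squeeze-and-excitation block (the plain-NumPy form, weight shapes, and reduction ratio are illustrative assumptions, not the paper's constrained variant):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(feat, w1, w2):
    """Reweight the channels of an (H, W, C) feature map.

    Squeeze: global average pooling gives one descriptor per channel.
    Excite:  a small FC-ReLU-FC-sigmoid bottleneck turns the descriptor
             into per-channel weights in (0, 1) that rescale the map.
    """
    z = feat.mean(axis=(0, 1))                 # squeeze -> (C,)
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)  # excite -> (C,)
    return feat * s                            # broadcast over H and W

# Example: 4 channels, reduction ratio 2 (hypothetical shapes)
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
w1 = rng.standard_normal((4, 2))
w2 = rng.standard_normal((2, 4))
out = squeeze_excite(feat, w1, w2)
```

In a trained network `w1` and `w2` are learned, so channels carrying emotion-relevant energy are amplified and the rest attenuated; the paper's version additionally constrains these weights with feature conditions.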

CLC number:
