Journal on Communications, 2022, Vol. 43, Issue (12): 211-221. doi: 10.11959/j.issn.1000-436x.2022234
Hua LONG, Zhangheng HUANG, Yubin SHAO, Qingzhi DU, Shumeng SU
Revised: 2022-11-30
Publication date: 2022-12-25
Online release date: 2022-12-01
About the author:
Hua LONG (born 1963), female, Hui ethnicity, from Dali, Yunnan; Ph.D., professor at Kunming University of Science and Technology. Her main research interests include wireless networks, audio signal processing, and language recognition.
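Abstract:
To address the low accuracy of language recognition at low signal-to-noise ratios (SNRs), a language recognition algorithm based on the fractional wavelet transform is proposed. First, adaptive filtering is applied at the feature-extraction front end to remove noise from the noisy signal, which reduces the influence of noise on feature extraction and improves the system's ability to handle noisy signals. Second, a novel fractional wavelet transform is used as the wavelet basis function to simulate the propagation of the signal along the basilar membrane of the cochlea, and a nonlinear power function is used to compress the signal. Finally, improved cochlear filter cepstral coefficients (CFCC) are extracted by simulating the human auditory process. Experimental results show that the improved CFCC significantly raises language recognition accuracy compared with the conventional CFCC: at an SNR of 0 dB the accuracy improves by 11.1% on average, which verifies the effectiveness and robustness of the proposed algorithm.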
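The last two steps described in the abstract (power-law compression, then cepstral coefficient extraction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the cochlear filter bank has already produced non-negative band energies for each frame, that the cepstral step is a DCT-II as in the conventional CFCC, and that the value 0.25 listed for FCFCC in Table 3 is the power exponent. The function names `power_compress`, `dct_ii`, and `cepstral_coefficients` are hypothetical.

```python
import numpy as np


def power_compress(band_energy, exponent=0.25):
    """Apply the power-law nonlinearity y = x ** exponent to non-negative band energies."""
    band_energy = np.asarray(band_energy, dtype=float)
    if np.any(band_energy < 0):
        raise ValueError("band energies must be non-negative")
    return band_energy ** exponent


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs terms."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    if n_coeffs > n:
        raise ValueError("n_coeffs cannot exceed the number of bands")
    k = np.arange(n_coeffs)[:, None]   # index of the output coefficient
    m = np.arange(n)[None, :]          # index of the input band
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def cepstral_coefficients(band_energy, exponent=0.25, n_coeffs=13):
    """Compress band energies (frames x bands), then decorrelate them with a DCT-II."""
    return dct_ii(power_compress(band_energy, exponent), n_coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    energies = rng.random((4, 32))     # 4 frames, 32 bands (assumed sizes)
    print(cepstral_coefficients(energies).shape)
```

Replacing `exponent` with another value from Table 3, or swapping `power_compress` for a logarithm, reproduces the kind of comparison that table reports, though the filter bank and frame settings here are placeholders.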
Hua LONG, Zhangheng HUANG, Yubin SHAO, Qingzhi DU, Shumeng SU. Research on language recognition algorithm based on improved CFCC feature extraction[J]. Journal on Communications, 2022, 43(12): 211-221.
Table 3
Recognition accuracy with different auditory characteristic functions
Feature | Auditory characteristic function | -5 dB | 0 dB | 5 dB | 10 dB | 15 dB | Average accuracy |
CFCC | 1/3 | 66.8% | 70.76% | 72.77% | 74.16% | 79.34% | 72.77% |
LCFCC | Logarithm | 63.73% | 68.6% | 74.83% | 75.06% | 78.7% | 72.18% |
CFCC0 | 0.101 | 67.73% | 71.46% | 75.36% | 80.96% | 83.46% | 75.79% |
CFCC1 | 1/15 | 65.63% | 73.8% | 76.86% | 78.43% | 80.1% | 74.96% |
FCFCC | 0.25 | 68.97% | 73.4% | 77.5% | 80.36% | 84.63% | 76.97% |
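The SNR columns of Table 3 refer to speech corrupted at a fixed signal-to-noise ratio. The sketch below shows one common way to produce such a condition by scaling additive noise to a target SNR. It is only an assumption about the setup: the noise types and mixing procedure used in the paper are not given in this excerpt, and the sine tone and white Gaussian noise are stand-ins for real speech and noise recordings. The names `mix_at_snr` and `measure_snr_db` are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, target_snr_db):
    """Scale noise so that speech power / noise power matches target_snr_db, then add it."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (target_snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return speech + gain * noise


def measure_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean reference it was built from."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    speech = np.sin(2 * np.pi * 220 * t)       # stand-in for a speech clip
    noise = rng.standard_normal(16000)         # stand-in for a noise recording
    for snr in (-5, 0, 5, 10, 15):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measure_snr_db(speech, noisy), 2))
```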
Table 7
Language recognition accuracy of different features with different classification networks
Feature | Classification network | -5 dB | 0 dB | 5 dB | 10 dB | 15 dB | Average accuracy |
NFCFCCAF | FcaNet-MobileNetV2 | 75.5% | 81.87% | 83.66% | 85.8% | 88.42% | 83.05% |
NFCFCCAF | MobileNetV2 | 73.62% | 79.7% | 82.33% | 84.37% | 85.2% | 81.04% |
NFCFCCAF | ResNet | 74.2% | 80.36% | 81.9% | 84.5% | 85.63% | 81.30% |
NFCFCCAF-DS | FcaNet-MobileNetV2 | 81.2% | 82.97% | 85.93% | 87.8% | 90.37% | 85.65% |
NFCFCCAF-DS | MobileNetV2 | 79.5% | 80.63% | 83.82% | 85.97% | 88.1% | 83.60% |
NFCFCCAF-DS | ResNet | 77.16% | 80.2% | 81.2% | 84.54% | 88.25% | 82.27% |
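The last column of Table 7 is the arithmetic mean over the five SNR conditions. The short check below recomputes it and also reports the per-SNR difference between the two features for one network. The accuracy values are transcribed from the table; the names `TABLE7`, `mean_accuracy`, and `gain_by_snr` are hypothetical.

```python
SNRS_DB = (-5, 0, 5, 10, 15)

# Accuracy in percent, keyed by (feature, classification network), ordered as SNRS_DB.
TABLE7 = {
    ("NFCFCCAF", "FcaNet-MobileNetV2"): (75.5, 81.87, 83.66, 85.8, 88.42),
    ("NFCFCCAF", "MobileNetV2"): (73.62, 79.7, 82.33, 84.37, 85.2),
    ("NFCFCCAF", "ResNet"): (74.2, 80.36, 81.9, 84.5, 85.63),
    ("NFCFCCAF-DS", "FcaNet-MobileNetV2"): (81.2, 82.97, 85.93, 87.8, 90.37),
    ("NFCFCCAF-DS", "MobileNetV2"): (79.5, 80.63, 83.82, 85.97, 88.1),
    ("NFCFCCAF-DS", "ResNet"): (77.16, 80.2, 81.2, 84.54, 88.25),
}


def mean_accuracy(values):
    """Arithmetic mean of the per-SNR accuracies."""
    return sum(values) / len(values)


def gain_by_snr(network, baseline="NFCFCCAF", improved="NFCFCCAF-DS"):
    """Per-SNR accuracy difference (improved minus baseline) for one network."""
    base = TABLE7[(baseline, network)]
    new = TABLE7[(improved, network)]
    return {snr: n - b for snr, b, n in zip(SNRS_DB, base, new)}


if __name__ == "__main__":
    for (feature, network), acc in TABLE7.items():
        print(f"{feature:12s} {network:20s} {mean_accuracy(acc):.2f}%")
    print(gain_by_snr("FcaNet-MobileNetV2"))
```

The recomputed means agree with the listed column to within about 0.02 percentage points.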