Research on language recognition algorithm based on improved CFCC feature extraction

doi:10.11959/j.issn.1000-436x.2022234

Abstract:

To address the low accuracy of language recognition at low signal-to-noise ratios (SNR), a language recognition algorithm based on the fractional wavelet transform is proposed. First, an adaptive filter is applied at the front end of feature extraction to remove noise from the noisy signal, which reduces the influence of noise on feature extraction and improves the system's ability to handle noisy signals. Second, a novel fractional wavelet transform is used as the wavelet basis function to simulate the propagation of the signal along the basilar membrane of the cochlea, and the signal is then compressed with a nonlinear power function. Finally, improved cochlear filter cepstral coefficients (CFCC) are extracted by simulating the human auditory process. Experimental results show that, compared with the traditional CFCC, the improved CFCC significantly raises language recognition accuracy: at an SNR of 0 dB the accuracy increases by 11.1% on average, which verifies the effectiveness and robustness of the proposed algorithm.

Key words: language recognition, adaptive filtering, fractional wavelet transform, neural network, cochlear filter cepstral coefficient
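
The abstract does not say which adaptive filter is used in the front end. Purely as an illustration of adaptive noise filtering, the sketch below uses a normalized LMS (NLMS) noise canceller and assumes that a separate noise reference signal is available. The function name `nlms_cancel` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def nlms_cancel(primary, reference, num_taps=8, mu=0.1, eps=1e-8):
    """Hypothetical NLMS noise canceller (an assumption, not the paper's filter).

    primary   : noisy signal (speech plus filtered noise)
    reference : noise reference correlated with the noise in `primary`
    Returns the error signal, which approximates the clean speech.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)              # adaptive filter weights
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        # Latest num_taps reference samples, newest first, zero-padded at the start.
        x = reference[max(0, n - num_taps + 1): n + 1][::-1]
        x = np.pad(x, (0, num_taps - len(x)))
        e = primary[n] - w @ x          # subtract the current noise estimate
        w += mu * e * x / (x @ x + eps) # normalized LMS weight update
        out[n] = e
    return out
```

The cleaned signal returned here would then be passed on to feature extraction. A different adaptive scheme would replace only this stage.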

CLC number: TN912.34

Hua LONG, Zhangheng HUANG, Yubin SHAO, Qingzhi DU, Shumeng SU. Research on language recognition algorithm based on improved CFCC feature extraction[J]. Journal on Communications, 2022, 43(12): 211-221.

Training set (number of utterances)
Language	Label	-5 dB	0 dB	5 dB	10 dB	15 dB
English	0	1 400	1 400	1 400	1 400	1 400
French	1	1 400	1 400	1 400	1 400	1 400
German	2	1 400	1 400	1 400	1 400	1 400
Italian	3	1 400	1 400	1 400	1 400	1 400
Spanish	4	1 400	1 400	1 400	1 400	1 400

Test set (number of utterances)
Language	Label	-5 dB	0 dB	5 dB	10 dB	15 dB
English	0	600	600	600	600	600
French	1	600	600	600	600	600
German	2	600	600	600	600	600
Italian	3	600	600	600	600	600
Spanish	4	600	600	600	600	600
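
The training and test sets above are organized by SNR from -5 dB to 15 dB. How the noisy utterances were produced is not stated here. A minimal sketch of one common approach, assuming additive noise scaled to reach a target SNR, is shown below. The helper name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that the mixture has the requested SNR in dB.

    Assumption: SNR is defined as the ratio of mean speech power to mean
    noise power over the whole utterance.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale factor that makes speech_power / (scale**2 * noise_power) = 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Calling this once per SNR level in `[-5, 0, 5, 10, 15]` would give one noisy copy of each utterance per column of the tables.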

Recognition accuracy by SNR
Feature	Auditory characteristic function	-5 dB	0 dB	5 dB	10 dB	15 dB	Average
CFCC	1/3	66.8%	70.76%	72.77%	74.16%	79.34%	72.77%
LCFCC	log	63.73%	68.6%	74.83%	75.06%	78.7%	72.18%
CFCC0	0.101	67.73%	71.46%	75.36%	80.96%	83.46%	75.79%
CFCC1	1/15	65.63%	73.8%	76.86%	78.43%	80.1%	74.96%
FCFCC	0.25	68.97%	73.4%	77.5%	80.36%	84.63%	76.97%
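
The comparison above varies the auditory characteristic function, either a logarithm or a power function with an exponent such as 0.101 or 0.25, that compresses the filterbank output before the cepstral step. The sketch below shows that stage only, assuming the per-band energies are already available and that the cepstral coefficients come from an unnormalized DCT-II. The names `compress` and `dct2` are hypothetical.

```python
import numpy as np

def compress(energies, exponent=None):
    """Apply a power-law nonlinearity, or a logarithm when exponent is None."""
    energies = np.asarray(energies, dtype=float)
    if exponent is None:
        return np.log(energies + 1e-10)  # small offset avoids log(0)
    return energies ** exponent

def dct2(x, num_coeffs):
    """Unnormalized DCT-II of a 1-D vector, keeping the first num_coeffs terms."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(num_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

def cepstral_coeffs(band_energies, exponent=0.25, num_coeffs=13):
    """Compress one frame of band energies and decorrelate it with a DCT."""
    return dct2(compress(band_energies, exponent), num_coeffs)
```

Changing `exponent` switches between the power-law variants in the table, and `exponent=None` gives the logarithmic variant. The number of bands and of retained coefficients are assumptions here.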

Recognition accuracy by SNR
Feature parameter	-5 dB	0 dB	5 dB	10 dB	15 dB	Average
MFCC	59.07%	70.03%	76.96%	82.23%	84.6%	74.58%
GFCC	70.9%	72.86%	74.86%	81.5%	83.2%	76.66%
Fbank	67.06%	73.2%	77.16%	80.3%	85.56%	76.65%
CFCC	66.8%	70.76%	72.77%	74.16%	79.34%	72.77%
NFCFCC	71.3%	73.3%	79.96%	84.53%	87.67%	79.35%

Recognition accuracy by SNR
Feature parameter	-5 dB	0 dB	5 dB	10 dB	15 dB	Average
NFPSS	73.8%	77.03%	80.33%	83.77%	87.96%	80.58%
NFCFCCAF	75.5%	81.87%	83.66%	85.8%	88.42%	83.05%
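
The averages in these tables, and the gain at 0 dB quoted in the abstract, can be checked with a few lines of arithmetic. The sketch below copies the per-SNR accuracies of the CFCC and NFCFCCAF rows. Treating NFCFCCAF as the proposed improved CFCC and CFCC as the baseline is an assumption based on the row names.

```python
# Accuracies (%) at -5, 0, 5, 10 and 15 dB, copied from the tables above.
accuracy = {
    "CFCC": [66.8, 70.76, 72.77, 74.16, 79.34],
    "NFCFCCAF": [75.5, 81.87, 83.66, 85.8, 88.42],
}

# Mean accuracy over the five SNR conditions.
mean_accuracy = {name: sum(vals) / len(vals) for name, vals in accuracy.items()}

# Absolute gain at 0 dB (index 1), in percentage points.
gain_0db = accuracy["NFCFCCAF"][1] - accuracy["CFCC"][1]

for name, value in mean_accuracy.items():
    print(f"{name}: mean accuracy {value:.2f}%")
print(f"Gain at 0 dB: {gain_0db:.2f} percentage points")
```

Under that assumption, the difference at 0 dB is about 11.1 percentage points, which agrees with the figure given in the abstract.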