Research on language recognition algorithm based on improved CFCC feature extraction

doi:10.11959/j.issn.1000-436x.2022234

Abstract

To address the low language recognition rate at low signal-to-noise ratios, a language recognition method based on the fractional wavelet transform is proposed. First, an adaptive filtering algorithm removes noise from the noisy signal, which reduces the influence of noise on feature extraction and improves the system's ability to process non-stationary signals. Second, the motion of the signal on the basilar membrane of the cochlea is simulated, and the result is compressed by a nonlinear power function. Finally, the improved cochlear filter cepstral coefficients (CFCC) are extracted by simulating the human hearing process. Experiments show that, compared with the traditional CFCC, the language recognition rate improves significantly, increasing by 11.1% on average at a signal-to-noise ratio of 0 dB, which verifies the effectiveness and robustness of the proposed algorithm.
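
The last two steps of the abstract (power-function compression followed by cepstral coefficient extraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dct_ii` and `power_cepstrum`, the exponent 0.25, the 32 bands, the 13 coefficients, and the random band energies are all assumptions made for illustration, and the cochlear filtering stage that would produce the band energies is omitted.

```python
import numpy as np

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]                    # coefficient index, shape (n_coeffs, 1)
    m = np.arange(n)[None, :]                           # band index, shape (1, n)
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))   # shape (n_coeffs, n)
    return x @ basis.T

def power_cepstrum(band_energy, exponent=0.25, n_coeffs=13):
    """Compress non-negative band energies with a power function, then decorrelate.

    band_energy: array of shape (frames, bands). The exponent is an assumed value.
    """
    band_energy = np.asarray(band_energy, dtype=float)
    compressed = np.power(band_energy, exponent)        # nonlinear power compression
    return dct_ii(compressed, n_coeffs)                 # cepstral-style coefficients

# Placeholder input: 4 frames of 32 band energies (random, for illustration only).
rng = np.random.default_rng(0)
energies = rng.random((4, 32))
features = power_cepstrum(energies)
print(features.shape)
```

A frame whose band energies are all equal yields zero for every coefficient except the first, which is a quick sanity check on the transform.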

Key words: language recognition, adaptive filtering, fractional wavelet transform, neural network, cochlear filter cepstral coefficient

CLC Number: TN912.34

Hua LONG, Zhangheng HUANG, Yubin SHAO, Qingzhi DU, Shumeng SU. Research on language recognition algorithm based on improved CFCC feature extraction[J]. Journal on Communications, 2022, 43(12): 211-221.

Training set (number of utterances at each SNR)
Language	Label	-5 dB	0 dB	5 dB	10 dB	15 dB
English	0	1 400	1 400	1 400	1 400	1 400
French	1	1 400	1 400	1 400	1 400	1 400
German	2	1 400	1 400	1 400	1 400	1 400
Italian	3	1 400	1 400	1 400	1 400	1 400
Spanish	4	1 400	1 400	1 400	1 400	1 400

Test set (number of utterances at each SNR)
Language	Label	-5 dB	0 dB	5 dB	10 dB	15 dB
English	0	600	600	600	600	600
French	1	600	600	600	600	600
German	2	600	600	600	600	600
Italian	3	600	600	600	600	600
Spanish	4	600	600	600	600	600
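
The training and test sets above are organized by SNR from -5 dB to 15 dB. A common way to build such conditions is to scale a noise signal so that the speech-to-noise power ratio matches the target. The sketch below shows that scaling only; the helper names `mix_at_snr` and `measured_snr_db`, the sine-wave stand-in for speech, and the white Gaussian noise are assumptions, since the page does not state how the noisy data were generated.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Assumes noise is at least as long as speech and neither signal is silent.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

def measured_snr_db(speech, noisy):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = noisy - speech
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))

# Stand-in signals: a 220 Hz tone for "speech" and white Gaussian noise.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)

for snr in (-5, 0, 5, 10, 15):
    noisy = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 2))
```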

Recognition accuracy by auditory characteristic function
Feature	Auditory characteristic function	-5 dB	0 dB	5 dB	10 dB	15 dB	Average accuracy
CFCC	1/3	66.8%	70.76%	72.77%	74.16%	79.34%	72.77%
LCFCC	Logarithm	63.73%	68.6%	74.83%	75.06%	78.7%	72.18%
CFCC0	0.101	67.73%	71.46%	75.36%	80.96%	83.46%	75.79%
CFCC1	1/15	65.63%	73.8%	76.86%	78.43%	80.1%	74.96%
FCFCC	0.25	68.97%	73.4%	77.5%	80.36%	84.63%	76.97%

Recognition accuracy by feature parameter
Feature parameter	-5 dB	0 dB	5 dB	10 dB	15 dB	Average accuracy
MFCC	59.07%	70.03%	76.96%	82.23%	84.6%	74.58%
GFCC	70.9%	72.86%	74.86%	81.5%	83.2%	76.66%
Fbank	67.06%	73.2%	77.16%	80.3%	85.56%	76.65%
CFCC	66.8%	70.76%	72.77%	74.16%	79.34%	72.77%
NFCFCC	71.3%	73.3%	79.96%	84.53%	87.67%	79.35%

Recognition accuracy by feature parameter
Feature parameter	-5 dB	0 dB	5 dB	10 dB	15 dB	Average accuracy
NFPSS	73.8%	77.03%	80.33%	83.77%	87.96%	80.58%
NFCFCCAF	75.5%	81.87%	83.66%	85.8%	88.42%	83.05%
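
The average accuracy column and the gain over the traditional CFCC can be recomputed from the table rows. The sketch below copies the CFCC and NFCFCCAF rows from the tables above; the variable names are illustrative, and pairing NFCFCCAF with CFCC as the comparison behind the 0 dB figure in the abstract is an assumption.

```python
from statistics import mean

SNRS_DB = (-5, 0, 5, 10, 15)

# Recognition accuracy in percent, copied from the tables above.
accuracy = {
    "CFCC": (66.8, 70.76, 72.77, 74.16, 79.34),
    "NFCFCCAF": (75.5, 81.87, 83.66, 85.8, 88.42),
}

# Average accuracy across the five SNR conditions, rounded as in the tables.
averages = {name: round(mean(values), 2) for name, values in accuracy.items()}

# Gain of NFCFCCAF over CFCC at each SNR, in percentage points.
gains = {
    snr: round(new - old, 2)
    for snr, old, new in zip(SNRS_DB, accuracy["CFCC"], accuracy["NFCFCCAF"])
}

print(averages)
print(gains)
```

The gain at 0 dB is the difference between 81.87% and 70.76%, which is consistent with the 11.1% improvement reported in the abstract.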

Recognition accuracy by feature and classification network
Feature	Classification network	-5 dB	0 dB	5 dB	10 dB	15 dB	Average accuracy
NFCFCCAF	FcaNet-MobileNetV2	75.5%	81.87%	83.66%	85.8%	88.42%	83.05%
NFCFCCAF	MobileNetV2	73.62%	79.7%	82.33%	84.37%	85.2%	81.04%
NFCFCCAF	ResNet	74.2%	80.36%	81.9%	84.5%	85.63%	81.30%
NFCFCCAF-DS	FcaNet-MobileNetV2	81.2%	82.97%	85.93%	87.8%	90.37%	85.65%
NFCFCCAF-DS	MobileNetV2	79.5%	80.63%	83.82%	85.97%	88.1%	83.60%
NFCFCCAF-DS	ResNet	77.16%	80.2%	81.2%	84.54%	88.25%	82.27%

Recognition accuracy by language
Language	-5 dB	0 dB	5 dB	10 dB	15 dB
French	71.5%	78.5%	80.5%	82.12%	86.16%
Italian	70.5%	79.56%	83.16%	84.2%	85.16%
Spanish	74.16%	82.4%	83.5%	85.16%	87.26%
German	78.83%	81.56%	82.64%	87%	89.65%
English	82.5%	87.33%	88.5%	90.5%	91.5%