Research on formant estimation algorithm for high order optimal LPC root value screening

doi:10.11959/j.issn.1000-436x.2022113

Abstract

Objectives: Existing linear prediction (LP) formant estimation algorithms struggle to locate formants precisely because of pseudo-root interference and interaction between poles. Low-order LP fits the formants only coarsely, which fundamentally limits the accuracy of formant extraction, while in high-order LP it is difficult to remove the false roots and the spectrum aliasing caused by pole interaction. To reduce the large error of LP formant detection, a formant estimation algorithm based on high-order LP coefficient root value screening is proposed. The root determination threshold, the optimal LP root value distribution, the peak distribution of formants in the spectral envelope, and the formant estimation error under the constraints of the speech digital resonance model are investigated at different orders.

Methods: The LP order is increased to improve how well the LP system spectrum fits the speech signal. The calculation precision of the formant frequencies is analyzed at different orders, and the root values of the linear system with higher spectral peak fitting precision are obtained. A speech digital resonance model constrains the root amplitude range of the formants, and the number of false roots is reduced by filtering the root values of the linear system against the root amplitude matched to the order. Power weighting is then applied to the main spectral components of the signal. This corrects the amplitude of the speech spectrum, strengthens the energy match between the spectral peaks of the speech signal and those of the LPC spectrum, extends the distance between poles, reduces the prediction error caused by harmonic interference, and improves the peak frequency discrimination of the spectrum.
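The root screening step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic test signal, and the threshold values `r_min=0.9` and `f_min=90.0` are assumptions chosen for the example, whereas the paper derives its threshold from the digital resonance model and matches it to the LP order.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Return [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k (autocorrelation method)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation lags 0..order of the windowed frame
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def screen_formant_roots(a, fs, r_min=0.9, f_min=90.0):
    """Keep roots whose magnitude and frequency pass hypothetical thresholds."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # root angle -> frequency in Hz
    keep = (np.abs(roots) >= r_min) & (freqs >= f_min)
    return np.sort(freqs[keep])

def synth_resonances(freqs, radius, fs, n, seed=0):
    """Synthetic test signal: white noise through an all-pole filter (assumed setup)."""
    poles = [radius * np.exp(2j * np.pi * f / fs) for f in freqs]
    a = np.real(np.poly(poles + [np.conj(p) for p in poles]))
    e = np.random.default_rng(seed).standard_normal(n)
    x = np.zeros(n)
    for i in range(n):
        acc = e[i]
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * x[i - k]
        x[i] = acc
    return x

fs = 8000
x = synth_resonances([700.0, 1800.0], radius=0.97, fs=fs, n=2048)
a = lpc_autocorr(x, order=12)                    # order deliberately higher than needed
print(screen_formant_roots(a, fs))               # surviving formant candidates in Hz
```

Raising `order` adds roots that do not belong to any resonance. In this sketch they are rejected only by their smaller magnitude, which is the simplest form of the amplitude screening idea.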

Results: In the algorithm structure, the speech signal is first preprocessed: low-frequency information is reweighted to reduce the interference of the fundamental frequency with formant detection, and high-frequency information is enhanced to increase the amplitude distinction of the third formant in the high-frequency part of the spectrum. Endpoint detection then isolates the speech frames, which undergo high-order LP analysis under the constraint of the digital resonance model. The model includes three main techniques that improve performance:
(1) Within the system tolerance range, the LP order is increased, which improves formant prediction accuracy. A formant is a peak frequency of the spectral envelope and corresponds to a pole of the LP model, that is, a root of the LP polynomial. A 9th-order linear prediction preserves only the basic shape of the LP amplitude response of the speech signal. When the LP order is increased to 15, the fit to the signal improves, and the roots of the LP polynomial become denser and lie closer to the unit circle. The 15th-order LP compensates for the formant fitting accuracy sacrificed by the 9th-order fit and improves formant extraction accuracy by 2.5%.
(2) Complex roots are determined with a threshold obtained under the digital resonance root value constraint, which effectively filters out the low-frequency false roots generated by fundamental frequency harmonics and the false roots generated by formant harmonics. The roots of the LP polynomial that correspond to formant peaks are complex. Judging from the distribution of the formant detection root values, the high-order LP root threshold constrained by the digital formant root values effectively filters the false roots generated by harmonic action of the vocal tract and accurately locates, within the unit circle, the roots that correspond to formants.
(3) Reweighting the speech spectral power makes the formants predicted from the revised signal more accurate. The spectrum envelope energy is more concentrated after power weighting. At order 18, the aliasing interference caused by the peak frequency of the formant at 1363 Hz to 1359 Hz is eliminated. In terms of robustness and overall performance relative to the other methods, the proposed algorithm extracts formants robustly from order 9 to 22 and performs best at order 18.
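For reference, the standard textbook relation between a complex root of the LP polynomial and the resonance it represents is given below. It explains why a root closer to the unit circle yields a sharper spectral peak, and why a root amplitude threshold can separate formant roots from false roots. The symbols $f_s$ (sampling rate), $r_i$, $\theta_i$, $F_i$ and $B_i$ are notation introduced here and do not appear in the abstract.

```latex
A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}, \qquad A(z_i) = 0, \qquad z_i = r_i e^{\pm j\theta_i}

F_i = \frac{\theta_i}{2\pi}\, f_s, \qquad B_i = -\frac{f_s}{\pi}\,\ln r_i
```

Here $F_i$ is the resonance frequency and $B_i$ the approximate 3 dB bandwidth. As $r_i \to 1$ the bandwidth shrinks toward zero, so a lower bound on $r_i$ acts as an upper bound on the bandwidth.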

Conclusions: The LPC-based method of formant detection is improved. The effect of raising the linear prediction order on formant extraction was studied. To address the multiple pseudo-roots and the multi-pole interaction that a higher linear prediction order introduces, the formant extraction error is minimized under the constraint of the speech digital resonance model. The relationship between the linear prediction order and the screening threshold of the root amplitude was analyzed. To remove false roots, a root amplitude feedback method under the digital resonance constraint was used to obtain a filtering threshold that pairs a high order with a low error rate. Combined with power weighting, the amplitude of the prominent spectral peaks is strengthened, which eliminates pole interaction in formant extraction and yields accurate and effective formant extraction.
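The abstract does not spell out the power weighting formula. One plausible reading, shown here purely as an assumption, is to raise the frame's power spectrum to an exponent `p` greater than 1 before converting it back to autocorrelation lags, which emphasizes the dominant spectral peaks. The function name and the exponent are hypothetical. With `p=1.0` the sketch reduces to the ordinary normalized autocorrelation.

```python
import numpy as np

def power_weighted_autocorr(frame, order, p=2.0):
    """Autocorrelation lags 0..order from a power spectrum raised to p (assumed weighting)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    nfft = 2 * len(x)                          # zero-padded FFT grid
    power = np.abs(np.fft.rfft(x, nfft)) ** 2  # power spectrum of the windowed frame
    weighted = power ** p                      # p > 1 sharpens the dominant peaks
    r = np.fft.irfft(weighted, nfft)[: order + 1]
    return r / r[0]                            # normalize so that lag 0 equals 1

# Example: two sinusoids, the first one dominant
n = np.arange(256)
x = np.sin(2 * np.pi * 0.10 * n) + 0.3 * np.sin(2 * np.pi * 0.23 * n)
print(power_weighted_autocorr(x, order=8, p=2.0))
```

The returned lags can be passed to any autocorrelation-method LP solver in place of the ordinary autocorrelation.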

Key words: linear prediction, digital resonance, power weighting, formant

CLC Number: TN912.32

Hua LONG, Shumeng SU. Research on formant estimation algorithm for high order optimal LPC root value screening[J]. Journal on Communications, 2022, 43(6): 235-245.

Formant	F/Hz	TLP-FE	PWLP-FE	HOLP-FE	PWLP-FE+HOLP-FE
F1	906	-24.6%	-25.1%	-24.6%	-25.1%
F2	1359	-5.0%	-6.1%	-5.0%	-6.1%
F3	2609	-0.4%	1.2%	-0.4%	1.2%
F4	4874	-6.2%	-5.6%	-6.2%	-5.6%

Formant	F/Hz	TLP-FE	PWLP-FE	HOLP-FE	PWLP-FE+HOLP-FE
F1	906	-63.3%	-62.9%	-11.8%	-7.4%
F2	1359	-41.2%	-38.2%	-2.6%	-0.1%
F3	2609	-49.2%	-47.8%	-4.3%	-3.2%
F4	4874	-49.6%	-47.9%	-5.2%	-3.1%

Formant	F/Hz	TLP-FE	PWLP-FE	HOLP-FE	PWLP-FE+HOLP-FE
F1	906	-64.6%	-59.3%	-12.9%	-8.8%
F2	1359	-41.9%	-39.2%	-2.2%	0%
F3	2609	-49.1%	-47.9%	-4.0%	-4.2%
F4	4874	-50.0%	-47.6%	-5.1%	-3.9%
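The percentages in the tables above appear to be signed relative deviations of each method's estimate from the reference frequency in the F/Hz column. That reading is an assumption, since the tables carry no caption here. Under it, the deviation could be computed as in this small sketch, where the function name is hypothetical:

```python
def relative_error_percent(estimated_hz, reference_hz):
    """Signed relative deviation of an estimate from its reference, in percent."""
    return 100.0 * (estimated_hz - reference_hz) / reference_hz

# Example with the two frequencies quoted in the Results: 1363 Hz against 1359 Hz
print(round(relative_error_percent(1363.0, 1359.0), 2))
```

A negative value means the estimate falls below the reference frequency.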
