Speech enhancement based on multi-task sparse representation for dual small microphone arrays

doi:10.3969/j.issn.1000-436x.2014.02.012

Abstract:

Speech enhancement algorithms for dual small microphone arrays usually rely on voice activity detection (VAD), and they have difficulty suppressing noise when the target speech signal is present in the first frame. A speech enhancement algorithm based on multi-task sparse representation was proposed. First, over-complete dictionaries for the target signal and the noise were learned separately via dictionary learning. Then the representation coefficients of the signals on these dictionaries were constrained by ℓ2,1 mixed-norm regularized sparse representation, so that the noise in the signals received by the two microphones was suppressed while the target speech signals were largely preserved, and the speech was thereby enhanced. Results on synthetic and real-world data show that the proposed algorithm, which needs no VAD, enhances speech effectively regardless of whether the target speech signal is present in the first frame.

Key words: small microphone arrays, speech enhancement, dictionary learning, multi-task sparse representation
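
The abstract does not give the optimization problem, so the following is only a minimal sketch under stated assumptions: the two microphone channels of one frame form the columns of Y, the learned speech and noise dictionaries are concatenated as D = [D_speech, D_noise], and the coefficients X minimize 0.5*||Y - D X||_F^2 + lam*||X||_{2,1}, where the ℓ2,1 norm sums the Euclidean norms of the rows of X. Plain proximal gradient (ISTA) is used here for brevity and is not claimed to be the solver used by the authors. The function names, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def l21_norm(X):
    """l2,1 mixed norm: sum of the Euclidean norms of the rows of X."""
    return float(np.sum(np.linalg.norm(X, axis=1)))

def prox_l21(X, t):
    """Proximal operator of t * ||X||_{2,1}: row-wise group soft-thresholding."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * X

def multitask_sparse_code(Y, D, lam, n_iter=200):
    """Minimise 0.5*||Y - D X||_F^2 + lam*||X||_{2,1} by proximal gradient (ISTA)."""
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = prox_l21(X - step * grad, step * lam)
    return X

def enhance_frame(Y, D_speech, D_noise, lam=0.1, n_iter=200):
    """Assumed enhancement step: keep only the part explained by the speech atoms."""
    D = np.hstack([D_speech, D_noise])
    X = multitask_sparse_code(Y, D, lam, n_iter)
    X_speech = X[: D_speech.shape[1]]  # coefficients on the speech dictionary
    return D_speech @ X_speech
```

Because each row of X is thresholded as a group, an atom is either used by both channels or by neither, which is one way to express the multi-task coupling between the two microphones. In practice `D_speech` and `D_noise` would come from a dictionary learning step, which this sketch leaves out.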

Li-chun YANG, Min-chao YE, Yun-tao QIAN. Speech enhancement based on multi-task sparse representation for dual small microphone arrays[J]. Journal on Communications, 2014, 35(2): 87-94.

Figures/Tables: 8 (Figures 1-6, Tables 1-2)

Scores of different methods
Noise	GSC	Coherence filter	Proposed method
babble	2.7	3.4	3.7
car	3.1	3.5	4.0
factory	2.6	3.3	3.9
music	2.6	3.1	4.0
office	2.8	3.4	3.8

SNR of different signals (dB)
Noise	Input signal	GSC	Coherence filter	Proposed method
babble	0.87	8.01	11.23	11.80
car	7.92	18.38	20.06	21.0
factory	-8.23	-1.98	4.96	5.51
music	-3.62	3.69	5.15	5.46
office	3.19	12.31	14.68	15.19
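
The SNR table above lists absolute values, so the gain of each method over the input signal has to be worked out by subtraction. The short sketch below does that arithmetic. The numbers are transcribed from the table, reading the GSC and coherence-filter entries as separate two-decimal values. The names `snr_db` and `snr_gain` are hypothetical.

```python
# SNR in dB per noise type: (input signal, GSC, coherence filter, proposed method)
snr_db = {
    "babble": (0.87, 8.01, 11.23, 11.80),
    "car": (7.92, 18.38, 20.06, 21.0),
    "factory": (-8.23, -1.98, 4.96, 5.51),
    "music": (-3.62, 3.69, 5.15, 5.46),
    "office": (3.19, 12.31, 14.68, 15.19),
}

def snr_gain(noise):
    """Return each method's SNR improvement over the input signal, in dB."""
    inp, gsc, coh, prop = snr_db[noise]
    return {
        "GSC": round(gsc - inp, 2),
        "coherence filter": round(coh - inp, 2),
        "proposed": round(prop - inp, 2),
    }

if __name__ == "__main__":
    for noise in snr_db:
        print(noise, snr_gain(noise))
```

For the factory noise, for example, the input SNR is negative, so the gain of the proposed method is 5.51 - (-8.23) = 13.74 dB.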
