[1] SISMAN B, YAMAGISHI J, KING S, et al. An overview of voice conversion and its challenges: from statistical modeling to deep learning[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 132-157.

[2] MOUCHTARIS A, AGIOMYRGIANNAKIS Y, STYLIANOU Y. Conditional vector quantization for voice conversion[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2007: 505-508.
[3] AIHARA R, TAKASHIMA R, TAKIGUCHI T, et al. GMM-based emotional voice conversion using spectrum and prosody features[J]. American Journal of Signal Processing, 2012, 2(5): 134-138.

[4] HELANDER E, SILEN H, VIRTANEN T, et al. Voice conversion using dynamic kernel partial least squares regression[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(3): 806-817.

[5] WU Z Z, VIRTANEN T, CHNG E S, et al. Exemplar-based sparse representation with residual compensation for voice conversion[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1506-1521.
[6] SUN L F, LI K, WANG H, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training[C]// Proceedings of IEEE International Conference on Multimedia and Expo (ICME). Piscataway: IEEE Press, 2016: 1-6.
[7] MURAKAMI H, HARA S, ABE M. DNN-based voice conversion with auxiliary phonemic information to improve intelligibility of glossectomy patients' speech[C]// Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Piscataway: IEEE Press, 2019: 138-142.

[8] ALAA Y, ALFONSE M, AREF M M. A survey on generative adversarial networks based models for many-to-many non-parallel voice conversion[C]// Proceedings of 5th International Conference on Computing and Informatics (ICCI). Piscataway: IEEE Press, 2022: 221-226.
[9] KANEKO T, KAMEOKA H, TANAKA K, et al. CycleGAN-VC2: improved CycleGAN-based non-parallel voice conversion[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2019: 6820-6824.
[10] KAMEOKA H, KANEKO T, TANAKA K, et al. StarGAN-VC: non-parallel many-to-many voice conversion using star generative adversarial networks[C]// Proceedings of IEEE Spoken Language Technology Workshop (SLT). Piscataway: IEEE Press, 2018: 266-273.
[11] QIAN K Z, ZHANG Y, CHANG S Y, et al. AUTOVC: zero-shot voice style transfer with only autoencoder loss[C]// Proceedings of the 36th International Conference on Machine Learning (ICML). PMLR, 2019: 5210-5219.
[12] DENG C H, CHEN Y, DENG H F. One-shot voice conversion algorithm based on representations separation[J]. IEEE Access, 2020, 8: 196578-196586.

[13] CHEN Y H, WU D Y, WU T H, et al. AGAIN-VC: a one-shot voice conversion using activation guidance and adaptive instance normalization[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 5954-5958.

[14] WANG Q Q, ZHANG X L, WANG J Z, et al. DRVC: a framework of any-to-any voice conversion with self-supervised learning[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 3184-3188.

[15] DANG T, TRAN D, CHIN P, et al. Training robust zero-shot voice conversion models with self-supervised features[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 6557-6561.
[16] CHOU J C, LEE H Y. One-shot voice conversion by separating speaker and content representations with instance normalization[C]// Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2019: 664-668.
[17] WANG X, TAKAKI S, YAMAGISHI J, et al. A vector quantized variational autoencoder (VQ-VAE) autoregressive neural F0 model for statistical parametric speech synthesis[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 157-170.
[18] WU D Y, LEE H Y. One-shot voice conversion by vector quantization[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2020: 7734-7738.

[19] YANG S D, YU X Y, ZHOU Y. LSTM and GRU neural network performance comparison study: taking Yelp review dataset as an example[C]// Proceedings of International Workshop on Electronic Communication and Artificial Intelligence (IWECAI). Piscataway: IEEE Press, 2020: 98-101.

[20] PRASAD S, MANU A, KAPOOR A, et al. Non-parallel denoised voice conversion using vector quantisation[C]// Proceedings of 4th International Conference on Recent Trends in Computer Science and Technology (ICRTCST). Piscataway: IEEE Press, 2022: 78-83.

[21] WANG Z C, XIE Q C, LI T, et al. One-shot voice conversion for style transfer based on speaker adaptation[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 6792-6796.
[22] KANEKO T, KAMEOKA H, TANAKA K, et al. MaskCycleGAN-VC: learning non-parallel voice conversion with filling in frames[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 5919-5923.
[23] SONG K, CONG J, WANG X S, et al. Robust MelGAN: a robust universal neural vocoder for high-fidelity TTS[C]// Proceedings of 13th International Symposium on Chinese Spoken Language Processing (ISCSLP). Piscataway: IEEE Press, 2022: 71-75.
[24] ZHOU J, LIU R M, DOU Y F, et al. Speech reconstruction from Mel-frequency cepstral coefficients via L1/2 sparse constraint[J]. Acta Acustica, 2018, 43(6): 991-999. (in Chinese)
[25] LEE S H, NOH H R, NAM W J, et al. Duration controllable voice conversion via phoneme-based information bottleneck[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 1173-1183.
[26] LIN Y, XU H T, WANG S, et al. Objective assessment of communication speech interference effect based on feature fusion[J]. Journal on Communications, 2023, 44(3): 105-116. (in Chinese)
[27] PRIHASTO B, LIN Y X, LE P T, et al. CNEG-VC: contrastive learning using hard negative example in non-parallel voice conversion[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2023: 1-5.

[28] SHAH N, SINGH M, TAKAHASHI N, et al. Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing[C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2023: 1-5.