[1] ATAL B S. Automatic recognition of speakers from their voices[J]. Proceedings of the IEEE, 1976, 64(4): 460-475.
[2] YANG Z, WANG T T. Research on speech graph signal processing theory and technology[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 2020, 40(5): 43-51.
[3] JUNG J W, HEO H S, YU H J, et al. Graph attention networks for speaker verification[C]// Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 6149-6153.
[4] SHIM H J, HEO J, PARK J H, et al. Graph attentive feature aggregation for text-independent speaker verification[C]// Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 7972-7976.
[5] LIU B, CHEN Z Y, QIAN Y M. Attentive feature fusion for robust speaker verification[C]// Proceedings of Interspeech 2022. New York: ACM Press, 2022: 286-290.
[6] SANKALA S, RAFI B S M, K S R M. Multi-feature integration for speaker embedding extraction[C]// Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 7957-7961.
[7] LIN Y, XU H T, WANG S, et al. Objective assessment of communication speech interference effect based on feature fusion[J]. Journal on Communications, 2023, 44(3): 105-116.
[8] ZHENG J Z, JI R Y, ZHANG L B, et al. End-to-end scene text detection and recognition algorithm based on Transformer decoders[J]. Journal on Communications, 2023, 44(5): 64-78.
[9] QIN Z J, ZHAO T T, LI F, et al. Survey of research on multimodal semantic communication[J]. Journal on Communications, 2023, 44(5): 28-41.
[10] ORTEGA A, FROSSARD P, KOVAČEVIĆ J, et al. Graph signal processing: overview, challenges, and applications[J]. Proceedings of the IEEE, 2018, 106(5): 808-828.
[11] YAN X, YANG Z, WANG T, et al. An iterative graph spectral subtraction method for speech enhancement[J]. Speech Communication, 2020, 123: 35-42.
[12] WANG T T, GUO H Y, YAN X, et al. Speech signal processing on graphs: the graph frequency analysis and an improved graph Wiener filtering method[J]. Speech Communication, 2021, 127: 82-91.
[13] WANG T T, GUO H Y, ZHANG Q Q, et al. A new multilayer graph model for speech signals with graph learning[J]. Digital Signal Processing, 2022: doi.org/10.1016/j.dsp.2021.103360.
[14] WANG T T, PAN Z X, GE M, et al. Time-domain speech separation networks with graph encoding auxiliary[J]. IEEE Signal Processing Letters, 2023, 30: 110-114.
[15] ZHANG C H, PAN X. Single-channel speech enhancement using graph Fourier transform[C]// Proceedings of Interspeech 2022. New York: ACM Press, 2022: 946-950.
[16] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 770-778.
[17] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[18] DESPLANQUES B, THIENPONDT J, DEMUYNCK K. ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification[C]// Proceedings of Interspeech 2020. New York: ACM Press, 2020: 3830-3834.
[19] NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[J]. arXiv Preprint, arXiv:1706.08612, 2017.
[20] CHUNG J S, NAGRANI A, ZISSERMAN A. VoxCeleb2: deep speaker recognition[J]. arXiv Preprint, arXiv:1806.05622, 2018.
[21] MCLAREN M, FERRER L, CASTAN D, et al. The speakers in the wild (SITW) speaker recognition database[C]// Proceedings of Interspeech 2016. New York: ACM Press, 2016: 818-822.
[22] FAN Y, KANG J W, LI L T, et al. CN-celeb: a challenging Chinese speaker recognition dataset[C]// Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2020: 7604-7608.
[23] ZEINALI H, WANG S, SILNOVA A, et al. BUT system description to VoxCeleb speaker recognition challenge 2019[J]. arXiv Preprint, arXiv:1910.12592, 2019.
[24] SAFARI P, INDIA M, HERNANDO J. Self-attention encoding and pooling for speaker recognition[C]// Proceedings of Interspeech 2020. New York: ACM Press, 2020: 941-945.
[25] HAN B, CHEN Z Y, QIAN Y M. Local information modeling with self-attention for speaker verification[C]// Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 6727-6731.
[26] SNYDER D, CHEN G, POVEY D. MUSAN: a music, speech, and noise corpus[J]. arXiv Preprint, arXiv:1510.08484, 2015.
[27] KO T, PEDDINTI V, POVEY D, et al. A study on data augmentation of reverberant speech for robust speech recognition[C]// Proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2017: 5220-5224.
[28] PARK D S, CHAN W, ZHANG Y, et al. SpecAugment: a simple data augmentation method for automatic speech recognition[C]// Proceedings of Interspeech 2019. New York: ACM Press, 2019: 2613-2617.
[29] DENG J, GUO J, YANG J, et al. ArcFace: additive angular margin loss for deep face recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 5962-5979.
[30] YADAV S, RAI A. Frequency and temporal convolutional attention for text-independent speaker recognition[C]// Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2020: 6794-6798.
[31] HAN B, CHEN Z Y, LIU B, et al. MLP-SVNET: a multi-layer perceptrons based network for speaker verification[C]// Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 7522-7526.
[32] MAATEN L V D, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.