[1] WANG H K, PAN J, LIU C. Research development and forecast of automatic speech recognition technologies[J]. Telecommunications Science, 2018, 34(2): 1-11.
[2] CHIU C C, SAINATH T N, WU Y, et al. State-of-the-art speech recognition with sequence-to-sequence models[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2018, Calgary, AB, Canada. Piscataway: IEEE Press, 2018.
[3] WANG Z H, YANG Z. Research on artificial intelligence technology and the future intelligent information service architecture[J]. Telecommunications Science, 2017, 33(5): 1-11.
[4] NODA K, YAMAGUCHI Y, NAKADAI K, et al. Audio-visual speech recognition using deep learning[J]. Applied Intelligence, 2015, 42(4): 722-737.
[5] HU D, LI X, LU X. Temporal multimodal learning in audiovisual speech recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. Piscataway: IEEE Press, 2016: 3574-3582.
[6] TAKASHIMA Y, AIHARA R, TAKIGUCHI T, et al. Audio-visual speech recognition using bimodal-trained bottleneck features for a person with severe hearing loss[J]. 2016.
[7] CHUNG J S, SENIOR A, VINYALS O, et al. Lip reading sentences in the wild[J]. arXiv preprint arXiv:1611.05358, 2016.
[8] PETRIDIS S, STAFYLAKIS T, MA P, et al. End-to-end audiovisual speech recognition[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2018, Calgary, AB, Canada. Piscataway: IEEE Press, 2018: 6548-6552.
[9] WAND M, SCHMIDHUBER J, VU N T. Investigations on end-to-end audiovisual fusion[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2018, Calgary, AB, Canada. Piscataway: IEEE Press, 2018: 3041-3045.
[10] TAO F, BUSSO C. Gating neural network for large vocabulary audiovisual speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(7): 1286-1298.
[11] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[12] GREZL F, KARAFIAT M, KONTAR S, et al. Probabilistic and bottle-neck features for LVCSR of meetings[C]// 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2007, Honolulu, HI, USA. Piscataway: IEEE Press, 2007.
[13] JI N N, ZHANG J S, ZHANG C X. A sparse-response deep belief network based on rate distortion theory[J]. Pattern Recognition, 2014, 47(9): 3179-3191.
[14] LEE H, EKANADHAM C, NG A Y. Sparse deep belief net model for visual area V2[C]// Advances in Neural Information Processing Systems, Dec 3-6, 2007, Vancouver, British Columbia, Canada. [S.l.: s.n.], 2007: 873-880.
[15] KIAEE F, CHRISTIAN G, ABBASI M. Alternating direction method of multipliers for sparse convolutional neural networks[J]. 2016.
[16] KHALID F, FANANY M I. Combining normal sparse into discriminative deep belief networks[C]// International Conference on Advanced Computer Science and Information Systems, Oct 15-16, 2016, Malang, Indonesia. Piscataway: IEEE Press, 2016.
[17] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5-6): 602-610.
[18] LUONG M T, PHAM H, MANNING C D. Effective approaches to attention-based neural machine translation[J]. arXiv preprint arXiv:1508.04025, 2015.
[19] ANINA I, ZHOU Z, ZHAO G, et al. OuluVS2: a multi-view audiovisual database for non-rigid mouth motion analysis[C]// IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, May 4-8, 2015, Ljubljana, Slovenia. Piscataway: IEEE Press, 2015.
[20] HINTON G E. A practical guide to training restricted Boltzmann machines[Z]. 2012.
[21] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]// The Thirteenth International Conference on Artificial Intelligence and Statistics, May 13-15, 2010, Chia Laguna Resort, Sardinia, Italy. [S.l.: s.n.], 2010: 249-256.
[22] PETRIDIS S, WANG Y, LI Z, et al. End-to-end audiovisual fusion with LSTM[J]. arXiv preprint arXiv:1709.04343, 2017.
[23] KOUMPAROULIS A, POTAMIANOS G. Deep View2View mapping for view-invariant lipreading[C]// 2018 IEEE Spoken Language Technology Workshop (SLT), Dec 18-21, 2018, Athens, Greece. Piscataway: IEEE Press, 2018: 588-594.
[24] SAITOH T, ZHOU Z, ZHAO G, et al. Concatenated frame image based CNN for visual speech recognition[C]// Asian Conference on Computer Vision, Nov 20-24, 2016, Taipei, China. Berlin: Springer, 2016: 277-289.
[25] FUNG I, MAK B. End-to-end low-resource lip-reading with maxout CNN and LSTM[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2018, Calgary, AB, Canada. Piscataway: IEEE Press, 2018: 2511-2515.
[26] LEE D, LEE J, KIM K E. Multi-view automatic lip-reading using neural network[C]// Asian Conference on Computer Vision, Nov 20-24, 2016, Taipei, China. Berlin: Springer, 2016: 290-302.
[27] PETRIDIS S, WANG Y, LI Z, et al. End-to-end multi-view lipreading[J]. arXiv preprint arXiv:1709.00443, 2017.