[1] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[2] ABADI M, AGARWAL A, BARHAM P, et al. TensorFlow: a system for large-scale machine learning[C]// Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. Savannah: USENIX Association, 2016: 265-283.
[3] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[EB]. arXiv preprint, 2019, arXiv:1912.01703.
[4] JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]// Proceedings of the 22nd ACM International Conference on Multimedia. New York: ACM, 2014: 675-678.
[5] BRAUN S. LSTM benchmarks for deep learning frameworks[EB]. arXiv preprint, 2018, arXiv:1806.01818.
[6] APPLEYARD J, KOCISKY T, BLUNSOM P, et al. Optimizing performance of recurrent neural networks on GPUs[EB]. arXiv preprint, 2016, arXiv:1604.01946.
[7] LU W Z, ZHANG F, HE Y X, et al. Evaluation and optimization for Huawei Ascend neural network accelerator[J]. Chinese Journal of Computers, 2022, 45(8): 1618-1637.
[8] LIANG X Y. Ascend AI processor architecture and programming: principles and application of CANN[M]. Beijing: Tsinghua University Press, 2019.
[9] YU F. Research on the next-generation deep learning framework[J]. Big Data Research, 2020, 6(4): 69-80.
[10] TALLADA M G. Coarse grain parallelization of deep neural networks[C]// Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM, 2016: 1-12.
[11] HOPFIELD J J. Neural networks and physical systems with emergent collective computational abilities[J]. Proceedings of the National Academy of Sciences of the United States of America, 1982, 79(8): 2554-2558.
[12] WU Y H, SCHUSTER M, CHEN Z F, et al. Google's neural machine translation system: bridging the gap between human and machine translation[EB]. arXiv preprint, 2016, arXiv:1609.08144.
[13] KLEIN G, KIM Y, DENG Y, et al. OpenNMT: neural machine translation toolkit[C]// Proceedings of the 13th Conference of the Association for Machine Translation in the Americas. Boston: Association for Machine Translation in the Americas, 2018: 177-184.
[14] AMODEI D, ANUBHAI R, BATTENBERG E, et al. Deep speech 2: end-to-end speech recognition in English and Mandarin[EB]. arXiv preprint, 2015, arXiv:1512.02595.
[15] WANG Y, SKERRY-RYAN R J, STANTON D, et al. Tacotron: towards end-to-end speech synthesis[EB]. arXiv preprint, 2017, arXiv:1703.10135.
[16] GÜLMEZ B. Stock price prediction with optimized deep LSTM network with artificial rabbits optimization algorithm[J]. Expert Systems with Applications, 2023, 227: 120346.
[17] WANG H, YANG J C, CHEN G Z, et al. Machine learning applications on air temperature prediction in the urban canopy layer: a critical review of 2011-2022[J]. Urban Climate, 2023, 49: 101499.
[18] LI B X, ZHOU E J, HUANG B, et al. Large scale recurrent neural network on GPU[C]// Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN). Piscataway: IEEE Press, 2014: 4062-4069.
[19] HWANG K, SUNG W. Single stream parallelization of generalized LSTM-like RNNs on a GPU[C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2015: 1047-1051.
[20] BECK M, PÖPPEL K, SPANRING M, et al. xLSTM: extended long short-term memory[EB]. arXiv preprint, 2024, arXiv:2405.04517.
[21] SHARMA R K, CASAS M. Wavefront parallelization of recurrent neural networks on multi-core architectures[C]// Proceedings of the 34th ACM International Conference on Supercomputing. New York: ACM, 2020: 1-12.
[22] CHEN Q F, WU J, HUANG F H, et al. Multi-layer LSTM parallel optimization based on hardware and software cooperation[C]// Proceedings of the International Conference on Knowledge Science, Engineering and Management. Cham: Springer, 2022: 681-693.
[23] WANG B C, YANG C Y, ZHU R, et al. Analysis of performance and optimization in MindSpore on Ascend NPUs[C]// Proceedings of the 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS). Piscataway: IEEE Press, 2023: 1701-1708.
[24] JIN H, WU W C, SHI X H, et al. TurboDL: improving the CNN training on GPU with fine-grained multi-streaming scheduling[J]. IEEE Transactions on Computers, 2021, 70(4): 552-565.
[25] FATICA M. CUDA toolkit and libraries[C]// Proceedings of the 2008 IEEE Hot Chips 20 Symposium (HCS). Piscataway: IEEE Press, 2008: 1-22.
[26] MAAS A L, DALY R E, PHAM P T, et al. Learning word vectors for sentiment analysis[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland: The Association for Computational Linguistics, 2011: 142-150.