Survey of FPGA-based recurrent neural network accelerators

doi:10.11959/j.issn.2096-109x.2019034

Abstract

Recurrent neural networks (RNNs) have been widely used in machine learning in recent years and, compared with other neural networks such as CNNs, are especially well suited to sequential learning tasks. However, RNNs and their variants such as LSTM and GRU, like other fully connected networks, have high computational and storage complexity, which makes inference slow and difficult to deploy in products. On the one hand, traditional computing platforms such as CPUs are not suitable for the large-scale matrix operations of RNNs. On the other hand, the shared memory and global memory of the GPU, the usual hardware acceleration platform, make GPU-based RNN accelerators consume more power. Because FPGAs offer parallel computing at low power consumption, more and more research on FPGA-based RNN accelerators has been done in recent years. An overview of this recent research is given. The software-level optimization algorithms and hardware-level architecture designs used in these accelerators are summarized, and some future research directions are proposed.

Key words: recurrent neural network, FPGA, accelerator

CLC Number:

TP391.1

Chen GAO,Fan ZHANG. Survey of FPGA based recurrent neural network accelerator[J]. Chinese Journal of Network and Information Security, 2019, 5(4): 1-13.

References 47

[1]	HAO Y , QUIGLEY S . The implementation of a deep recurrent neural network language model on a Xilinx FPGA[J]. arXiv Preprint arXiv:1710.10296, 2017.
[2]	SAK H , SENIOR A , BEAUFAYS F . Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition[J]. arXiv Preprint arXiv:1402.1128, 2014.
[3]	MIKOLOV T , KARAFIAT M , BURGET L ,et al. Recurrent neural network based language model[C]// Eleventh Annual Conference of the International Speech Communication Association. 2010.
[4]	CHO K , VAN -MERRIENBOER B , GULCEHRE C ,et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv Preprint arXiv:1406.1078, 2014.
[5]	GRAVES A , MOHAMED A , HINTON G . Speech recognition with deep recurrent neural networks[C]// 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013: 6645-6649.
[6]	BYEON W , BREUEL T M , RAUE F ,et al. Scene labeling with LSTM recurrent neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3547-3555.
[7]	ZHANG Y , WANG C , GONG L ,et al. A power-efficient accelerator based on FPGA for LSTM network[C]// 2017 IEEE International Conference on Cluster Computing (CLUSTER). 2017: 629-630.
[8]	GUO K , ZENG S , YU J ,et al. A survey of FPGA-based neural network accelerator[J]. arXiv preprint arXiv:1712.08934, 2017.
[9]	HWANG K , SUNG W . Single stream parallelization of generalized LSTM-like RNNs on a GPU[J]. arXiv Preprint arXiv:1503.02852, 2015.
[10]	ABADI M , AGARWAL A , BARHAM P ,et al. TensorFlow:large-scale machine learning on heterogeneous distributed systems[J]. arXiv Preprint arXiv:1603.04467, 2016.
[11]	OUYANG P , YIN S , WEI S . A fast and power efficient architecture to parallelize LSTM based RNN for cognitive intelligence applications[C]// The 54th Annual Design Automation Conference 2017. ACM, 2017:63.
[12]	NURVITADHI E , SIM J , SHEFFIELD D ,et al. Accelerating recurrent neural networks in analytics servers:comparison of FPGA,CPU,GPU,and ASIC[C]// 2016 26th International Conference on Field Programmable Logic and Applications (FPL). 2016: 1-4.
[13]	HOPFIELD J J . Neural networks and physical systems with emergent collective computational abilities[J]. Proceedings of the National Academy of Sciences, 1982,79(8): 2554-2558.
[14]	HOCHREITER S , SCHMIDHUBER J . Long short-term memory[J]. Neural Computation, 1997,9(8): 1735-1780.
[15]	CHUNG J , GULCEHRE C , CHO K H ,et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv Preprint arXiv:1412.3555, 2014.
[16]	ZAREMBA W , SUTSKEVER I , VINYALS O . Recurrent neural network regularization[J]. arXiv Preprint arXiv:1409.2329, 2014.
[17]	RYBALKIN V , PAPPALARDO A , GHAFFAR M M ,et al. FINN-L:library extensions and design trade-off analysis for variable precision LSTM networks on FPGAs[J]. arXiv Preprint arXiv:1807.04093, 2018.
[18]	RYBALKIN V , WEHN N , YOUSEFI M R ,et al. Hardware architecture of bidirectional long short-term memory neural network for optical character recognition[C]// The Conference on Design,Automation ＆ Test in Europe.European Design and Automation Association. 2017: 1394-1399.
[19]	GUAN Y , LIANG H , XU N ,et al. FP-DNN:an automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates[C]// 2017 IEEE 25th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM). 2017: 152-159.
[20]	LI S , LI W , COOK C ,et al. Independently recurrent neural network (IndRNN):building a longer and deeper RNN[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5457-5466.
[21]	HAJDUK Z . Reconfigurable FPGA implementation of neural networks[J]. Neurocomputing, 2018,308: 227-234.
[22]	LIU B , DONG W , XU T ,et al. E-ERA:an energy-efficient reconfigurable architecture for RNN using dynamically adaptive approximate computing[J]. IEICE Electronics Express, 2017,14(15): 20170637-20170637.
[23]	SONG X , ZHOU F , CHEN Y W ,et al. Design of real time double precision floating point matrix multiplier based on FPGA[J]. Journal of Zhejiang University (Engineering Science), 2008,42(9): 1611-1615.
[24]	GUAN Y , YUAN Z , SUN G ,et al. FPGA-based accelerator for long short-term memory recurrent neural networks[C]// IEEE Design Automation Conference (ASP-DAC). 2017: 629-634.
[25]	CHANG A X M , CULURCIELLO E . Hardware accelerators for recurrent neural networks on FPGA[C]// 2017 IEEE International Symposium on Circuits and Systems (ISCAS). 2017: 1-4.
[26]	CHANG A X M , MARTINI B , CULURCIELLO E . Recurrent neural networks hardware implementation on FPGA[J]. arXiv Preprint arXiv:1511.05552, 2015.
[27]	LI S , WU C , LI H ,et al. FPGA acceleration of recurrent neural network based language model[C]// 2015 IEEE 23rd Annual International Symposium on Field-programmable Custom Computing Machines. IEEE, 2015: 111-118.
[28]	LEE M , HWANG K , PARK J ,et al. FPGA-based low-power speech recognition with recurrent neural networks[C]// 2016 IEEE International Workshop on Signal Processing Systems (SiPS). 2016: 230-235.
[29]	WANG S , LI Z , DING C ,et al. C-LSTM:enabling efficient LSTM using structured compression techniques on FPGAs[C]// ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2018: 11-20.
[30]	ZHANG Y , WANG C , GONG L ,et al. Implementation and optimization of the accelerator based on FPGA hardware for LSTM network[C]// IEEE International Symposium on Parallel and Distributed Processing with Applications and IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC). 2017: 614-621.
[31]	LIAO Y , LI H , WANG Z . Based real-time processing architecture for recurrent neural network[C]// International Conference on Intelligent and Interactive Systems and Applications. 2017: 705-709.
[32]	SALCIC Z , BERBER S , SECKER P . FPGA prototyping of RNN decoder for convolutional codes[J]. EURASIP Journal on Advances in Signal Processing, 2006,2006(1):015640.
[33]	FERREIRA J C , FONSECA J . An FPGA implementation of a long short-term memory neural network[C]// 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). 2016: 1-8.
[34]	SHIN S , HWANG K , SUNG W . Fixed-point performance analysis of recurrent neural networks[J]. arXiv Preprint arXiv:1512.01322, 2015.
[35]	HAN S , POOL J , TRAN J ,et al. Learning both weights and connections for efficient neural network[C]// Advances in Neural Information Processing Systems. 2015: 1135-1143.
[36]	HAN S , KANG J , MAO H ,et al. ESE:efficient speech recognition engine with sparse LSTM on FPGA[C]// ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2017: 75-84.
[37]	ALI S M , SHAOJUN W , NING M ,et al. A bandwidth in-sensitive low stall sparse matrix vector multiplication architecture on reconfigurable FPGA platform[C]// 13th IEEE International Conference on Electronic Measurement ＆ Instruments (ICEMI). 2017: 171-176.
[38]	FOWERS J , OVTCHAROV K , STRAUSS K ,et al. A high memory bandwidth FPGA accelerator for sparse matrix-vector multiplication[C]// IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 2014: 36-43.
[39]	NEIL D , LEE J H , DELBRUCK T ,et al. Delta networks for optimized recurrent network computation[J]. arXiv Preprint arXiv:1612.05571, 2016.
[40]	GAO C , NEIL D , CEOLINI E ,et al. DeltaRNN:a power-efficient recurrent neural network accelerator[C]// ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2018: 21-30.
[41]	KINGSBURY B E D , SAINATH T N , SINDHWANI V . Low-rank matrix factorization for deep belief network training with high-dimensional output targets[P].2016-2-16.
[42]	XUE J , LI J , GONG Y . Restructuring of deep neural network acoustic models with singular value decomposition[C]// Interspeech. 2013: 2365-2369.
[43]	QIU J , WANG J , YAO S ,et al. Going deeper with embedded FPGA platform for convolutional neural network[C]// ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2016: 26-35.
[44]	LU Z , SINDHWANI V , SAINATH T N . Learning compact recurrent neural networks[J]. arXiv Preprint arXiv:1604.02594, 2016.
[45]	RIZAKIS M , VENIERIS S I , KOURIS A ,et al. Approximate FPGA-based LSTM under computation time constraints[J]. arXiv Preprint arXiv:1801.02190, 2018.
[46]	LI Z , WANG S , DING C ,et al. Efficient recurrent neural networks using structured matrices in FPGA[J]. arXiv Preprint arXiv:1803.07661, 2018.
[47]	WANG Z , LIN J , WANG Z . Accelerating recurrent neural networks:a memory-efficient approach[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017,25(10): 2763-2775.

Variant	Input weights	Hidden-layer weights	Total parameters
Standard RNN	M × N	N × N	MN + N²
LSTM	4M × N	4N × N	4MN + 4N²
GRU	3M × N	3N × N	3MN + 3N²
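
The parameter counts in the table above can be checked with a short calculation. The following is a minimal sketch, assuming that M is the input dimension, N is the hidden-layer size, the total is the sum of the input and hidden weight matrices, and biases are ignored; the function name `weight_count` and the `GATES` mapping are illustrative and do not come from the survey.

```python
# Number of weight sets per cell (assumption based on the table:
# a standard RNN has one, an LSTM four, a GRU three).
GATES = {"rnn": 1, "lstm": 4, "gru": 3}


def weight_count(variant: str, m: int, n: int) -> dict:
    """Return input, hidden, and total weight counts, ignoring biases.

    m: input dimension (M in the table)
    n: hidden-layer size (N in the table)
    """
    g = GATES[variant.lower()]
    input_weights = g * m * n    # g matrices of shape M x N
    hidden_weights = g * n * n   # g matrices of shape N x N
    return {
        "input": input_weights,
        "hidden": hidden_weights,
        "total": input_weights + hidden_weights,
    }


if __name__ == "__main__":
    for name in ("rnn", "lstm", "gru"):
        print(name, weight_count(name, m=128, n=256))
```

For a fixed M and N, the LSTM and GRU totals are four and three times the standard RNN total, which is why storage and memory bandwidth are a central concern for accelerators of these variants.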

Reference	Model	Baseline platform	Quantization method	Data compression ratio	Speedup
[28]	LSTM	NVIDIA GeForce Titan X	Fixed-point 6	4×	4.12×
[33]	LSTM	Intel Core i7-3770K	Fixed-point 17	-	251×
[26]	LSTM	ARM Cortex-A9 CPU	Fixed-point 16	-	21×
[34]	LSTM	-	Nonlinear quantization	5-9×	-
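
To illustrate what the fixed-point entries in the table above refer to, here is a minimal sketch of signed fixed-point quantization with saturation. It assumes the number after "Fixed-point" is a total bit width; the split between integer and fractional bits (`frac_bits`) and both function names are hypothetical, and the cited works may use different rounding or scaling schemes.

```python
def quantize_fixed_point(x: float, total_bits: int = 16, frac_bits: int = 8) -> int:
    """Map a float to a signed fixed-point integer code with saturation.

    total_bits: word length including the sign bit (assumed meaning of the
                number in the table, e.g. 16 for "Fixed-point 16").
    frac_bits:  number of fractional bits (hypothetical choice).
    """
    scale = 1 << frac_bits                   # 2 ** frac_bits
    q_min = -(1 << (total_bits - 1))         # most negative code
    q_max = (1 << (total_bits - 1)) - 1      # most positive code
    q = round(x * scale)                     # round to the nearest code
    return max(q_min, min(q_max, q))         # clamp instead of wrapping


def dequantize_fixed_point(q: int, frac_bits: int = 8) -> float:
    """Recover the real value represented by a fixed-point code."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    w = 0.3
    code = quantize_fixed_point(w, total_bits=6, frac_bits=4)
    print(code, dequantize_fixed_point(code, frac_bits=4))
```

Shorter word lengths reduce storage and multiplier cost on the FPGA at the price of a larger rounding error, which is the trade-off behind the compression and speedup figures reported in the table.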
