Space-Integrated-Ground Information Networks ›› 2023, Vol. 4 ›› Issue (4): 79-85. doi: 10.11959/j.issn.2096-8930.2023045

• Application •

Scalable Low Power Accelerator for Sparse Recurrent Neural Network

Panshi JIN1, Junjie LI2, Jingyi WANG2, Pengchong LI3, Lei XING2, Xiaodong LI1

  1 China Construction Bank Co., Ltd., Beijing 100034, China
  2 Jianxin Financial Technology Co., Ltd., Shanghai 321004, China
  3 Inspur Electronic Information Industry Co., Ltd., Jinan, Shandong 250000, China
  • Revised: 2023-11-30  Online: 2023-12-01  Published: 2023-12-01
  • About the authors: Panshi JIN (1965- ), male, chief information officer of China Construction Bank Co., Ltd., mainly engaged in the strategic planning, coordination and implementation of information technology systems
    Junjie LI (1978- ), male, works at Jianxin Financial Technology Co., Ltd., mainly engaged in research on artificial intelligence inference technology
    Jingyi WANG (1990- ), male, works at Jianxin Financial Technology Co., Ltd., mainly engaged in research on applications of artificial intelligence in financial technology
    Pengchong LI (1981- ), male, general manager of the Network R&D Department, Inspur Electronic Information Industry Co., Ltd., mainly engaged in research on data center architecture
    Lei XING (1981- ), male, vice president of the Basic Technology Center, Jianxin Financial Technology Co., Ltd., mainly engaged in the design and development of distributed architectures
    Xiaodong LI (1982- ), male, deputy director of the Technical Architecture Management Division, Financial Technology Department, China Construction Bank Co., Ltd., mainly engaged in technical architecture design

Abstract:

Edge computing devices in bank outlets are increasingly used for customer flow analysis, security protection, and risk prevention and control, and the performance and power consumption of the AI inference chip have become a very important factor in selecting such devices. Recurrent neural networks (RNN) suffer from high power consumption, weak inference performance and low energy efficiency caused by data dependence and low data reusability, which makes them hard to run on low-power platforms. To address these problems, a voltage-scalable low-power accelerator for sparse RNN was implemented on an FPGA and verified on an edge computing device. Firstly, the sparse RNN was analyzed and the processing array was designed using network compression. Secondly, because the workload of the sparse RNN is unbalanced, a voltage scaling method was introduced to maintain low power consumption and high throughput. Experiments show that the method can significantly improve the RNN inference speed of the system and reduce the processing power consumption of the chip.

Key words: RNN, sparse, low power consumption, acceleration scheme
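
The abstract names two ideas: compressing the RNN weights so the processing array only works on nonzero entries, and voltage scaling to absorb the resulting per-row workload imbalance. The accelerator itself is an FPGA design; as a purely illustrative sketch (not the authors' implementation), the Python snippet below shows the kind of computation being accelerated: a CSR-compressed weight matrix applied in one RNN step, where the varying nonzero count per row is exactly the imbalance the voltage-scaling scheme has to handle. All function names and parameters here are assumptions for illustration.

import numpy as np

def dense_to_csr(w, threshold):
    """Prune small weights and store the rest in CSR form (values, column indices, row pointers).
    This mimics the network-compression step described in the abstract."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        keep = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[keep])
        col_idx.extend(keep)
        row_ptr.append(len(values))
    return np.asarray(values), np.asarray(col_idx, dtype=np.int64), np.asarray(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product. The work per output row is proportional to that row's
    nonzero count, which is the source of the workload imbalance mentioned in the abstract."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        y[r] = values[lo:hi] @ x[col_idx[lo:hi]]
    return y

def rnn_step(wx, wh, b, x_t, h_prev):
    """One vanilla RNN step, h_t = tanh(Wx x_t + Wh h_prev + b), using compressed weights."""
    return np.tanh(csr_matvec(*wx, x_t) + csr_matvec(*wh, h_prev) + b)

# Toy usage: prune two random weight matrices at |w| <= 0.8 and run a single step.
rng = np.random.default_rng(0)
Wx = dense_to_csr(rng.standard_normal((64, 32)), 0.8)
Wh = dense_to_csr(rng.standard_normal((64, 64)), 0.8)
h = rnn_step(Wx, Wh, np.zeros(64), rng.standard_normal(32), np.zeros(64))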

