Chinese Journal of Network and Information Security ›› 2018, Vol. 4 ›› Issue (12): 16-24. doi: 10.11959/j.issn.2096-109x.2018100

• Academic paper •

Based on linear systolic array for convolutional neural network’s calculation optimization and performance analysis

Qinrang LIU, Chongyang LIU, Jun ZHOU, Xiaolong WANG

  1. National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
  • Revised: 2018-10-29 Online: 2018-12-01 Published: 2018-12-30
  • About the authors: Qinrang LIU (刘勤让, 1975-), male, born in Suixian, Henan, is a researcher at the National Digital Switching System Engineering and Technological R&D Center. His research interests include broadband information networks and network-on-chip design. | Chongyang LIU (刘崇阳, 1994-), male, born in Yichang, Hubei, is a master's student at the same center. His research interests include artificial intelligence and deep learning. | Jun ZHOU (周俊, 1979-), male, born in Huanggang, Hubei, is a lecturer at the same center. His research interests include chip design and broadband information processing. | Xiaolong WANG (王孝龙, 1993-), male, born in Minquan, Henan, is a master's student at the same center. His research interests include broadband information networks and protocol parsing.
  • Supported by:
    The National Science Technology Major Project of China (2016ZX01012101); The National Natural Science Foundation of China (61572520); The National Natural Science Foundation Innovation Group Project of China (61521003)

Abstract:

Most convolutional neural network (CNN) accelerator designs on FPGAs fail to exploit sparsity effectively. To address this, two improved CNN computation optimization schemes based on a linear systolic array are proposed, with bandwidth and energy consumption as the main design considerations. First, convolution is transformed into matrix multiplication so that sparsity can be exploited. Second, a linear systolic array is adopted to reduce the large I/O demand of the traditional parallel matrix multiplier. Finally, the advantages and disadvantages of the traditional parallel matrix multiplier and the two improved linear systolic arrays for CNN acceleration are compared and analyzed. Theoretical proof and analysis show that, compared with the parallel matrix multiplier, both improved linear systolic arrays make full use of sparsity, consume less energy, and occupy less I/O bandwidth.

Key words: linear systolic array, convolutional neural network, sparsity, I/O bandwidth, performance analysis
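
The first step in the abstract, recasting convolution as matrix multiplication so that sparsity can be exploited, can be illustrated with a small software sketch. This is not the authors' hardware design. It assumes an im2col-style unrolling with stride 1 and no padding, and it assumes the sparsity lies in the kernel weights; the abstract specifies neither. The function names `im2col`, `conv_as_matmul`, and `conv_direct` are hypothetical and chosen only for this illustration.

```python
def im2col(image, k):
    """Unroll every k x k window of a 2-D image into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows


def conv_as_matmul(image, kernel):
    """Convolve by multiplying the unrolled windows with the flattened kernel.

    Zero weights are skipped (assumed weight sparsity), so the multiply count
    falls with kernel sparsity.
    Returns (output feature map, number of multiplications performed).
    """
    k = len(kernel)
    flat = [w for row in kernel for w in row]
    # Keep only the nonzero weights together with their column index.
    nonzero = [(idx, w) for idx, w in enumerate(flat) if w != 0]
    out_w = len(image[0]) - k + 1
    products = [sum(row[idx] * w for idx, w in nonzero) for row in im2col(image, k)]
    # Fold the flat list of dot products back into a 2-D feature map.
    output = [products[r:r + out_w] for r in range(0, len(products), out_w)]
    return output, len(nonzero) * len(products)


def conv_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation, as used in CNNs)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]
```

For a 4 x 4 input and a 3 x 3 kernel with only two nonzero weights, `conv_as_matmul` performs 8 multiplications where the dense product would need 36, and its output matches `conv_direct`.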

