电信科学 ›› 2020, Vol. 36 ›› Issue (4): 115-124. doi: 10.11959/j.issn.1000-0801.2020119

• Review •

  • About the author: Rui MIN (1978- ), male, is a senior engineer at the Intelligent Network and Terminal Research Institute, China Telecom Co., Ltd. His main research interests include big data, image recognition, and artificial intelligence.

A survey of efficient deep neural networks

Rui MIN   

  1. Intelligent Network and Terminal Research Institute, China Telecom Co., Ltd., Guangzhou 510630, China
  • Revised: 2020-03-26 Online: 2020-04-20 Published: 2020-04-24


Abstract:

In recent years, deep neural networks (DNN) have achieved great success in AI fields such as computer vision and natural language processing. Thanks to deeper and larger network structures, DNN performance is improving rapidly. However, deeper and larger networks require enormous computational and memory resources, so large neural network models are difficult to deploy in resource-constrained scenarios. How to design lightweight, efficient deep neural networks that run fast on embedded devices is therefore of great significance for bringing deep neural network technology into practical use. Representative research methods and work on efficient deep neural networks in recent years were reviewed and summarized, including parameter pruning, model quantization, knowledge distillation, and network architecture search. The advantages, disadvantages, and applicable scenarios of the different methods were analyzed, and the future development trend of efficient neural network design was forecast.
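To make two of the surveyed techniques concrete, the sketch below illustrates magnitude-based parameter pruning and uniform 8-bit quantization on a plain list of weights. This is an illustrative toy example, not code from the paper; the function names and the specific pruning/quantization formulas are assumptions chosen for simplicity.

```python
# Toy sketch of two model-compression methods the survey reviews:
# magnitude pruning (zero out the smallest weights) and uniform
# affine 8-bit quantization. Pure Python for clarity, not efficiency.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_uint8(weights):
    """Map floats linearly onto integers 0..255; return codes and reconstruction."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0    # guard against all-equal weights
    codes = [round((w - lo) / scale) for w in weights]
    dequant = [c * scale + lo for c in codes]  # values used at inference time
    return codes, dequant

w = [0.9, -0.05, 0.4, 0.01, -0.7]
pruned = magnitude_prune(w, 0.4)      # 40% of weights become exact zeros
codes, dq = quantize_uint8(w)         # 8-bit codes plus reconstructed floats
```

Pruning yields sparse weights that sparse kernels or structured layouts can exploit, while quantization shrinks storage 4x versus float32 and enables integer arithmetic; real deployments combine such methods with fine-tuning to recover accuracy.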

Key words: deep neural network, model compression and acceleration, knowledge distillation

