Chinese Journal on Internet of Things ›› 2022, Vol. 6 ›› Issue (4): 72-81. doi: 10.11959/j.issn.2096-3750.2022.00308

• Theory and Technology •

  • Author biographies:
    Jialin ZHI (1999- ), female, master's student at Beijing University of Posts and Telecommunications; her research interests include edge computing and deep learning.
    Yinglei TENG (1983- ), female, Ph.D., professor and doctoral supervisor at Beijing University of Posts and Telecommunications, IEEE senior member; her research interests include AI and wireless communications, edge computing, and millimeter-wave technology.
    Xinyang ZHANG (2000- ), male, master's student at Beijing University of Posts and Telecommunications; his research interests include edge intelligence and resource allocation.
    Tao NIU (1997- ), male, Ph.D. candidate at Beijing University of Posts and Telecommunications; his research interests include edge computing and artificial intelligence.
    Mei SONG (1960- ), female, professor and doctoral supervisor at the School of Electronic Engineering, Beijing University of Posts and Telecommunications; executive director of the Graduate Education Branch of the China Electronics Education Society; member of the Communications Branch of the Chinese Institute of Electronics; member of the IoT Expert Committee of the Chinese Institute of Electronics; member of the Informatization Committee of the China Railway Society; her research interests include data and services, communications and management.

Cooperative inference analysis based on DNN convolutional kernel partitioning

Jialin ZHI, Yinglei TENG, Xinyang ZHANG, Tao NIU, Mei SONG   

  1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised:2022-11-07 Online:2022-12-30 Published:2022-12-01
  • Supported by:
    The National Key Research and Development Program of China(2021YFB3300100);The National Natural Science Foundation of China(62171062)


Abstract:

With the popularity of intelligent chips in edge terminal devices, a large number of AI applications will be deployed at the network edge, closer to data sources. DNN partitioning methods enable deep learning models to be trained and deployed on resource-constrained terminal devices, addressing the computing-power bottleneck of edge AI. On the basis of the traditional workload based partition method (WPM), a kernel based partition method (KPM) was proposed. The inference performance of the two schemes was analyzed quantitatively in terms of computation (FLOPs), memory consumption, and communication cost, and qualitatively in terms of the flexibility, robustness, and privacy of the inference process. Finally, a software and hardware experimental platform was built, and the AlexNet and VGG11 networks were implemented in PyTorch to further verify the advantages of the proposed scheme in terms of latency and energy consumption. Compared with the WPM scheme, the KPM scheme achieved better DNN inference acceleration in large-scale computing scenarios, with lower memory usage and energy consumption.
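The kernel-based partitioning idea described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; it only assumes, per the abstract, that KPM splits a convolutional layer's kernels (output channels) across cooperating devices, each of which convolves the full input with its kernel subset, after which the partial feature maps are concatenated along the channel axis.

```python
# Minimal sketch of kernel-based partitioning (KPM) for one conv layer.
# Assumption (hypothetical, for illustration): two edge devices each hold
# half of the layer's 64 kernels and process the same input tensor.
import torch
import torch.nn as nn

torch.manual_seed(0)

full = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # the unpartitioned layer
x = torch.randn(1, 3, 32, 32)                      # a dummy input image

# Split the 64 kernels (and biases) into two shards of 32 channels each.
shards = []
for w, b in zip(full.weight.chunk(2, dim=0), full.bias.chunk(2, dim=0)):
    part = nn.Conv2d(3, w.shape[0], kernel_size=3, padding=1)
    part.weight.data.copy_(w)  # each "device" gets its kernel subset
    part.bias.data.copy_(b)
    shards.append(part)

# Each device convolves the same input with its shard; the coordinator
# concatenates the partial outputs along the channel dimension.
y_kpm = torch.cat([s(x) for s in shards], dim=1)
y_ref = full(x)
print(torch.allclose(y_kpm, y_ref, atol=1e-6))  # partitioned == unpartitioned
```

Because each kernel produces its output channel independently, this partition is exact: no cross-device synchronization is needed within the layer, which is what makes per-kernel splitting attractive for parallel inference.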

Key words: edge intelligence, deep neural network partition, cooperative computation, parallel inference

