Telecommunications Science, 2022, Vol. 38, Issue (1): 61-72. doi: 10.11959/j.issn.1000-0801.2022014
Yue CHEN1, Yu GUO1,2, Yuanyan XIE1, Zhenqiang MI1
Revised:
2021-11-19
Online:
2022-01-20
Published:
2022-01-01
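About the author:
Yue CHEN (1998- ), female, is a master's student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Her main research interests are computer vision and artificial intelligence.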
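Abstract:
To address the inconvenience of existing visual aid devices for the blind, a method for running an image captioning model on portable mobile devices by means of model pruning was explored. Image captioning models and model pruning techniques were reviewed, and an improved pruning algorithm tailored to image captioning models was proposed. The results show that, with accuracy maintained, the pruned image captioning model substantially reduces both processing time and battery consumption during operation, and can describe environmental information and announce it by voice quickly and accurately, anytime and anywhere.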
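The abstract does not state which pruning criterion the authors use. As a generic illustration of structured pruning only (not the paper's improved algorithm), the sketch below ranks the filters of a single layer by L1 norm and keeps the highest-scoring fraction. The names `prune_filters` and `keep_ratio`, the L1 criterion, and the toy weights are all assumptions made for this example.

```python
from typing import List

Filter = List[float]  # one flattened convolution filter (hypothetical toy representation)


def l1_norm(weights: Filter) -> float:
    """Sum of absolute weights, used here as a stand-in importance score."""
    return sum(abs(w) for w in weights)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Return the sorted indices of the filters that survive pruning."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from most to least important.
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


# Toy layer with four filters; two of them carry almost no weight.
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.0, 0.03, 0.0],
]
kept = prune_filters(layer, keep_ratio=0.5)
pruned_layer = [layer[i] for i in kept]
```

Removing whole filters shrinks the layer's output channels, which is the kind of change that can reduce inference time and energy use on a mobile device. In practice a pruned network is usually fine-tuned afterwards to recover accuracy.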
Yue CHEN, Yu GUO, Yuanyan XIE, Zhenqiang MI. Offline visual aid system for the blind based on image captioning[J]. Telecommunications Science, 2022, 38(1): 61-72.