Offline visual aid system for the blind based on image captioning

doi:10.11959/j.issn.1000-0801.2022014

Abstract

To address the inconvenience of existing visual aid systems for the blind, this paper discusses how to run an image captioning model on portable mobile devices by means of model pruning. Model pruning techniques and image captioning models are reviewed, and an improved pruning algorithm for the image captioning model is proposed. Experimental results show that, while preserving accuracy, the pruned image captioning model greatly reduces processing time and power consumption, and can describe the surrounding environment quickly and accurately and announce it by voice anytime and anywhere.

Key words: visual assisted system, image captioning model, model compression and acceleration, model pruning algorithm

CLC Number: TP391

Yue CHEN, Yu GUO, Yuanyan XIE, Zhenqiang MI. Offline visual aid system for the blind based on image captioning[J]. Telecommunications Science, 2022, 38(1): 61-72.

Figures/Tables 14

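Parameter	Description
$D = \{X = \{x_{0}, x_{1}, \dots, x_{N}\},\ Y = \{y_{0}, y_{1}, \dots, y_{N}\}\}$	Training set made up of the input set X and the output set Y; x_i and y_i denote the i-th input and output, respectively
$W = \{(w_{1}^{1}, b_{1}^{1}), (w_{1}^{2}, b_{1}^{2}), \dots, (w_{L}^{C_{l}}, b_{L}^{C_{l}})\}$	Network model parameters; $(w_{l}^{k}, b_{l}^{k})$ are the parameters of the l-th layer
W′	Parameter set of the pruned network model, W′ ⊆ W
C(D|W)	Loss function of the pre-trained model
C(D|W′)	Loss function of the pruned model
C(D, h_i)	Model loss function defined for h_i
B	Number of non-zero parameters
$h = \{z_{0}^{(1)}, z_{0}^{(2)}, \dots, z_{L}^{(C_{l})}\}$	Set of feature maps
z_l	Feature map of the l-th convolutional layer
$w_{l}^{(k)}$	k-th convolution kernel of the l-th convolutional layer
g_l	g_l ∈ {0, 1}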
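
With this notation, pruning is commonly posed as choosing W′ with at most B non-zero parameters so that C(D|W′) stays close to C(D|W). The sketch below illustrates one well-known way to rank feature maps for that purpose, the first-order Taylor criterion of Molchanov et al., and how binary gates g ∈ {0, 1} can be derived from the ranking. It is not the improved algorithm proposed in the paper, whose details are not given here; the function names `taylor_score` and `gates_for_budget` and the toy numbers are hypothetical.

```python
def taylor_score(acts, grads):
    """First-order Taylor importance of one feature map (assumed criterion).

    acts, grads: one flat list of values per sample, holding the feature map z
    and the loss gradient dC/dz at the same positions.
    """
    per_sample = []
    for z, dz in zip(acts, grads):
        # Estimated loss change if this feature map were zeroed out.
        mean_prod = sum(a * g for a, g in zip(z, dz)) / len(z)
        per_sample.append(abs(mean_prod))
    return sum(per_sample) / len(per_sample)


def gates_for_budget(scores, keep):
    """Return binary gates (1 = keep, 0 = prune) for the `keep` best channels."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    kept = set(ranked[:max(keep, 0)])
    return [1 if k in kept else 0 for k in range(len(scores))]


# Toy example: three channels, two samples, four positions per feature map.
scores = [
    taylor_score([[1.0] * 4] * 2, [[0.5] * 4] * 2),   # large activation * gradient
    taylor_score([[2.0] * 4] * 2, [[0.01] * 4] * 2),  # large activation, tiny gradient
    taylor_score([[0.1] * 4] * 2, [[1.0] * 4] * 2),   # small activation, large gradient
]
print(gates_for_budget(scores, keep=2))  # [1, 0, 1]
```

In practice the activations and gradients would come from a forward and backward pass over a batch of training images, and the network would be fine-tuned after each round of pruning.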

Score	0.91–1	0.71–0.9	0.51–0.7	0–0.5
ROUGE-1	0.62	0.28	0.08	0.02
ROUGE-2	0.59	0.12	0.22	0.07
ROUGE-L	0.62	0.28	0.08	0.02
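
Each row of the table appears to give the share of generated captions whose score falls in each range. As a reminder of what the three metrics measure, the sketch below computes recall-oriented ROUGE-N and ROUGE-L for a single candidate and reference caption. It assumes whitespace tokenization and the recall form of the metrics; the variant and tokenizer actually used in the paper are not stated here, and the function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also occur in the candidate (clipped)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    cand, ref = candidate.split(), reference.split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0


candidate = "a dog runs on the grass"
reference = "a dog is running on the grass"
print(rouge_n_recall(candidate, reference, 1))  # 5 of 7 reference unigrams matched
print(rouge_n_recall(candidate, reference, 2))  # 3 of 6 reference bigrams matched
print(rouge_l_recall(candidate, reference))     # LCS of 5 tokens over 7 reference tokens
```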

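Model	Average time to process one image (s)	Average power capacity consumed per image (mAh)
Image captioning model before pruning	4.049	0.164
Image captioning model after pruning	2.337	0.269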
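
The relative effect of pruning can be read off the table with a short calculation. The sketch below uses the four values exactly as listed above; the helper name `relative_change` is hypothetical. Note that, as listed, the capacity column is higher after pruning, whereas the abstract reports a reduction.

```python
def relative_change(before, after):
    """Signed relative change from `before` to `after`."""
    return (after - before) / before


time_before, time_after = 4.049, 2.337          # seconds per image
capacity_before, capacity_after = 0.164, 0.269  # mAh per image, as listed

print(f"time: {relative_change(time_before, time_after):+.1%}")              # -42.3%
print(f"capacity: {relative_change(capacity_before, capacity_after):+.1%}")  # +64.0%
print(f"speed-up: {time_before / time_after:.2f}x")                          # 1.73x
```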
