Journal on Communications ›› 2020, Vol. 41 ›› Issue (2): 36-43. doi: 10.11959/j.issn.1000-436x.2020037

• Special topic: Smart mines

Video description method based on multidimensional and multimodal information

Enjie DING1, Zhongyu LIU1, Yafeng LIU1, Wanli YU2

  1. IoT/Perception Mine Research Center, China University of Mining & Technology, Xuzhou 221008, China
  2. Institute of Electrodynamics and Microelectronics, University of Bremen, Bremen 28359, Germany
  • Revised: 2020-01-14 Online: 2020-02-25 Published: 2020-03-09
  • About the authors:
    Enjie DING (1962- ), male, born in Qingdao, Shandong, Ph.D., professor at China University of Mining & Technology. His main research interests include the industrial Internet of Things, pattern recognition, and personnel positioning.
    Zhongyu LIU (1985- ), male, born in Huixian, Henan, Ph.D. candidate at China University of Mining & Technology. His main research interests include computer vision and natural language processing.
    Yafeng LIU (1985- ), male, born in Xuzhou, Jiangsu, Ph.D., assistant researcher at China University of Mining & Technology. His main research interests include machine learning, computer vision, and action recognition.
    Wanli YU (1987- ), male, born in Xuzhou, Jiangsu, Ph.D., postdoctoral researcher at the University of Bremen. His main research interests include the industrial Internet of Things, network optimization, and mobile edge computing.
  • Supported by:
    The National Key Research and Development Program of China (2017YFC0804400, 2017YFC0804401)

Abstract:

To address the problem of representing complex information in automatic video description, a method for extracting and fusing multi-dimensional and multi-modal visual features is proposed. First, multi-dimensional features of the video sequence, such as static and dynamic features, are extracted by transfer learning, and an image description algorithm is used to extract semantic information from the key frames of the video; together these form the feature representation of the video. Then, a multi-layer long short-term memory (LSTM) network fuses the multi-dimensional and multi-modal information and generates a natural-language description of the video content. Experimental simulations show that the proposed method achieves better results than existing methods on the automatic video description task.

Key words: video description, multimodal, transfer learning, long short-term memory network, recurrent neural network
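
The fusion step summarized in the abstract can be illustrated with a small sketch. The abstract does not say how the features are combined, so the example below assumes simple feature-level fusion: per-frame static features, per-frame dynamic features, and one semantic vector derived from a key-frame caption are concatenated at each time step and passed through a stacked LSTM. All names (`LSTMCell`, `fuse_and_encode`), the dimensions, and the random untrained weights are hypothetical, and the decoding of the final state into words is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Plain LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.w[g], self.b[g])]

    def step(self, x, h, c):
        xh = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        o = self._gate("o", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def fuse_and_encode(static_feats, dynamic_feats, semantic_vec,
                    hidden_size=8, num_layers=2, seed=0):
    """Concatenate the three feature streams per time step (an assumed
    fusion scheme) and encode the sequence with a stacked LSTM."""
    rng = random.Random(seed)
    fused = [s + d + semantic_vec for s, d in zip(static_feats, dynamic_feats)]
    sizes = [len(fused[0])] + [hidden_size] * (num_layers - 1)
    layers = [LSTMCell(size, hidden_size, rng) for size in sizes]
    states = [([0.0] * hidden_size, [0.0] * hidden_size) for _ in layers]
    for x in fused:
        inp = x
        for k, cell in enumerate(layers):
            h, c = cell.step(inp, *states[k])
            states[k] = (h, c)
            inp = h  # the output of one layer feeds the next layer
    return states[-1][0]  # final hidden state of the top layer


# Toy input: 3 time steps, 2-D static and 1-D dynamic features, 3-D semantic vector.
static = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
dynamic = [[0.0], [1.0], [0.5]]
semantic = [0.2, 0.1, 0.0]
video_code = fuse_and_encode(static, dynamic, semantic, hidden_size=4)
```

In a full system the returned vector would condition a language decoder, and the weights would be learned rather than sampled; concatenation is only one of several possible fusion choices.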

