Journal on Communications ›› 2019, Vol. 40 ›› Issue (10): 189-198. doi: 10.11959/j.issn.1000-436x.2019194

• Correspondences •

Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition

Huilan LUO, Kang TONG

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
  • Revised: 2019-07-17 Online: 2019-10-25 Published: 2019-11-07
  • About the authors: Huilan LUO (1974- ), female, born in Shanggao, Jiangxi, Ph.D., is a professor at Jiangxi University of Science and Technology. Her main research interests are computer vision and pattern recognition. | Kang TONG (1992- ), male, born in Nanjing, Jiangsu, is a master's student at Jiangxi University of Science and Technology. His main research interests are computer vision and video action recognition.
  • Supported by:
    The National Natural Science Foundation of China (61862031); Jiangxi Natural Science Foundation (20171BAB202014); "Science and Technology Innovation Talent Plan" Project of Ganzhou, Jiangxi Province

Abstract:

To address the limited ability of shallow networks and ordinary deep models in two-stream architectures to learn spatial and temporal information, squeeze-and-excitation residual networks are applied to both the spatial stream and the temporal stream for action recognition, and identity-mapping kernels are injected into the network as temporal filters to capture long-term temporal dependencies. To further strengthen the interaction between spatial and temporal information in the squeeze-and-excitation residual networks, spatiotemporal features are fused by multiplication, and the effects of the fusion method, the number of fusions, and the fusion position on recognition performance are studied. Because a single model is limited in the performance it can achieve, three different strategies are proposed to generate multiple models, which are then ensembled by direct averaging and weighted averaging to obtain the final recognition result. Experimental results on the HMDB51 and UCF101 datasets show that the proposed spatiotemporal squeeze-and-excitation residual multiplier network effectively improves action recognition performance.

Key words: action recognition, spatiotemporal stream, squeeze-and-excitation residual network, multiplication fusion, multi-model ensemble
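
The abstract names two operations that are easy to illustrate in isolation: multiplicative fusion of spatial and temporal features, and ensembling several models by direct or weighted averaging. The sketch below is a minimal illustration in plain Python, not the authors' implementation. The function names, the use of flat feature vectors, and the example weights are all assumptions made for illustration; the paper's actual fusion positions and weighting scheme are not given in the abstract.

```python
from typing import List, Sequence


def multiply_fuse(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Element-wise product of two equally sized feature vectors.

    Assumption: features are flattened to 1-D lists for illustration only.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have the same length")
    return [s * t for s, t in zip(spatial, temporal)]


def average_scores(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Direct averaging: every model contributes equally to each class score."""
    n_models = len(model_scores)
    return [sum(column) / n_models for column in zip(*model_scores)]


def weighted_average_scores(
    model_scores: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted averaging: each model's class scores are scaled by its weight."""
    if len(weights) != len(model_scores):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, column)) / total
        for column in zip(*model_scores)
    ]


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical class scores from three models over two action classes.
scores = [[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
fused = multiply_fuse([1.0, 2.0], [0.5, 3.0])
direct = average_scores(scores)
weighted = weighted_average_scores(scores, [1.0, 1.0, 2.0])
```

In this toy example, direct averaging leaves the two classes essentially tied, while giving the third model twice the weight tips the decision toward the first class. That shows how the choice of weights can change the ensemble's prediction.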
