Telecommunications Science, 2021, Vol. 37, Issue (2): 82-98. doi: 10.11959/j.issn.1000-0801.2021031

• Research and Development •

Fusion of auto encoders and multi-modal data based video recommendation method

Qiuyang GU1, Chunhua JU2, Gongxing WU2

  1. School of Management, Zhejiang University of Technology, Hangzhou 310023, China
    2 Zhejiang Gongshang University, Hangzhou 310018, China
  • Revised: 2021-01-30 Online: 2021-02-20 Published: 2021-02-01
  • About the authors: GU Qiuyang (1995- ), male, Ph.D. candidate at Zhejiang Gongshang University. His research interests include intelligent information processing, data mining, e-commerce, and logistics optimization.
    JU Chunhua (1962- ), male, Ph.D., professor and doctoral supervisor at Zhejiang Gongshang University. His research interests include intelligent information processing, data mining, e-commerce, and logistics optimization.
    WU Gongxing (1974- ), male, Ph.D., associate professor at Zhejiang Gongshang University. His research interests include intelligent information processing, data mining, e-commerce, and logistics optimization.
  • Supported by:
    The National Natural Science Foundation of China (71571162); The Social Science Planning Key Project of Zhejiang Province (20NDJC10Z); The National Social Science Fund Emergency Management System Construction Research Project (20VYJ073); Zhejiang Philosophy and Social Science Major Project (20YSXK02ZD)



Abstract:

The linear-structure video recommendation methods in common use today suffer from non-personalized recommendation results and low accuracy, so developing a high-precision, personalized video recommendation method is extremely urgent. A video recommendation method based on the fusion of autoencoders and multi-modal data was presented, which fuses two data modalities, text and vision, for video recommendation. Specifically, the proposed method first uses bag-of-words and TF-IDF methods to describe text data, then fuses the resulting features with deep convolutional descriptors extracted from visual data, so that each video document obtains a multi-modal descriptor, and constructs a low-dimensional sparse representation with autoencoders. Experiments were performed on the proposed model using three real data sets. The results show that, compared with single-modal recommendation methods, the recommendation performance of the proposed method is significantly improved, and it also outperforms the baseline methods.
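The pipeline the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the toy corpus, the random stand-in for CNN visual descriptors, and all dimensions (8-dimensional visual features, 4-dimensional code) are illustrative assumptions.

```python
import numpy as np

def tfidf(docs):
    """Bag-of-words + TF-IDF features for a tiny tokenized corpus."""
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d:
            tf[r, idx[w]] += 1
        tf[r] /= max(len(d), 1)          # term frequency
    df = (tf > 0).sum(axis=0)            # document frequency
    idf = np.log(len(docs) / df)         # inverse document frequency
    return tf * idf, vocab

# Toy text descriptions of three videos (illustrative only).
docs = [["action", "movie", "car"],
        ["romance", "movie"],
        ["car", "chase", "action"]]
text_feat, vocab = tfidf(docs)

# Stand-in for deep convolutional descriptors of the same three videos
# (in the paper these come from a CNN applied to the visual data).
rng = np.random.default_rng(0)
visual_feat = rng.normal(size=(3, 8))

# Multi-modal descriptor: concatenation of both modalities.
fused = np.concatenate([text_feat, visual_feat], axis=1)

# One-hidden-layer autoencoder trained by gradient descent to obtain a
# low-dimensional representation of each video document.
d_in, d_hid = fused.shape[1], 4
W1 = rng.normal(scale=0.1, size=(d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = rng.normal(scale=0.1, size=(d_hid, d_in)); b2 = np.zeros(d_in)
lr = 0.05
for _ in range(500):
    h = np.tanh(fused @ W1 + b1)         # encoder
    out = h @ W2 + b2                    # linear decoder
    err = out - fused                    # reconstruction error
    gW2 = h.T @ err; gb2 = err.sum(0)
    gh = err @ W2.T * (1 - h ** 2)       # backprop through tanh
    gW1 = fused.T @ gh; gb1 = gh.sum(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g / len(fused)

codes = np.tanh(fused @ W1 + b1)         # low-dimensional video representations
print(codes.shape)
```

In a recommender, the `codes` vectors would stand in for the videos when computing user-item or item-item similarities; the paper's sparsity constraint and full training objective are omitted here for brevity.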

Key words: autoencoder, multi-modal representation, data fusion, video recommendation

