大数据 ›› 2024, Vol. 10 ›› Issue (2): 80-93. doi: 10.11959/j.issn.2096-0271.2023067

• Research •

  • About the authors: LIU Dezhi (1996- ), male, is a Ph.D. candidate at the School of Computer Science and Engineering, Beihang University. His research interests include knowledge disambiguation and information extraction.
    HE Liu (1988- ), male, is a senior engineer at AVIC China Aero-Polytechnology Establishment. His research interests include artificial intelligence, computer vision, and multimodal machine learning.
    LIU Youfeng (1996- ), male, is a master's student at the School of Computer Science and Engineering, Beihang University. His research interests include multimodal fusion and knowledge graphs.
    HAN Dechun (1980- ), male, is the chief architect of the Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University. His research interests include big data applications and network security systems.

A text classification method based on multimodal fusion enhancement

Dezhi LIU1,2, Liu HE3, Youfeng LIU1,2, Dechun HAN2   

    1 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
    2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
    3 AVIC China Aero-Polytechnology Establishment, Beijing 100028, China
  • Online: 2024-03-01 Published: 2024-03-01


Abstract:

Although multimodal text classification techniques have potential when applied to specific scenarios, they still have limitations. Existing multimodal fusion models require modal alignment of the input data, so a large amount of incomplete multimodal data is discarded outright, limiting the scale and flexibility of the data available for inference. To address this problem, we propose a text classification model based on multimodal fusion enhancement, together with an insufficient multimodal resource training method. Compared with traditional methods, the proposed model improves performance by an average of about 4.25% on a standard dataset. Furthermore, when the missing rate of modalities other than text is 50%, the insufficient multimodal resource training method outperforms traditional multi-route strategies by about 4%. The experimental results demonstrate the effectiveness of the proposed model and training method.

Key words: text classification, cross attention, multimodal fusion, insufficient multimodal resource training method
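The two ideas named in the keywords, cross-attention fusion and training that tolerates missing modalities, can be illustrated with a minimal NumPy sketch. This is a generic illustration of the technique class, not the paper's actual model: all dimensions, weight matrices, and the simple residual-plus-pooling fusion are hypothetical choices made here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, Wq, Wk, Wv):
    # Text tokens (queries) attend over another modality's features
    # (keys/values); returns one attended vector per text token.
    Q = query_seq @ Wq
    K = context_seq @ Wk
    V = context_seq @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))    # 5 text token embeddings (hypothetical)
image = rng.normal(size=(3, d))   # 3 image region features; may be absent
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def fuse(text, image):
    # When the non-text modality is missing, fall back to a text-only
    # route instead of discarding the sample; otherwise enrich the text
    # representation with cross-attended image information.
    if image is None:
        return text.mean(axis=0)
    attended = cross_attention(text, image, Wq, Wk, Wv)
    return (text + attended).mean(axis=0)  # residual fusion, then pooling

full = fuse(text, image)      # complete multimodal sample
text_only = fuse(text, None)  # incomplete sample remains usable
```

Both routes produce a fixed-size vector that a downstream classifier head can consume, which is what lets incomplete samples participate in training rather than being dropped.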

