Big Data Research, 2024, Vol. 10, Issue (2): 80-93. doi: 10.11959/j.issn.2096-0271.2023067


A text classification method based on multimodal fusion enhancement

Dezhi LIU (1, 2), Liu HE (3), Youfeng LIU (1, 2), Dechun HAN (2)

  1 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
  3 AVIC China Aero-Polytechnology Establishment, Beijing 100028, China
  Online: 2024-03-01; Published: 2024-03-01

Abstract:

Although multimodal text classification techniques show promise in specific application scenarios, they still have limitations. Existing multimodal fusion models require the input modalities to be aligned, so a large amount of incomplete multimodal data is simply discarded, which limits the scale and flexibility of the data available for inference. To address this problem, we propose a text classification model based on multimodal fusion enhancement, together with an insufficient multimodal resource training method. Compared with traditional methods, our model improves performance by an average of 4.25% on a standard dataset. Furthermore, when the missing rate of the non-text modalities is 50%, the insufficient multimodal resource training method improves performance by about 4% over traditional multi-route strategies. These experimental results demonstrate the effectiveness of the proposed model and training method.

Key words: text classification, cross attention, multimodal fusion, insufficient multimodal resource training method
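The abstract does not give the architecture in detail, so the following is only a minimal, hypothetical NumPy sketch of the general idea behind cross-attention fusion that tolerates a missing modality. It makes several assumptions that are not from the paper. Text tokens act as queries over a second modality's features, such as image regions. The attended result is added back to the text representation through a residual connection. When the non-text modality is absent (`None`), the sample falls back to a text-only path instead of being discarded. All function names, shapes, and the mean-pooling classifier head are illustrative, and the weights are random and untrained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, other, w_q, w_k, w_v):
    # Hypothetical cross-attention: text tokens (queries) attend over
    # features of another modality (keys/values).
    q = text @ w_q                                  # (n_text, d)
    k = other @ w_k                                 # (n_other, d)
    v = other @ w_v                                 # (n_other, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (n_text, n_other)
    return softmax(scores, axis=-1) @ v             # (n_text, d)

def fuse(text, other, params):
    # Residual fusion; if the non-text modality is missing, keep the text
    # representation unchanged so the sample can still be used.
    if other is None:
        return text
    return text + cross_attention(text, other, *params)

def classify(text, other, params, w_out):
    # Mean-pool the fused token representations and map them to class probabilities.
    fused = fuse(text, other, params)
    pooled = fused.mean(axis=0)                     # (d,)
    return softmax(pooled @ w_out)                  # (n_classes,)

rng = np.random.default_rng(0)
d, d_other, n_classes = 8, 6, 3
params = (rng.normal(size=(d, d)),                  # w_q
          rng.normal(size=(d_other, d)),            # w_k
          rng.normal(size=(d_other, d)))            # w_v
w_out = rng.normal(size=(d, n_classes))
text = rng.normal(size=(5, d))                      # 5 text tokens
image = rng.normal(size=(4, d_other))               # 4 image-region features

p_full = classify(text, image, params, w_out)       # both modalities present
p_text = classify(text, None, params, w_out)        # non-text modality missing
```

Under this sketch's assumptions, one plausible way to train on incomplete data is to randomly replace the non-text input with `None` during training, for example at a 50% rate, so that both the fused path and the text-only path are exercised. This is an illustrative choice, not the authors' insufficient multimodal resource training method.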

