A text classification method based on multimodal fusion enhancement

doi:10.11959/j.issn.2096-0271.2023067

Abstract

Although multimodal text classification shows promise in specific scenarios, it still has limitations. Existing multimodal fusion models require the modalities of the input data to be aligned, so a large amount of incomplete multimodal data is simply discarded, which limits the scale and flexibility of the data available for inference. To address this problem, we propose a text classification model based on multimodal fusion enhancement, together with an insufficient multimodal resource training method. Compared with traditional methods, our model improves performance by an average of 4.25% on a standard dataset. Furthermore, when 50% of the non-text modality inputs are missing, the insufficient multimodal resource training method improves performance by about 4% over traditional multi-route strategies. The experimental results demonstrate the effectiveness of the proposed model and training method.

Key words: text classification, cross attention, multimodal fusion, insufficient multimodal resource training method

CLC Number: TP183

Dezhi LIU, Liu HE, Youfeng LIU, Dechun HAN. A text classification method based on multimodal fusion enhancement[J]. Big Data Research, 2024, 10(2): 80-93.
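
The keywords name cross attention as the fusion mechanism, but this page does not give the architecture. As a rough illustration only, the sketch below shows generic scaled dot-product cross attention in which text tokens act as queries over the features of a second modality, followed by concatenation. The dimensions, the random projection matrices, and the concatenation step are assumptions made for the example, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, other, w_q, w_k, w_v):
    """Text tokens (queries) attend to another modality's features (keys/values)."""
    q = text @ w_q                      # (n_text, d)
    k = other @ w_k                     # (n_other, d)
    v = other @ w_v                     # (n_other, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # one distribution over `other` per text token
    return weights @ v, weights

# Hypothetical sizes: 5 text tokens, 3 image regions.
rng = np.random.default_rng(0)
d_text, d_img, d = 8, 6, 4
text = rng.normal(size=(5, d_text))
image = rng.normal(size=(3, d_img))
w_q = rng.normal(size=(d_text, d))
w_k = rng.normal(size=(d_img, d))
w_v = rng.normal(size=(d_img, d))

attended, weights = cross_attention(text, image, w_q, w_k, w_v)
# Assumed fusion step: concatenate each text token with what it attended to.
fused = np.concatenate([text, attended], axis=-1)
print(fused.shape)
```

Because the text sequence supplies the queries, the output keeps one vector per text token, which is one reason this form of fusion suits a task whose primary input is text.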

References

[1]	ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[EB]. arXiv preprint, 2016, arXiv:1606.06259.
[2]	PORIA S, CAMBRIA E, HAZARIKA D, et al. Multi-level multiple attentions for contextual multimodal sentiment analysis[C]// Proceedings of 2017 IEEE International Conference on Data Mining (ICDM). Piscataway: IEEE Press, 2017: 1033-1038.
[3]	GUO W, WANG J, WANG S. Deep multimodal representation learning: a survey[J]. IEEE Access, 2019, 7: 63373-63394.
[4]	CAMBRIA E, HAZARIKA D, PORIA S, et al. Benchmarking multimodal sentiment analysis[M]. Computational linguistics and intelligent text processing. Cham: Springer, 2018: 166-179.
[5]	ZADEH A, CHEN M H, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2017: 1103-1114.
[6]	ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multiview sequential learning[C]// Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.: s.n.], 2018.
[7]	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB]. arXiv preprint, 2018, arXiv:1810.04805.
[8]	SUN Y, WANG S, LI Y, et al. ERNIE: enhanced representation through knowledge integration[EB]. arXiv preprint, 2019, arXiv:1904.09223.
[9]	CUI Y M, CHE W X, LIU T, et al. Revisiting pre-trained models for Chinese natural language processing[C]// Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: Association for Computational Linguistics, 2020: 657-668.
[10]	LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB]. arXiv preprint, 2019, arXiv:1907.11692.
[11]	SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2016: 1715-1725.
[12]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[13]	PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[EB]. arXiv preprint, 2018, arXiv:1802.05365.
[14]	RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[Z]. OpenAI, 2018.
[15]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[16]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: ACM, 2015: 448-456.
[17]	HENDRYCKS D, GIMPEL K. Gaussian error linear units (GELUs)[EB]. arXiv preprint, 2016, arXiv:1606.08415.
[18]	QI P, CAO J, YANG T Y, et al. Exploiting multi-domain visual information for fake news detection[C]// Proceedings of 2019 IEEE International Conference on Data Mining. Piscataway: IEEE Press, 2020: 518-527.
[19]	JIN Z W, CAO J, GUO H, et al. Multimodal fusion with recurrent neural networks for rumor detection on microblogs[C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 795-816.
[20]	BOIDIDOU C, PAPADOPOULOS S, KOMPATSIARIS Y, et al. Challenges of computational verification in social multimedia[C]// Proceedings of the 23rd International Conference on World Wide Web. New York: ACM, 2014: 743-748.
[21]	ANTOL S, AGRAWAL A, LU J, et al. VQA: visual question answering[C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 2425-2433.
[22]	VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3156-3164.
[23]	WANG Y Q, MA F L, JIN Z W, et al. EANN: event adversarial neural networks for multi-modal fake news detection[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2018: 849-857.
[24]	JINSHUO L, KUO F, PAN J Z, et al. MSRD: multi-modal web rumor detection method[J]. Journal of Computer Research and Development, 2020, 57(11): 2328-2336.
[25]	JIANA M, XIAOPEI W, TING L, et al. Cross-modal rumor detection based on adversarial neural network[J]. Data Analysis and Knowledge Discovery, 2023, 6(12): 32-42.
[26]	MIYATO T, DAI A M, GOODFELLOW I, et al. Adversarial training methods for semi-supervised text classification[EB]. arXiv preprint, 2016, arXiv:1605.07725.
[27]	MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[EB]. arXiv preprint, 2017, arXiv:1706.06083.

Dataset	Training samples	Test samples	Average length
Weibo	7 481	1 917	110.8
Twitter	9 307	1 387	79.3

Method	Accuracy	Real content			Rumor content
Method	Accuracy	Precision	Recall	Macro-F1	Precision	Recall	Macro-F1
Textual	0.592	0.605	0.531	0.566	0.581	0.653	0.615
Visual	0.608	0.61	0.605	0.607	0.607	0.611	0.609
Social Content	0.65	0.672	0.591	0.629	0.634	0.71	0.67
Early Fusion	0.603	0.612	0.567	0.589	0.595	0.639	0.616
Late Fusion	0.669	0.693	0.611	0.649	0.651	0.728	0.687
VQA^[21]	0.736	0.797	0.634	0.706	0.695	0.838	0.76
NeuralTalk^[22]	0.726	0.794	0.613	0.692	0.684	0.84	0.754
att-RNN^[19]	0.772	0.854	0.656	0.742	0.72	0.889	0.795
EANN^[23]	0.782	0.827	0.697	0.756	0.752	0.863	0.804
MSRD^[24]	0.794	0.854	0.716	0.779	—	—	—
DCNN^[25]	0.803	0.799	0.801	0.809	—	—	—
MBN-ot	0.803	0.894	0.666	0.763	0.753	0.928	0.832
MBN-to	0.823	0.887	0.721	0.795	0.783	0.916	0.844
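
The per-class F1 values in the table above appear to be the harmonic mean of that class's precision and recall. The short check below recomputes F1 for the MBN-ot and MBN-to rows from the reported precision and recall; the helper name `f1` is introduced here for illustration, and small differences are expected because the inputs are already rounded to three decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table rows above.
rows = {
    "MBN-ot, real content": (0.894, 0.666, 0.763),
    "MBN-ot, rumor content": (0.753, 0.928, 0.832),
    "MBN-to, real content": (0.887, 0.721, 0.795),
    "MBN-to, rumor content": (0.783, 0.916, 0.844),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1(p, r):.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported ones to within about 0.001, which is consistent with rounding of the inputs.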

Method	Accuracy	Real content			Rumor content
Method	Accuracy	Precision	Recall	Macro-F1	Precision	Recall	Macro-F1
Textual	0.532	0.598	0.541	0.568	0.462	0.52	0.489
Visual	0.596	0.695	0.518	0.593	0.524	0.7	0.599
Social Content	0.509	0.566	0.589	0.577	0.426	0.403	0.414
Early Fusion	0.619	0.727	0.528	0.612	0.542	0.738	0.625
Late Fusion	0.594	0.661	0.589	0.623	0.526	0.602	0.561
VQA^[21]	0.631	0.765	0.509	0.611	0.55	0.794	0.65
NeuralTalk^[22]	0.61	0.728	0.504	0.595	0.534	0.752	0.625
att-RNN^[19]	0.664	0.749	0.615	0.676	0.589	0.728	0.651
EANN^[23]	0.648	0.810	0.498	0.617	0.584	0.759	0.66
MSRD^[24]	0.685	0.725	0.636	0.678	—	—	—
DCNN^[25]	—	—	—	—	—	—	—
MBN-ot	0.720	0.830	0.794	0.812	0.427	0.486	0.455
MBN-to	0.750	0.837	0.825	0.831	0.507	0.528	0.517

Method (MBN-to+)	Weibo		Twitter
Method (MBN-to+)	Accuracy	Macro-F1	Accuracy	Macro-F1
Base (BERT)	0.823	0.818	0.750	0.674
ERNIE	0.830	0.826	0.782	0.680
BERT-wwm	0.827	0.824	—	—
RoBERTa-wwm	0.841	0.838	—	—

Method (MBN-to+Bestpretrain)	Weibo		Twitter
Method (MBN-to+Bestpretrain)	Accuracy	Macro-F1	Accuracy	Macro-F1
Base(no adv)	0.841	0.838	0.782	0.680
FGM	0.847	0.844	0.807	0.731
PGD	0.841	0.838	0.808	0.694
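
FGM and PGD in the table above are adversarial training methods (see references [26] and [27]). In text classification they are usually applied to the embedding vectors rather than to the discrete tokens. The sketch below shows only how the two perturbations are built, on a toy linear classifier whose gradient can be written by hand; the classifier, the step sizes `epsilon` and `alpha`, and the number of PGD steps are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(x, w, y):
    """Binary cross-entropy of a toy linear classifier and its gradient w.r.t. the input x."""
    p = sigmoid(w @ x)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w
    return loss, grad_x

def fgm(x, w, y, epsilon=0.5):
    """FGM: a single step of length epsilon along the L2-normalised gradient."""
    _, g = loss_and_grad(x, w, y)
    return x + epsilon * g / (np.linalg.norm(g) + 1e-12)

def pgd(x, w, y, epsilon=0.5, alpha=0.2, steps=3):
    """PGD: several smaller steps, each projected back into the L2 ball of radius epsilon."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, w, y)
        x_adv = x_adv + alpha * g / (np.linalg.norm(g) + 1e-12)
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > epsilon:
            x_adv = x + delta * (epsilon / norm)
    return x_adv

rng = np.random.default_rng(0)
x = rng.normal(size=4)  # stands in for a text embedding
w = rng.normal(size=4)  # toy classifier weights
y = 1.0

clean_loss, _ = loss_and_grad(x, w, y)
fgm_loss, _ = loss_and_grad(fgm(x, w, y), w, y)
pgd_loss, _ = loss_and_grad(pgd(x, w, y), w, y)
print(clean_loss < fgm_loss, clean_loss < pgd_loss)
```

During adversarial training, the loss on the perturbed input is typically added to the loss on the clean input before the weight update, so that the model learns to classify both correctly.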

Method (MBN+)	Weibo		Twitter
Method (MBN+)	Accuracy	Macro-F1	Accuracy	Macro-F1
S1 Multi-router	0.800	0.795	0.711	0.634
S2 padding	0.820	0.816	0.748	0.680
S2 flag	0.827	0.823	0.764	0.646
S2 flag_mask	0.829	0.827	0.766	0.676
S2 weight_fusion	0.824	0.821	0.742	0.648
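
This page does not define the strategy names in the table above. One plausible reading, offered purely as an assumption, is that `padding` substitutes a placeholder for a missing modality, `flag` adds an indicator of whether the modality was present, and `flag_mask` additionally suppresses the branch of a missing modality. The sketch below illustrates that reading with a toy image encoder; all function names, dimensions, and weights are hypothetical.

```python
import numpy as np

IMG_DIM, HID_DIM = 4, 3
rng = np.random.default_rng(0)
W_img = rng.normal(size=(IMG_DIM, HID_DIM))  # toy image encoder weights (hypothetical)
b_img = np.ones(HID_DIM)                     # non-zero bias: padded inputs still produce output

def collate(samples):
    """Build a batch from samples whose image feature may be None."""
    text = np.stack([s["text"] for s in samples])
    # Assumed "padding": a zero vector stands in for a missing image.
    image = np.stack([np.zeros(IMG_DIM) if s["image"] is None else s["image"]
                      for s in samples])
    # Assumed "flag": 1.0 if the image was present, 0.0 if it was padded.
    flag = np.array([0.0 if s["image"] is None else 1.0 for s in samples])
    return text, image, flag

def fuse(text, image, flag, use_mask):
    img_hidden = image @ W_img + b_img
    if use_mask:
        # Assumed "flag_mask": zero the image branch wherever the image was missing.
        img_hidden = img_hidden * flag[:, None]
    return np.concatenate([text, img_hidden, flag[:, None]], axis=-1)

samples = [
    {"text": rng.normal(size=5), "image": rng.normal(size=IMG_DIM)},
    {"text": rng.normal(size=5), "image": None},  # image modality missing
]
text, image, flag = collate(samples)
fused_flag = fuse(text, image, flag, use_mask=False)
fused_flag_mask = fuse(text, image, flag, use_mask=True)
print(fused_flag.shape, fused_flag_mask.shape)
```

Under this reading, a single model can train on complete and incomplete samples together, whereas a multi-route setup would need a separate path for each combination of available modalities.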

Dataset	Weibo		Twitter
Dataset	Training set	Test set	Training set	Test set
Complete	1880	491	2327	335
No Img	1829	468	2323	359
No Craft	1829	468	2323	359
Only txt	1880	490	2326	334