Chinese Journal of Network and Information Security (网络与信息安全学报), 2020, 6(4): 1-13. doi: 10.11959/j.issn.2096-109x.2020010
• Survey •
About the authors:
DU Sijia (1995- ), female, born in Jiaxing, Zhejiang, is a master's student at Harbin Institute of Technology. Her main research interests are online public opinion analysis and network security.
YU Haining (1983- ), male, born in Hegang, Heilongjiang, Ph.D., is an assistant researcher at Harbin Institute of Technology. His main research interests are secure search and privacy protection for the Internet of Things, and cloud security and privacy protection.
ZHANG Hongli (1973- ), female, born in Yushu, Jilin, Ph.D., is a professor and doctoral supervisor at Harbin Institute of Technology. Her main research interests are network and information security, network measurement and modeling, network computing, and parallel processing.
Sijia DU, Haining YU, Hongli ZHANG
Revised: 2020-01-25
Online: 2020-08-15
Published: 2020-08-13
Abstract:
Text classification is a research hotspot in natural language processing, with applications in areas such as public opinion monitoring and news text classification. In recent years, artificial neural networks have performed well on many natural language processing tasks, and applying them to text classification has produced many results. Within deep-learning-based text classification, two important research directions are the numerical representation of text and the deep learning models used for classification. The key word-embedding techniques for text representation and the deep learning methods applied to text classification were systematically analyzed and summarized with respect to their underlying principles and current research status, and the shortcomings and development trends of text classification methods were discussed in light of current technical progress.
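To make the two directions summarized in the abstract concrete, the sketch below pairs a learned word-embedding layer (the numerical representation of text) with a convolutional classifier in the spirit of the model of reference [27]. It is a minimal illustration only, written against PyTorch as an assumed dependency; the vocabulary size, embedding dimension, filter widths, class count, and the toy batch at the end are hypothetical placeholders rather than settings taken from any of the surveyed papers.

# Minimal word-embedding + CNN text classifier, in the spirit of reference [27].
# Assumes PyTorch; all sizes and the toy input below are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128,
                 num_filters=100, kernel_sizes=(3, 4, 5), num_classes=4):
        super().__init__()
        # Text representation: map each token id to a dense, trainable vector.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per window size, sliding over the token sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                    # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each window size.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                 # unnormalized class scores

if __name__ == "__main__":
    model = TextCNN()
    toy_batch = torch.randint(1, 10000, (8, 50))  # 8 toy "sentences" of 50 token ids
    print(model(toy_batch).shape)                 # torch.Size([8, 4])

The max-over-time pooling step is what lets such a classifier respond to a discriminative n-gram wherever it appears in the sentence, one reason convolutional architectures remain strong baselines for the news and public-opinion texts mentioned above.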
Sijia DU, Haining YU, Hongli ZHANG. Survey of text classification methods based on deep learning[J]. Chinese Journal of Network and Information Security, 2020, 6(4): 1-13.
[1] YU Y, FU Y, WU X P. Summary of text classification methods[J]. Chinese Journal of Network and Information Security, 2019, 5(5): 1-8.
[2] MING T S Y, CHEN H C. Research progress and trend of text summarization[J]. Chinese Journal of Network and Information Security, 2018, 4(6): 1-10.
[3] HINTON G E. Learning distributed representations of concepts[C]// Proceedings of the Eighth Annual Conference of the Cognitive Science Society. 1986: 12.
[4] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3(2): 1137-1155.
[5] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12(8): 2493-2537.
[6] MNIH A, HINTON G. Three new graphical models for statistical language modelling[C]// Proceedings of the 24th International Conference on Machine Learning. 2007: 641-648.
[7] MNIH A, HINTON G E. A scalable hierarchical distributed language model[C]// Advances in Neural Information Processing Systems. 2009: 1081-1088.
[8] MIKOLOV T. Statistical language models based on neural networks[D]. Brno: Brno University of Technology, 2012.
[9] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[C]// International Conference on Learning Representations. 2013.
[10] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]// Advances in Neural Information Processing Systems. 2013: 3111-3119.
[11] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation[C]// Advances in Neural Information Processing Systems. 2013: 2265-2273.
[12] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
[13] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[14] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[J]. arXiv preprint arXiv:1802.05365, 2018.
[15] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. 2018.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems. 2017: 5998-6008.
[17] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[18] JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[J]. arXiv preprint arXiv:1607.01759, 2016.
[19] CAO S, LU W, ZHOU J, et al. cw2vec: learning Chinese word embeddings with stroke n-gram information[C]// Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[20] LI Y, LI W, SUN F, et al. Component-enhanced Chinese character embeddings[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 829-834.
[21] SUN Y, LIN L, YANG N, et al. Radical-enhanced Chinese character embedding[C]// International Conference on Neural Information Processing. Cham: Springer, 2014: 279-286.
[22] CHEN X, XU L, LIU Z, et al. Joint learning of character and word embeddings[C]// Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[23] YU J, JIAN X, XIN H, et al. Joint embeddings of Chinese words, characters, and fine-grained subcharacter components[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 286-291.
[24] REI M, SØGAARD A. Jointly learning to label sentences and tokens[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 6916-6923.
[25] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[26] HU B, LU Z, LI H, et al. Convolutional neural network architectures for matching natural language sentences[C]// Advances in Neural Information Processing Systems. 2014: 2042-2050.
[27] KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1746-1751.
[28] ZHANG Y, WALLACE B. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1510.03820, 2015.
[29] WANG P, XU J, XU B, et al. Semantic clustering and convolutional neural network for short text categorization[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015: 352-357.
[30] KALCHBRENNER N, GREFENSTETTE E, BLUNSOM P. A convolutional neural network for modelling sentences[J]. arXiv preprint arXiv:1404.2188, 2014.
[31] CONNEAU A, SCHWENK H, BARRAULT L, et al. Very deep convolutional networks for text classification[J]. arXiv preprint arXiv:1606.01781, 2016.
[32] JOHNSON R, ZHANG T. Deep pyramid convolutional neural networks for text categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 562-570.
[33] ZHANG X, ZHAO J, LECUN Y. Character-level convolutional networks for text classification[C]// Advances in Neural Information Processing Systems. 2015: 649-657.
[34] JOHNSON R, ZHANG T. Effective use of word order for text categorization with convolutional neural networks[J]. arXiv preprint arXiv:1412.1058, 2014.
[35] JOHNSON R, ZHANG T. Semi-supervised convolutional neural networks for text categorization via region embedding[C]// Advances in Neural Information Processing Systems. 2015: 919-927.
[36] JINDAL I, PRESSEL D, LESTER B, et al. An effective label noise model for DNN text classification[J]. arXiv preprint arXiv:1903.07507, 2019.
[37] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
[38] HENAFF M, BRUNA J, LECUN Y. Deep convolutional networks on graph-structured data[J]. arXiv preprint arXiv:1506.05163, 2015.
[39] DEFFERRARD M, BRESSON X, VANDERGHEYNST P. Convolutional neural networks on graphs with fast localized spectral filtering[C]// Advances in Neural Information Processing Systems. 2016: 3844-3852.
[40] PENG H, LI J, HE Y, et al. Large-scale hierarchical text classification with recursively regularized deep graph-CNN[C]// Proceedings of the 2018 World Wide Web Conference. 2018: 1063-1072.
[41] YAO L, MAO C, LUO Y. Graph convolutional networks for text classification[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 7370-7377.
[42] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.
[43] BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[44] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[45] HOCHREITER S, BENGIO Y, FRASCONI P, et al. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies[M]// A Field Guide to Dynamical Recurrent Networks. 2001: 237-243.
[46] GERS F A, SCHMIDHUBER J, CUMMINS F. Learning to forget: continual prediction with LSTM[C]// International Conference on Artificial Neural Networks. 2002.
[47] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
[48] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]// Advances in Neural Information Processing Systems. 2014.
[49] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.
[50] LUONG M T, PHAM H, MANNING C D. Effective approaches to attention-based neural machine translation[J]. arXiv preprint arXiv:1508.04025, 2015.
[51] YANG Z, YANG D, DYER C, et al. Hierarchical attention networks for document classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 1480-1489.
[52] POLLACK J B. Recursive distributed representations[J]. Artificial Intelligence, 1990, 46(1): 77-105.
[53] SOCHER R, PERELYGIN A, WU J Y, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2013.
[54] TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015: 1556-1566.
[55] LAI S, XU L, LIU K, et al. Recurrent convolutional neural networks for text classification[C]// Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[56] ZHOU C, SUN C, LIU Z, et al. A C-LSTM neural network for text classification[J]. arXiv preprint arXiv:1511.08630, 2015.
[57] TANG D, QIN B, LIU T. Document modeling with gated recurrent neural network for sentiment classification[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1422-1432.
[58] XU C, HUANG W, WANG H, et al. Modeling local dependence in natural language with multi-channel recurrent neural networks[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 5525-5532.
[59] IYYER M, MANJUNATHA V, BOYD-GRABER J, et al. Deep unordered composition rivals syntactic methods for text classification[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015, 1: 1681-1691.
[60] ADHIKARI A, RAM A, TANG R, et al. Rethinking complex neural network architectures for document classification[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4046-4051.