Chinese Journal of Network and Information Security, 2018, Vol. 4, Issue (6): 1-10. doi: 10.11959/j.issn.2096-109x.2018048
• Comprehensive Review •
Tuosiyu MING,Hongchang CHEN
Revised: 2018-06-01
Online: 2018-06-15
Published: 2018-08-08
Tuosiyu MING, Hongchang CHEN. Research progress and trend of text summarization[J]. Chinese Journal of Network and Information Security, 2018, 4(6): 1-10.
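Method | Advantages | Disadvantages |
Statistics-based methods | Rely on surface regularities of the text; simple and intuitive; avoid complex syntactic and grammatical analysis; easy to implement and widely used; need no training data; fast to execute | Use only surface-level word features; do not fully exploit word-sense relations or semantic features; considerably limited |
Methods based on external semantic resources | Improve on statistical methods by exploiting inter-word and word-sense relations, which raises the semantic quality of the summary to some extent | Strongly limited by the vocabulary the resource covers; depend heavily on the article title; word segmentation has a large effect on the keywords; the choice of similarity threshold affects lexical chain construction; the output lacks grammatical and semantic coherence |
Graph-ranking methods | Suited to loosely structured texts that cover many topics; sentence weights are computed while taking full account of the global relations among words, phrases, or sentences; unsupervised and language-independent; no need to process large corpora | Usually consider only the similarity between sentence nodes and ignore the document's discourse structure and sentence context; the quality of the similarity computation determines whether keywords and sentences are ranked correctly; make insufficient use of the data; do not account for information redundancy |
Statistical machine learning methods | Offer a wide choice of features and classifiers; can incorporate open-ended features to improve classification accuracy | Require manually annotated datasets; performance depends strictly on the quality of the training data; supervised or semi-supervised, so slower to execute than unsupervised methods |
Deep learning methods | Reduce reliance on manual effort and can be trained efficiently; can be combined with a variety of neural network architectures and Sequence-to-Sequence models; generated summaries are highly readable and accurate | Poor interpretability; require large manually annotated datasets; the complex neural network structures make execution slow and time-consuming; place demands on hardware performance |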
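
To make the contrast between the first and third rows of the table concrete, the following is a minimal Python sketch, not the method of any particular paper surveyed here. It assumes English input, a naive regex sentence splitter, mean word frequency as the statistical score, and bag-of-words cosine similarity as the edge weight for a PageRank-style iteration. All function names (`split_sentences`, `frequency_scores`, `graph_scores`, `top_k`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption; real systems use a proper tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; no stop-word removal or stemming."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def frequency_scores(sentences):
    """Statistics-based: score a sentence by the mean document frequency of its words."""
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for t in tokens for w in t)
    return [sum(counts[w] for w in t) / len(t) if t else 0.0 for t in tokens]


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def graph_scores(sentences, damping=0.85, iterations=50):
    """Graph ranking: PageRank-style iteration over a sentence similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    # Edge weight = similarity between two different sentences.
    sim = [[cosine(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]  # total outgoing weight of each node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def top_k(sentences, scores, k=1):
    """Return the k best-scoring sentences, restored to document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Neither scorer needs training data, which matches the advantages listed for both rows. The graph scorer also shows the stated weakness: it sees only pairwise sentence similarity, so a sentence that shares no words with the rest of the document receives just the baseline score, whatever its position or context.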