Journal on Communications ›› 2017, Vol. 38 ›› Issue (4): 86-98. doi: 10.11959/j.issn.1000-436x.2017088

• Academic paper

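New method of text representation model based on neural network

Shui-fei ZENG1, Xiao-yan ZHANG1, Xiao-feng DU2, Tian-bo LU1

  1. School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised: 2017-03-09 Online: 2017-04-01 Published: 2017-07-20
  • About the authors: Shui-fei ZENG (born 1978), male, from Guangchang, Jiangxi, is a PhD candidate at Beijing University of Posts and Telecommunications; his main research interests are intelligent information processing, machine learning, deep learning, and neural networks. Xiao-yan ZHANG (born 1973), female, from Yantai, Shandong, holds a PhD and is a professor at Beijing University of Posts and Telecommunications; her main research interests are software engineering theory, mobile Internet software, and ad hoc and wireless sensor networks. Xiao-feng DU (born 1973), male, from Hancheng, Shaanxi, is a lecturer at Beijing University of Posts and Telecommunications; his main research interests are cloud computing and big data analysis. Tian-bo LU (born 1977), male, from Bijie, Guizhou, holds a PhD and is an associate professor at Beijing University of Posts and Telecommunications; his main research interests are network and information security, secure software engineering, and P2P computing.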



Abstract:

An improved text representation model is proposed for extracting word-embedding features from text. First, word embeddings are built from a double word-embedding list that combines dictionary indices with the corresponding part-of-speech indices. Next, a Bi-LSTM recurrent neural network extracts further features from the resulting word embeddings. Finally, a mean-pooling layer processes the sentence vectors and a softmax layer performs text classification. Experiments evaluate the training behavior and feature extraction of the model that combines Bi-LSTM with double word-embedding. The results show that the model handles both the extraction of high-quality text feature vectors and the representation of sequences well, and that it clearly outperforms three other neural networks: LSTM, LSTM+context window, and Bi-LSTM.

Key words: neural network, word-embedding, Bi-LSTM, text representation
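
The pipeline summarized in the abstract (double word-embedding, then Bi-LSTM, then mean-pooling, then softmax) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation. Several details are assumptions that the abstract does not state: the word and part-of-speech embeddings are combined by concatenation, the LSTM uses the standard input/forget/output/candidate gate formulation, all sizes (`VOCAB`, `N_POS`, `D_WORD`, `D_POS`, `HIDDEN`, `N_CLASSES`) are hypothetical, and the weights are random and untrained.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_pass(inputs, weights, bias, hidden):
    """Run one LSTM direction; return the hidden state at every time step."""
    h, c, states = [0.0] * hidden, [0.0] * hidden, []
    for x in inputs:
        # One affine map over [x; h] produces all four gate pre-activations.
        z = [a + b for a, b in zip(matvec(weights, x + h), bias)]
        i = [sigmoid(v) for v in z[:hidden]]                 # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]       # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]   # output gate
        g = [math.tanh(v) for v in z[3 * hidden:]]           # candidate cell
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes, chosen only to keep the example small.
VOCAB, N_POS, D_WORD, D_POS, HIDDEN, N_CLASSES = 50, 10, 8, 4, 6, 3
D_IN = D_WORD + D_POS

word_table = rand_matrix(VOCAB, D_WORD)   # indexed by dictionary index
pos_table = rand_matrix(N_POS, D_POS)     # indexed by part-of-speech index
fwd_w, fwd_b = rand_matrix(4 * HIDDEN, D_IN + HIDDEN), [0.0] * (4 * HIDDEN)
bwd_w, bwd_b = rand_matrix(4 * HIDDEN, D_IN + HIDDEN), [0.0] * (4 * HIDDEN)
out_w, out_b = rand_matrix(N_CLASSES, 2 * HIDDEN), [0.0] * N_CLASSES

def classify(word_ids, pos_ids):
    """Return class probabilities for one sentence."""
    # Double word-embedding: concatenate word and POS vectors (assumption).
    inputs = [word_table[w] + pos_table[p] for w, p in zip(word_ids, pos_ids)]
    # Bi-LSTM: one pass left-to-right, one right-to-left, re-aligned in time.
    fwd = lstm_pass(inputs, fwd_w, fwd_b, HIDDEN)
    bwd = lstm_pass(inputs[::-1], bwd_w, bwd_b, HIDDEN)[::-1]
    steps = [f + b for f, b in zip(fwd, bwd)]
    # Mean-pooling over time steps gives a fixed-size sentence vector.
    sentence = [sum(col) / len(steps) for col in zip(*steps)]
    # Linear layer followed by softmax yields the class distribution.
    scores = [s + b for s, b in zip(matvec(out_w, sentence), out_b)]
    return softmax(scores)

probs = classify([3, 17, 42, 5], [1, 4, 2, 1])
```

Calling `classify` with a list of dictionary indices and a parallel list of part-of-speech indices returns one probability per class. Because the weights are untrained, the probabilities carry no meaning; the sketch only shows how data flows through the four stages.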
