Journal on Communications ›› 2018, Vol. 39 ›› Issue (2): 53-64. doi: 10.11959/j.issn.1000-436x.2018024
Revised:
2017-12-08
Online:
2018-02-01
Published:
2018-03-28
About the authors:
Hengjun WANG (1973- ), male, from Hengyang, Hunan; associate professor and master's supervisor at PLA Information Engineering University; his research interests include machine learning, natural language processing, and information security.
Nianwen SI (1992- ), male, from Xiangyang, Hubei; master's candidate at PLA Information Engineering University; his research interests include machine learning and natural language processing.
Yulong SONG (1995- ), male, from Fuyang, Anhui; assistant engineer with Unit 73671; his research interests include network and information security.
Yidong SHAN (1988- ), male, from Rushan, Shandong; master's candidate at PLA Information Engineering University; his research interest is natural language processing.
Hengjun WANG1, Nianwen SI1, Yulong SONG2, Yidong SHAN1
Abstract:
A sequential long short-term memory (LSTM) network and a convolutional neural network (CNN) with piecewise pooling are used to extract word-vector features and a global vector feature, respectively; the two kinds of features are then combined and fed into a feed-forward network for training. During training, a probability-based method is adopted. Compared with the model before this improvement, the proposed model attends more to the global features of a sentence; compared with the max-margin training algorithm, the proposed training method makes fuller use of all possible dependency parse trees when updating parameters. To evaluate the model, experiments were conducted on the Chinese Penn Treebank 5 (CTB5). The results show that, compared with existing parsing models that use only an LSTM or a CNN, the proposed model effectively improves dependency parsing accuracy while maintaining reasonable efficiency.
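The feature pipeline the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the filter count, window width, hidden size, the stand-in for the LSTM's contextual word features, and the three-way piecewise split at the candidate head/dependent positions are all assumptions, and random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def piecewise_pool_cnn(word_vecs, head, dep, n_filters=4, width=3):
    """Hypothetical piecewise-pooled CNN: convolve over the sentence,
    split the feature map at the candidate head and dependent positions,
    max-pool each segment, and concatenate into one global vector."""
    n, d = word_vecs.shape
    # pad so every word position yields one convolution output
    padded = np.pad(word_vecs, ((width // 2, width // 2), (0, 0)))
    filters = rng.standard_normal((n_filters, width * d)) * 0.1
    fmap = np.tanh(np.stack([
        filters @ padded[i:i + width].reshape(-1) for i in range(n)
    ]))                                    # shape (n, n_filters)
    lo, hi = sorted((head, dep))
    segments = (fmap[:lo + 1], fmap[lo:hi + 1], fmap[hi:])
    return np.concatenate([seg.max(axis=0) for seg in segments])

def score_arc(word_vecs, head, dep, hidden=8):
    """Concatenate per-word features for the head/dependent pair with the
    CNN global vector, then score the arc with a one-hidden-layer
    feed-forward net (weights random here, trained in the paper)."""
    local = np.concatenate([word_vecs[head], word_vecs[dep]])
    feat = np.concatenate([local, piecewise_pool_cnn(word_vecs, head, dep)])
    w1 = rng.standard_normal((hidden, feat.size)) * 0.1
    w2 = rng.standard_normal(hidden) * 0.1
    return float(w2 @ np.tanh(w1 @ feat))

sent = rng.standard_normal((6, 5))         # 6 words, 5-dim embeddings
print(score_arc(sent, head=2, dep=4))      # scalar arc score
```

In the probability-based training the abstract mentions, such arc (or tree) scores would be normalized over all candidate dependency trees rather than used in a max-margin objective.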
Hengjun WANG,Nianwen SI,Yulong SONG,Yidong SHAN. Neural network model for dependency parsing incorporating global vector feature[J]. Journal on Communications, 2018, 39(2): 53-64.
Table 3
Values of pre-trained word-vector dimensions
Word | Dim 1 | Dim 2 | Dim 3 | Dim 4 | Dim 5 | Dim 6 | Dim 7 | Dim 8 | … |
北京 | 0.784 532 | 0.126 508 | −0.057 374 | −0.154 533 | 0.332 767 | −0.097 764 | 0.326 162 | −0.050 384 | … |
是 | 0.266 844 | 0.302 665 | −0.264 444 | −0.226 088 | 0.441 961 | 0.016 561 | 0.301 905 | 0.502 168 | … |
中国 | 0.219 274 | −0.014 053 | 0.131 701 | −0.286 423 | 0.335 406 | −0.358 501 | 0.673 590 | −0.042 960 | … |
的 | 0.217 868 | 0.127 214 | −0.149 943 | −0.321 670 | 0.492 081 | 0.059 336 | 0.125 756 | 0.118 209 | … |
首都 | −0.026 037 | −0.481 110 | 0.242 895 | 0.234 439 | 0.002 181 | 0.174 378 | 0.376 564 | 0.373 155 | … |
。 | 0.472 912 | 0.190 043 | −0.269 932 | −0.252 637 | 0.368 207 | −0.009 401 | 0.306 351 | 0.124 188 | … |
上海 | 0.730 383 | −0.120 522 | 0.057 258 | −0.139 031 | 0.066 892 | −0.064 185 | 0.207 783 | −0.374 539 | … |
浦东 | 0.484 598 | −0.354 264 | −0.023 650 | 0.244 754 | −0.167 892 | 0.134 869 | 0.174 820 | −0.593 116 | … |
开发 | 0.095 341 | −0.730 478 | −0.233 886 | −0.270 567 | 0.182 623 | 0.313 196 | −0.008 622 | −0.193 032 | … |
与 | −0.206 168 | −0.178 119 | 0.087 402 | −0.033 061 | 0.410 609 | −0.186 066 | 0.368 784 | 0.011 376 | … |
法制 | −0.176 999 | 0.472 356 | −0.179 480 | −0.312 853 | 0.651 259 | −0.345 816 | −0.471 174 | −0.225 371 | … |
建设 | 0.003 175 | −0.034 065 | −0.223 146 | 0.021 553 | 0.255 378 | 0.221 632 | −0.279 153 | 0.244 122 | … |
同步 | −0.144 631 | −0.027 353 | −0.243 820 | −0.205 473 | 0.200 052 | 0.040 610 | 0.059 072 | −0.277 645 | … |
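As a sanity check on these pre-trained vectors, even the first eight dimensions listed above already place 北京 (Beijing) closer to 上海 (Shanghai) than to the function word 的. A minimal cosine-similarity computation, with the caveat that the vectors are truncated to the eight shown dimensions, so the values are illustrative only:

```python
import numpy as np

# first eight dimensions of the pre-trained vectors from Table 3
vecs = {
    "北京": [0.784532, 0.126508, -0.057374, -0.154533,
             0.332767, -0.097764, 0.326162, -0.050384],
    "上海": [0.730383, -0.120522, 0.057258, -0.139031,
             0.066892, -0.064185, 0.207783, -0.374539],
    "的":   [0.217868, 0.127214, -0.149943, -0.321670,
             0.492081, 0.059336, 0.125756, 0.118209],
}

def cos(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(vecs["北京"], vecs["上海"]))   # two city names: higher
print(cos(vecs["北京"], vecs["的"]))     # city vs. particle: lower
```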