智能科学与技术学报 ›› 2021, Vol. 3 ›› Issue (4): 466-473.doi: 10.11959/j.issn.2096-6652.202146

• Academic Paper •

基于多级注意力融合机制的藏文实体关系抽取

王丽客1,2, 孙媛1,2, 刘思思1,2   

  1 中央民族大学信息工程学院,北京 100081
    2 中央民族大学国家语言资源监测与研究少数民族语言中心,北京 100081
  • 修回日期:2021-02-23 出版日期:2021-12-15 发布日期:2021-12-01
  • About the authors: WANG Like (1996- ), female, master's student at the School of Information Engineering, Minzu University of China; her research interests include natural language processing and knowledge graphs.
    SUN Yuan (1979- ), female, Ph.D., associate professor at the School of Information Engineering, Minzu University of China; her research interests include natural language processing, knowledge graphs, and question answering systems.
    LIU Sisi (1998- ), female, master's student at the School of Information Engineering, Minzu University of China; her research interests include natural language processing and knowledge graphs.
  • 基金资助:
    国家自然科学基金资助项目(61972436)

Tibetan entity relation extraction based on multi-level attention fusion mechanism

Like WANG1,2, Yuan SUN1,2, Sisi LIU1,2   

  1 School of Information Engineering, Minzu University of China, Beijing 100081, China
    2 National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing 100081, China
  • Revised: 2021-02-23 Online: 2021-12-15 Published: 2021-12-01
  • Supported by:
    The National Natural Science Foundation of China (61972436)

摘要:

与中英文相比,藏文实体关系训练语料规模较小,传统有监督的学习方法难以获得较高的准确率。针对基于远程监督的实体关系抽取存在错误标记的问题,利用远程监督方法将知识库与文本对齐,构建藏文实体关系抽取的数据集,提出一个基于多级注意力融合机制的藏文实体关系抽取模型。在词级别引入自注意力机制来提取单词的内部特征,在句子级别引入注意力机制为每个实例分配权重,从而充分利用包含信息的句子,减少噪声实例的权重。同时引入联合评分函数,修正远程监督的错误标签,并将神经网络与支持向量机结合,实现藏文实体关系分类。实验结果表明,提出的模型有效提高了藏文实体关系抽取的准确率,且优于基线模型效果。

关键词: 藏文, 实体关系抽取, 多级注意力融合机制, 支持向量机

Abstract:

Compared with Chinese and English, the training corpus for Tibetan entity relations is small, so traditional supervised learning methods struggle to achieve high accuracy. In addition, distant supervision for relation extraction suffers from wrongly labeled instances. To address these problems, a distant supervision method was used to construct a dataset for Tibetan entity relation extraction by aligning a knowledge base with texts, alleviating the lack of a large-scale Tibetan corpus, and a Tibetan entity relation extraction model based on a multi-level attention fusion mechanism was proposed. Self-attention was applied at the word level to extract the internal features of words, while a selective attention mechanism at the sentence level assigned a weight to each instance, making full use of informative sentences and reducing the weights of noisy instances. Meanwhile, a joint score function was introduced to correct wrong labels, and the neural network was combined with a support vector machine to classify relations. Experimental results show that the proposed model effectively improves the accuracy of Tibetan entity relation extraction and outperforms the baselines.

Key words: Tibetan, entity relation extraction, multi-level attention fusion mechanism, support vector machine
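The two attention levels described in the abstract can be sketched roughly as follows: word-level self-attention turns each sentence into a vector, and sentence-level selective attention weights the instances in a bag so that noisy (mislabeled) sentences contribute less. This is a minimal numpy illustration only; the embedding sizes, the random inputs, and both dot-product scoring functions are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_self_attention(words):
    """words: (n_words, d) embeddings -> (d,) sentence vector.
    Each word attends to all words in the sentence (scaled dot-product);
    the sentence vector is the mean of the attention-refined word vectors."""
    scores = words @ words.T / np.sqrt(words.shape[1])  # (n, n)
    refined = softmax(scores, axis=1) @ words           # (n, d)
    return refined.mean(axis=0)

def selective_attention(sentence_vecs, relation_query):
    """sentence_vecs: (n_sent, d); relation_query: (d,).
    Weights each instance in the bag by its match with the relation
    representation, down-weighting noisy instances."""
    alpha = softmax(sentence_vecs @ relation_query)     # (n_sent,)
    return alpha @ sentence_vecs                        # (d,)

rng = np.random.default_rng(0)
bag = [rng.normal(size=(5, 8)) for _ in range(3)]  # 3 sentences, 5 words each, d=8
sent_vecs = np.stack([word_self_attention(s) for s in bag])
rel = rng.normal(size=8)                            # hypothetical relation query vector
bag_vec = selective_attention(sent_vecs, rel)
print(bag_vec.shape)  # prints (8,)
```

In the paper's pipeline, a bag representation like `bag_vec` would then be scored by the joint score function and classified with a support vector machine; those stages are omitted here.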

