Big Data Research ›› 2021, Vol. 7 ›› Issue (3): 3-14. doi: 10.11959/j.issn.2096-0271.2021022

• Special topic: Knowledge graphs based on big data and their applications •

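An entity relation extraction method based on subject mask

Shenpeng ZHENG1, Xiaojun CHEN1, Yang XIANG1, Ruchao SHEN2

  1. 1 College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
    2 Shanghai International Port (Group) Co., Ltd., Shanghai 200080, China
  • Online: 2021-05-15 Published: 2021-05-01
  • About the authors: Shenpeng ZHENG (1995- ), male, is a master's student at the College of Electronic and Information Engineering, Tongji University. His main research interest is natural language processing.
    Xiaojun CHEN (1995- ), male, is a doctoral student at the College of Electronic and Information Engineering, Tongji University. His main research interest is natural language processing.
    Yang XIANG (1962- ), male, is a professor at the College of Electronic and Information Engineering, Tongji University. His main research interests are data mining, natural language processing, and intelligent decision support systems.
    Ruchao SHEN (1989- ), male, is an engineer at Shanghai International Port (Group) Co., Ltd. His main research interest is port science and technology management.
  • Supported by:
    The National Natural Science Foundation of China (72071145); The National Key Research and Development Program of China (2019YFB1704402)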

An entity relation extraction method based on subject mask

Shenpeng ZHENG1, Xiaojun CHEN1, Yang XIANG1, Ruchao SHEN2   

  1. 1 College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
    2 Shanghai International Port (Group) Co., Ltd., Shanghai 200080, China
  • Online: 2021-05-15 Published: 2021-05-01
  • Supported by:
    The National Natural Science Foundation of China (72071145); The National Key Research and Development Program of China (2019YFB1704402)

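Abstract:

Entity relation extraction can automatically extract information from massive amounts of unstructured text, build large-scale knowledge graphs, enrich the content of existing knowledge graphs, and support knowledge graph reasoning and applications. Cascaded entity relation extraction has achieved good results, but it falls short in the vector representation of the subject and in pointer network decoding. For the subject representation problem, a method is proposed that generates the subject vector with an attention mechanism and a mask mechanism. For the problem that the pointer network decodes overly long entities when a label is missed, an entity sequence labeling task is introduced to assist the pointer network in decoding entities. Experiments on the large-scale entity relation dataset DuIE2.0 show that the proposed method achieves a 0.88% improvement over the previous model.

Keywords: RoBERTa, entity relation extraction, subject mask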

Abstract:

Entity relation extraction can automatically extract information from massive unstructured text to construct large-scale knowledge graphs, enrich the content of existing knowledge graphs, and support knowledge graph reasoning and applications. Although cascaded entity relation extraction has achieved good results, it has shortcomings in the vector representation of the subject and in pointer network decoding. To address the subject representation problem, an attention mechanism and a mask mechanism are used to generate the subject vector. In addition, to address the problem that the pointer network decodes overly long entities when a label is missed, an entity sequence labeling task is introduced to assist the pointer network in decoding entities. Experiments on the large-scale entity relation dataset DuIE 2.0 show a 0.88% improvement over the previous model.

Key words: RoBERTa, entity relation extraction, subject mask
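
The abstract does not give the exact formulation, so the following is only a minimal sketch of one way an attention mechanism and a mask mechanism could be combined to produce a subject vector. The function name `subject_vector`, the learned `query` vector, and the dot-product scoring are all assumptions made for illustration and are not taken from the paper: tokens outside the subject span are masked to negative infinity before the softmax, so only subject tokens contribute to the pooled vector.

```python
import math


def subject_vector(token_vectors, subject_mask, query):
    """Pool the token vectors of the subject span into a single vector.

    token_vectors: list of token embeddings, each a list of floats
    subject_mask:  list of 0/1 flags, 1 for tokens that belong to the subject
    query:         hypothetical learned query vector used to score tokens
    """
    if not any(subject_mask):
        raise ValueError("subject_mask must select at least one token")

    # Dot-product attention score per token; masked-out tokens get -inf.
    scores = [
        sum(q * x for q, x in zip(query, vec)) if keep else -math.inf
        for vec, keep in zip(token_vectors, subject_mask)
    ]

    # Numerically stable softmax; exp(-inf) evaluates to 0.0, so masked
    # tokens receive exactly zero attention weight.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of token vectors, one dimension at a time.
    dim = len(token_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]


# Toy input: three tokens, the first two form the subject.
tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
pooled = subject_vector(tokens, subject_mask=[1, 1, 0], query=[0.0, 0.0])
```

With a zero query, both subject tokens score equally, so `pooled` is their plain average and the third token is ignored regardless of its values.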
