Telecommunications Science ›› 2023, Vol. 39 ›› Issue (2): 132-144. doi: 10.11959/j.issn.1000-0801.2023021

• Research and Development •

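A triple joint extraction method combining hybrid embedding and relational label embedding

Jianfeng DAI, Xingyu CHEN, Ligang DONG, Xian JIANG

  1. Zhejiang Gongshang University, Hangzhou 310018, China
  • Revised: 2023-01-20 Published: 2023-02-20 Released: 2023-02-01
  • About the authors: Jianfeng DAI (1997- ), male, master's student at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. His main research interests are smart education and natural language processing
    Xingyu CHEN (1999- ), female, master's student at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. Her main research interests are smart education and natural language processing
    Ligang DONG (1973- ), male, Ph.D., Party Secretary, professor and doctoral supervisor at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University, and council member of the Zhejiang Computer Society. His main research interests are new-generation networks and distributed systems
    Xian JIANG (1988- ), male, lecturer and laboratory technician at the School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University. His main research interests are smart education and smart networks
  • Supported by:
    The National Social Science Foundation of China (17BYY090); Zhejiang Province Key Research and Development Program (2017C03058); Zhejiang Province “Top Soldiers” and “Leading Geese” Project (2023C03202)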

Abstract:

Triple extraction aims to obtain entities and the relations between them from unstructured text for use in downstream tasks. The embedding mechanism strongly affects the performance of a triple extraction model, and the embedding vectors should carry rich semantic information closely related to the relation extraction task. In Chinese datasets, characters and words carry very different information. To mitigate the loss of semantic information caused by word segmentation errors, a triple joint extraction method combining hybrid embedding and relational label embedding (HEPA) was designed. A hybrid embedding that combines character embedding and word embedding was proposed to reduce the errors introduced by incorrect word segmentation. A relational label embedding mechanism was added to the entity extraction layer to fuse the text with the relation labels, and an attention mechanism was used to distinguish how relevant the entities in a sentence are to different relation labels, thereby improving matching accuracy. Pointer annotation was used to match entities, which improved the extraction of triples with overlapping relations. Comparative experiments were conducted on the public DuIE dataset, and HEPA improved the F1 score by 2.8% over the best-performing baseline model (CasRel).

Key words: triple extraction, relational embedding, BERT, attention mechanism, pointer annotation
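
The abstract states that pointer annotation improves extraction of triples with overlapping relations, but it does not give the decoding rule. The sketch below is a minimal illustration and not the authors' implementation. It assumes binary start/end pointer tags and the nearest-end pairing rule used by cascade taggers such as the CasRel baseline. The function names (`decode_spans`, `decode_triples`), the example sentence, and the relation names `singer` and `composer` are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]                  # inclusive (start, end) token indices
Pointers = Tuple[List[int], List[int]]  # binary start tags, binary end tags


def decode_spans(pointers: Pointers) -> List[Span]:
    """Pair each start pointer with the nearest end pointer at or after it."""
    starts, ends = pointers
    spans: List[Span] = []
    for i, is_start in enumerate(starts):
        if not is_start:
            continue
        for j in range(i, len(ends)):
            if ends[j]:
                spans.append((i, j))
                break
    return spans


def decode_triples(
    tokens: List[str],
    subject_pointers: Pointers,
    object_pointers: Dict[Span, Dict[str, Pointers]],
) -> List[Tuple[str, str, str]]:
    """Decode (subject, relation, object) triples from pointer tags.

    object_pointers maps each subject span to one pair of start/end tag
    sequences per relation, so the same entity pair can appear under
    several relations (the overlapping case).
    """
    triples = []
    for subj in decode_spans(subject_pointers):
        for relation, pointers in object_pointers.get(subj, {}).items():
            for obj in decode_spans(pointers):
                triples.append((
                    " ".join(tokens[subj[0]:subj[1] + 1]),
                    relation,
                    " ".join(tokens[obj[0]:obj[1] + 1]),
                ))
    return triples


# Hypothetical example: one subject shares the same object under two relations.
tokens = ["Jay", "Chou", "sang", "and", "composed", "Nocturne"]
subject_pointers = ([0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1])
object_pointers = {
    (5, 5): {
        "singer":   ([1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]),
        "composer": ([1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]),
    }
}
triples = decode_triples(tokens, subject_pointers, object_pointers)
print(triples)
```

Because each relation keeps its own start/end tag sequences, the entity pair ("Nocturne", "Jay Chou") is recovered once per relation. A single sequence-labeling tag per token cannot represent this case.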
