Big Data Research ›› 2021, Vol. 7 ›› Issue (6): 19-29. doi: 10.11959/j.issn.2096-0271.2021057

• Topic: Intelligent applications supported by big data •

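Legal element extraction method based on the BERT reading comprehension framework

Hui HUANG1, Yongbin QIN1,2, Yanping CHEN1,2, Ruizhang HUANG1,2

  1. 1 School of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
    2 State Key Laboratory of Public Big Data, Guiyang 550025, Guizhou, China
  • Online: 2021-11-15 Published: 2021-11-01
  • About the authors: Hui HUANG (1994- ), male, master's student at the School of Computer Science and Technology, Guizhou University. His main research interests are natural language processing and intelligent question answering.
    Yongbin QIN (1980- ), male, PhD, professor and dean of the School of Computer Science and Technology, Guizhou University. His main research interests are big data processing, cloud computing, and text mining.
    Yanping CHEN (1980- ), male, PhD, associate professor at the School of Computer Science and Technology, Guizhou University. His main research interests are artificial intelligence and natural language processing.
    Ruizhang HUANG (1979- ), female, PhD, associate professor at the School of Computer Science and Technology, Guizhou University. Her main research interests are information retrieval and text mining.
  • Supported by:
    The National Natural Science Foundation of China (62066008); The Key Projects of Science and Technology of Guizhou Province ([2020]1Z055)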

Legal element extraction method based on BERT reading comprehension framework

Hui HUANG1, Yongbin QIN1,2, Yanping CHEN1,2, Ruizhang HUANG1,2   

  1. 1 School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
    2 State Key Laboratory of Public Big Data, Guiyang 550025, China
  • Online: 2021-11-15 Published: 2021-11-01
  • Supported by:
    The National Natural Science Foundation of China (62066008); The Key Projects of Science and Technology of Guizhou Province ([2020]1Z055)

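Abstract:

Legal element extraction is an important foundation for intelligent judicial assistance applications; its purpose is to identify the key case elements involved in a judgment document. Previous work usually modeled legal element extraction as multi-label classification, with models relying mainly on the textual features of the judgment document and ignoring the semantic information of the element labels. Moreover, because the sample distribution of judicial data sets is imbalanced, classification methods perform poorly owing to an excess of negative examples. To address these problems, a legal element extraction method based on the BERT reading comprehension framework is proposed. The method introduces label information and legal prior knowledge to construct auxiliary questions, and uses a BERT machine reading comprehension model to establish the semantic connection between the auxiliary questions and the judgment documents. In addition, special identifiers are added before and after the position of the label in the question to enhance the model's learning ability. Experimental results show that the method significantly improves performance on the CAIL2019 element extraction public data set, raising the F1 value by 2.7%, 11.3%, and 5.6% on the marriage and family, labor dispute, and loan contract case types, respectively.

Keywords: element extraction, machine reading comprehension, neural network, BERT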

Abstract:

Legal element extraction is an important basis for intelligent judicial assistance applications; its purpose is to identify the key elements involved in a judgment document. Previous work has usually modeled legal element extraction as multi-label classification. These methods rely mainly on the text features of the judgment document and ignore the label features. In addition, because judicial data sets are imbalanced, classification methods suffer from too many negative examples, which leads to poor model performance. To solve these problems, a legal element extraction method based on the BERT reading comprehension framework is proposed. The method constructs auxiliary questions from label information and legal prior knowledge, and uses a BERT-based machine reading comprehension model to establish semantic associations between the question and the judgment document. It also adds special tokens before and after the label in the question to enhance the learning ability of the model. Experiments were conducted on the legal element extraction data sets of CAIL2019. The results show that performance improves significantly: the F1 value increases by 2.7%, 11.3%, and 5.6% on the data sets for marriage and family cases, labor dispute cases, and loan contract dispute cases, respectively.

Key words: element extraction, machine reading comprehension, neural network, BERT
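
The input construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the question template, the special tokens, or the label names, so `QUESTION_TEMPLATE`, `LABEL_START`, `LABEL_END`, and the example labels below are hypothetical, and the legal prior knowledge the method adds to the question is omitted. The sketch only builds the question-document pair as a string, with one binary example per label; a real pipeline would pass these pairs through a BERT tokenizer and model.

```python
from typing import List, Set, Tuple

# Hypothetical marker tokens placed before and after the label in the question.
LABEL_START = "<e>"
LABEL_END = "</e>"

# Hypothetical question template; the paper's actual wording is not given here.
QUESTION_TEMPLATE = "Does the judgment document involve the element {label}?"


def build_question(label: str) -> str:
    """Wrap the label in marker tokens and insert it into the template."""
    marked = f"{LABEL_START} {label} {LABEL_END}"
    return QUESTION_TEMPLATE.format(label=marked)


def build_examples(
    document: str, labels: List[str], gold: Set[str]
) -> List[Tuple[str, int]]:
    """Turn one multi-label sample into one (input text, 0/1 target) pair per label."""
    examples = []
    for label in labels:
        # BERT-style sentence-pair layout: question first, then the document.
        text = f"[CLS] {build_question(label)} [SEP] {document} [SEP]"
        examples.append((text, int(label in gold)))
    return examples


if __name__ == "__main__":
    pairs = build_examples(
        "The couple have one child.",
        ["child after marriage", "joint debt"],
        {"child after marriage"},
    )
    for text, target in pairs:
        print(target, text)
```

Pairing each label with the document in this way is what lets the model read the label's wording, which a plain multi-label classification head does not see.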
