Big Data Research ›› 2022, Vol. 8 ›› Issue (6): 15-25. doi: 10.11959/j.issn.2096-0271.2022046

• Special Topic: Big Data Technologies and Methods for the Humanities •

Research on text annotation method of ancient works from the perspective of digital humanities: a case study on MARKUS

Yaxiu YU¹, Xin LI²

  1. East China Normal University Library, Shanghai 200062, China
  2. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Online: 2022-11-15 Published: 2022-11-01
  • About the authors: Yaxiu YU (1985- ), female, associate research librarian at East China Normal University Library. Her main research interests are digital humanities, knowledge organization and management, and smart library development.
    Xin LI (1961- ), female, research librarian at the School of Data Science and Engineering, East China Normal University. Her main research interests are semantic web knowledge organization and management, digital humanities, and recommender systems.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2022ECNU-XWK-ZX05)

Abstract:

Text annotation is an important step in text analysis and mining. For large-scale collections of ancient works, manual annotation cannot meet the needs of humanities research, and because ancient works have distinctive grammatical structures and linguistic features, annotation techniques developed for modern text are difficult to apply to them directly. Based on an analysis of the difficulties and pain points that humanities researchers face when annotating ancient texts, this paper proposes a general-purpose standard workflow for annotating ancient works and presents a text annotation model based on MARKUS. Through a concrete case, it then explores an annotation method for ancient texts built on this model, with the aim of promoting the use of digital humanities tools to change how humanities research on ancient works is conducted and to expand the scale and depth of that research.

Key words: digital humanities, ancient works, text annotation, MARKUS

