Big Data Research, 2022, Vol. 8, Issue (6): 40-55. doi: 10.11959/j.issn.2096-0271.2022067

• TOPIC: BIG DATA TECHNOLOGY AND METHOD IN DIGITAL HUMANITIES •

Explore the structuration of historical books: the construction and quantitative analysis of digital humanities database of the Biographies of the Shiji

Tongzheheng ZHENG¹, Bin LI¹, Minxuan FENG¹, Bolin CHANG¹, Dongbo WANG²

  1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
  2. College of Information Management, Nanjing Agricultural University, Nanjing 210095, China
  • Online: 2022-11-15 Published: 2022-11-01
  • Supported by:
    The Social Science Fund of Jiangsu (20JYB004); The National Social Science Foundation of China (18BYY127); The National Social Science Foundation of China (21&ZD331)

Abstract:

The corpus of ancient Chinese classical books is vast and rich in historical and humanistic knowledge. Digitized ancient books with full-text retrieval have become an important basic resource and tool for language and literature, history, philosophy and other disciplines. With the development of artificial intelligence and big data technology, the research paradigm of digital humanities is constantly evolving. Converting the text of traditional books into a highly structured digital humanities database is a new direction of exploration. Organizing elements such as words, persons, and geographical entities in the text is of great significance for visualizing historical knowledge and quantifying historical information. The Biographies of the Shiji was selected as the object of study. Automatic word segmentation and part-of-speech tagging, followed by manual proofreading and manual annotation of entity information, were used to construct a multi-level, high-quality structured digital humanities knowledge base. The knowledge base supports quantitative analysis and visual retrieval of elements such as words, persons and locations in ancient books, and the mining of information such as the distribution of persons and locations, relationships between persons, and relationships between persons and locations. The analysis finds 1,787 persons and 1,173 locations in the Biographies of the Shiji; compared with the Benji and Shijia of the Shiji, 1,092 persons and 556 locations are unique to the Biographies. The work offers new ideas and frameworks for constructing digital humanities knowledge bases of ancient books.

Key words: digital humanities, the Biographies of the Shiji, knowledge service, big data, ancient Chinese information processing

