Chinese Journal of Intelligent Science and Technology, 2024, Vol. 6, Issue (2): 201-209. doi: 10.11959/j.issn.2096-6652.202419

• Academic Paper •

Intelligent testing method for railway CTC interface data based on fuzzy natural language processing

Yuantao JIAO, Runmei LI, Jian WANG

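  1. School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2024-04-28  Revised: 2024-05-27  Online: 2024-06-15  Published: 2024-07-31
  • Corresponding author: Runmei LI  E-mail: rmli@bjtu.edu.cn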
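  • About the authors: Yuantao JIAO (2000- ), male, is a master's student at the School of Automation and Intelligence, Beijing Jiaotong University. His main research interests are intelligent testing methods and railway centralized traffic control.
    Runmei LI (1975- ), female, PhD, is vice dean and professor of the School of Automation and Intelligence, Beijing Jiaotong University. Her main research interests are type-2 fuzzy set theory, key technologies for driverless vehicle control, and traffic big-data processing and prediction.
    Jian WANG (1978- ), male, PhD, is deputy secretary of the Party committee, vice dean and professor of the School of Automation and Intelligence, Beijing Jiaotong University. His main research interests are train operation control systems, railway satellite navigation, and integrated dispatching and control.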
  • Supported by:
    National Key R&D Program of China (2022YFB4300500); Technological R&D Program of China State Railway Group Co., Ltd. (L2022X002)

Abstract:

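Fuzzy natural language processing applies fuzzy theory to natural language processing (NLP) tasks. With the continuing development of large models and artificial intelligence, research on text data keeps deepening. The railway centralized traffic control (CTC) system is a large, complex system in which the interface data exchanged between subsystems and server software are stored and transmitted as log text. Because these logs are numerous and of many different types, a fuzzy NLP method is proposed to solve the problem of testing CTC interface data manually. The fuzzy C-means (FCM) clustering algorithm divides the log text into label categories, which serve as the label input for named entity recognition (NER) in the NLP task. BERT is added to the traditional BiLSTM-CRF model to encode the text, so that relationships in the text are understood more accurately and recognition precision improves. Based on the trained model, an intelligent verification tool for log-text interface testing of the railway CTC system is developed. The tool can improve the current manual testing practice for CTC systems, help testers verify interface tests, and raise the level of intelligence and automation of testing work.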
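The abstract names fuzzy C-means (FCM) as the step that sorts log text into label categories but gives no further detail. The sketch below is the textbook FCM iteration in plain Python, assuming each log line has already been turned into a numeric feature vector. The paper's feature extraction, number of clusters `c` and fuzzifier `m` are not stated here, so the function name `fcm`, its defaults and the toy data are illustrative assumptions, not the authors' code.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy C-means (hypothetical helper, not the paper's code).

    points: list of equal-length numeric feature vectors (assumed to be
            precomputed from log lines).
    Returns (centers, u) where u[k][i] is the membership of point k in
    cluster i; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, normalised per point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = [[0.0] * d for _ in range(c)]
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u^m.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            tw = sum(w)
            centers[i] = [sum(w[k] * points[k][j] for k in range(n)) / tw
                          for j in range(d)]
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        max_change = 0.0
        for k in range(n):
            dist = [math.dist(points[k], centers[i]) for i in range(c)]
            if min(dist) == 0.0:
                # Point sits exactly on a centre: crisp membership there.
                hits = [i for i in range(c) if dist[i] == 0.0]
                new = [1.0 / len(hits) if i in hits else 0.0 for i in range(c)]
            else:
                new = [1.0 / sum((dist[i] / dist[j]) ** p for j in range(c))
                       for i in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new
        if max_change < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Toy 2-D vectors standing in for log-line features (made up).
    pts = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    centers, u = fcm(pts, c=2)
    # Harden: pick the highest-membership cluster for each point.
    labels = [max(range(2), key=lambda i: row[i]) for row in u]
    print(labels)
```

Taking each point's highest-membership cluster, as in the demo, yields one crisp category per log line that could then act as an NER label class. Whether the paper hardens the memberships this way or uses the fuzzy degrees directly is not stated in the abstract.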
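In a BiLSTM-CRF tagger, the final CRF layer chooses the tag sequence with the highest total of per-token emission scores and tag-to-tag transition scores, which lets it reject inconsistent label sequences. The abstract does not describe the decoder, so the sketch below assumes the standard linear-chain CRF with Viterbi decoding, written in plain Python. Training (learning the transition matrix) is not shown. The function name `viterbi_decode`, the scores and the BIO tag set (`B-DEV`, `I-DEV`, standing in for an FCM-derived category) are all hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Best tag path for one sequence under a linear-chain CRF (sketch).

    emissions[t][j]   : score of tag j at token t (e.g. from a BiLSTM).
    transitions[i][j] : score of tag i being followed by tag j.
    """
    n_tags = len(tags)
    # score[j]: best total score of any path that ends in tag j so far.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


if __name__ == "__main__":
    tags = ["O", "B-DEV", "I-DEV"]          # hypothetical BIO tag set
    NEG = -1e4                              # effectively forbids a transition
    transitions = [[0.0, 0.0, NEG],         # O -> I-DEV is not allowed
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    emissions = [[2.0, 0.5, 0.0],
                 [0.0, 1.0, 1.5],
                 [0.0, 0.2, 1.0]]
    print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-DEV', 'I-DEV']
```

On these made-up scores a per-token argmax would pick `O, I-DEV, I-DEV`, which is not a valid BIO sequence; the transition penalty lets Viterbi return `O, B-DEV, I-DEV` instead.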

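Key words: natural language processing, fuzzy text clustering, railway centralized traffic control system, named entity recognition, intelligent testing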
