Telecommunications Science (电信科学), 2021, Vol. 37, Issue (10): 136-142. doi: 10.11959/j.issn.1000-0801.2021191

• Research and Development •

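Application of improved self-training model in the identification of users with poor service quality

Li YU1, Zhe LI1, Fei GAO1, Xiangyang YUAN1, Yong YANG2

  1. China Mobile Research Institute, Beijing 100053, China
  2. China Mobile Communications Corporation, Beijing 100033, China
  • Revised: 2021-06-20 Online: 2021-10-20 Published: 2021-10-01
  • About the authors: Li YU (1981- ), male, deputy general manager of the Artificial Intelligence and Intelligent Operation Center at China Mobile Research Institute and senior engineer. His main research interests include frontier mobile communication technologies, network intelligence, big data, and IT technologies.
    Zhe LI (1992- ), male, researcher at China Mobile Research Institute. His main research interests include 5G core networks, artificial intelligence, telecom big data, and deep packet inspection.
    Fei GAO (1978- ), male, researcher at China Mobile Research Institute. His main research interests include artificial intelligence, network big data analysis, and data governance.
    Xiangyang YUAN (1978- ), male, deputy general manager of the Artificial Intelligence and Intelligent Operation Center at China Mobile Research Institute. His main research interests include artificial intelligence, network intelligence, big data, and IT technologies.
    Yong YANG (1972- ), male, manager of the Service Assurance Office in the Network Department of China Mobile Communications Corporation. His main research interests include wireless network quality and the planning and analysis of service indicators.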

Abstract:

Identifying users with poor service quality is an important step in reducing complaint rates and improving user satisfaction. In current telecommunications network systems, the large volume of structured and unstructured data related to service perception is difficult to label effectively, labels for poor-quality users are incomplete, and the training samples of existing supervised learning models are imbalanced, all of which lead to a low identification rate for poor-quality users. To address these problems, an improved self-training semi-supervised learning model was adopted: a small number of users with low satisfaction scores or complaints were used as poor-quality labels to annotate network data, and label migration was then used to train on a large amount of unlabeled data to identify poor-quality users. Experiments show that, compared with fully supervised learning, which is accurate but costly to train, and unsupervised learning, which has low accuracy, semi-supervised learning makes full use of unlabeled samples for effective training and significantly improves the identification accuracy for poor-quality users while keeping training cost low.

Key words: semi-supervised learning, improved self-training model, poor quality user identification, unlabeled data
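
The abstract does not state how the self-training model is improved, so the following is only a minimal sketch of plain self-training, not the authors' method. It assumes a toy nearest-centroid classifier, two hypothetical numeric features per user, labels 0 (normal) and 1 (poor quality), and an arbitrary confidence threshold of 0.9; all names and the synthetic data are illustrative.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fit_centroids(labeled):
    """Toy base classifier: one mean vector (centroid) per class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in groups.items()}

def predict_proba(x, centroids):
    """Pseudo-probabilities: softmax over negative distances to centroids."""
    weights = {y: math.exp(-distance(x, mu)) for y, mu in centroids.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

def predict(x, centroids):
    """Most likely class for one feature vector."""
    proba = predict_proba(x, centroids)
    return max(proba, key=proba.get)

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Plain self-training: fit, pseudo-label confident samples, refit."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            proba = predict_proba(x, centroids)
            y = max(proba, key=proba.get)
            if proba[y] >= threshold:
                confident.append((x, y))   # accept the pseudo-label
            else:
                remaining.append(x)        # keep for a later round
        if not confident:
            break                          # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return fit_centroids(labeled), labeled

# Synthetic data (assumption): two well-separated clusters of users,
# with only one hand-labeled example per class.
random.seed(0)
seed_labels = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
unlabeled = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
unlabeled += [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(50)]

centroids, final_labeled = self_train(seed_labels, unlabeled)
```

In this sketch the threshold trades label noise against coverage: a higher value admits fewer but more reliable pseudo-labels in each round.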

