Journal on Communications, 2022, Vol. 43, Issue (10): 186-195. doi: 10.11959/j.issn.1000-436x.2022193

• Academic Paper •

Outlier detection algorithm based on fast density peak clustering outlier factor

Zhongping ZHANG¹,²,³, Sen LI¹, Weixiong LIU¹, Shuxia LIU⁴

  1 College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  2 The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
  3 The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, China
  4 Hebei Normal University of Science and Technology, Qinhuangdao 066004, China
  • Revised: 2022-07-08 Online: 2022-10-25 Published: 2022-10-01
  • About the authors:
    Zhongping ZHANG (1972– ), male, born in Songyuan, Jilin; Ph.D., professor at Yanshan University. Research interests: big data, data mining, and semi-structured data.
    Sen LI (1997– ), male, born in Zhoukou, Henan; master's student at Yanshan University. Research interest: data mining.
    Weixiong LIU (1997– ), male, born in Guangzhou, Guangdong; master's student at Yanshan University. Research interest: data mining.
    Shuxia LIU (1974– ), female, born in Xingtai, Hebei; Ph.D., lecturer at Hebei Normal University of Science and Technology. Research interests: big data technology, deep learning, and blockchain.
  • Supported by:
    The National Natural Science Foundation of China (61972334); The National Social Science Foundation of China (20BJ122); Hebei Province Innovation Capability Improvement Plan Project (20557640D); Sida Railway Intelligent Image Workpiece Recognition Project (x2021134)

Abstract:

The density peak clustering algorithm requires manually set parameters and has high time complexity. To address these problems, an outlier detection algorithm based on a fast density peak clustering outlier factor was proposed. First, the k-nearest-neighbor method replaced the density estimation used in density peak clustering, and a KD-Tree index structure was used to compute the k nearest neighbors of each data object. Then, cluster centers were selected automatically using the product of density and distance. In addition, a centripetal relative distance and a fast density peak clustering outlier factor were defined to characterize how strongly each data object deviates as an outlier. The proposed algorithm was evaluated on artificial and real data sets and compared with several classical and recent algorithms. The experiments verified its effectiveness in terms of both correctness and time efficiency.

Key words: data mining, density peak clustering, outlier, k nearest neighbor, centripetal relative distance
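The abstract names the steps of the method but gives no formulas. The Python sketch below is therefore a hypothetical illustration of that pipeline under stated assumptions. It is not the authors' implementation.

- A small hand-written KD-tree answers the k-nearest-neighbor queries.
- The kNN density ρ is *assumed* to be the inverse of the mean distance to the k nearest neighbors.
- δ is the classic density-peak distance to the nearest denser point.
- Cluster centers are points whose γ = ρ·δ exceeds mean(γ) plus one standard deviation. This is an *assumed* automatic rule.
- The abstract does not define the centripetal relative distance or the outlier factor. The score here is therefore a stand-in: a point's distance to its cluster center, divided by the mean member-to-center distance of that cluster.

All function names (`build_kdtree`, `knn_distances`, `fdpc_outlier_scores`) are hypothetical.

```python
import heapq
import math
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """Node of a minimal KD-tree; stores the index of one point and its split axis."""

    def __init__(self, idx: int, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.idx, self.axis, self.left, self.right = idx, axis, left, right


def build_kdtree(pts: Sequence[Point], ids: List[int], depth: int = 0) -> Optional[KDNode]:
    """Split on the median of one coordinate per level, cycling through the axes."""
    if not ids:
        return None
    axis = depth % len(pts[0])
    ids = sorted(ids, key=lambda i: pts[i][axis])
    mid = len(ids) // 2
    return KDNode(ids[mid], axis,
                  build_kdtree(pts, ids[:mid], depth + 1),
                  build_kdtree(pts, ids[mid + 1:], depth + 1))


def knn_distances(root: Optional[KDNode], pts: Sequence[Point], q: int, k: int) -> List[float]:
    """Ascending distances from pts[q] to its k nearest neighbours (q itself excluded)."""
    heap: List[Tuple[float, int]] = []  # max-heap of (-distance, index)

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        if node.idx != q:
            d = math.dist(pts[q], pts[node.idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.idx))
        diff = pts[q][node.axis] - pts[node.idx][node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Visit the far side only if the splitting plane is closer than the current k-th neighbour.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted(-neg_d for neg_d, _ in heap)


def fdpc_outlier_scores(pts: Sequence[Point], k: int) -> Tuple[List[int], List[float]]:
    """Return (label, score) per point; a label is the index of the point's cluster centre."""
    n = len(pts)
    if n < 2 or not 1 <= k < n:
        raise ValueError("need at least two points and 1 <= k < len(pts)")
    root = build_kdtree(pts, list(range(n)))
    # Assumed kNN density: inverse of the mean distance to the k nearest neighbours.
    rho = [1.0 / (statistics.fmean(knn_distances(root, pts, i, k)) + 1e-12) for i in range(n)]
    # Classic DPC delta: distance to the nearest point ranked denser (brute force, O(n^2)).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(math.dist(pts[order[0]], p) for p in pts)
    for rank in range(1, n):
        i = order[rank]
        parent[i] = min(order[:rank], key=lambda j: math.dist(pts[i], pts[j]))
        delta[i] = math.dist(pts[i], pts[parent[i]])
    # Centres by gamma = rho * delta; the mean + 1 std cut-off is an assumed automatic rule.
    gamma = [r * d for r, d in zip(rho, delta)]
    cut = statistics.fmean(gamma) + statistics.pstdev(gamma)
    centres = {i for i in range(n) if gamma[i] > cut} | {order[0]}
    label = [-1] * n
    for i in order:  # denser points first, so each parent is labelled before its children
        label[i] = i if i in centres else label[parent[i]]
    # Stand-in outlier score (NOT the paper's definition): distance to the own centre
    # divided by the mean member-to-centre distance of that cluster.
    members: Dict[int, List[int]] = {}
    for i in range(n):
        members.setdefault(label[i], []).append(i)
    scores = [0.0] * n
    for c, ids in members.items():
        scale = statistics.fmean(math.dist(pts[j], pts[c]) for j in ids) or 1.0
        for j in ids:
            scores[j] = math.dist(pts[j], pts[c]) / scale
    return label, scores
```

As an example, take `k=3` and two unit squares (four corners plus the center point) centered at (0.5, 0.5) and (10.5, 10.5), plus a lone point at (5, 20). The two square centers become the cluster centers, and the lone point receives the largest score. In this sketch the KD-tree only speeds up the kNN density step. The δ step is left as a brute-force O(n²) loop for clarity.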

