Journal on Communications ›› 2021, Vol. 42 ›› Issue (9): 133-143. doi: 10.11959/j.issn.1000-436x.2021152

• Academic paper


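  • About the authors: Zhongping ZHANG (1972− ), male, born in Songyuan, Jilin, Ph.D., professor at Yanshan University; his main research interests include big data, data mining, and semi-structured data
    Weixiong LIU (1997− ), male, born in Guangzhou, Guangdong, master's student at Yanshan University; his main research interest is data mining
    Yuting ZHANG (1996− ), male, born in Fuyang, Anhui, master's student at Yanshan University; his main research interest is data mining
    Yu DENG (1996− ), male, born in Tangshan, Hebei, master's student at Yanshan University; his main research interest is data mining
    Mianxin WEI (1997− ), male, born in Shantou, Guangdong, master's student at Yanshan University; his main research interest is data mining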

ERDOF: outlier detection algorithm based on entropy weight distance and relative density outlier factor

Zhongping ZHANG1,2,3, Weixiong LIU1, Yuting ZHANG1, Yu DENG1, Mianxin WEI1   

  1 College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  2 The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
  3 The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, China
  • Revised: 2021-06-30 Online: 2021-09-25 Published: 2021-09-01
  • Supported by:
    Hebei Province Innovation Capability Improvement Plan Project (20557640D)





To address the low accuracy of outlier detection on data sets with complex distributions and on high-dimensional data sets, an outlier detection algorithm based on entropy weight distance and relative density outlier factor (ERDOF) was proposed. Firstly, entropy weight distance was introduced in place of Euclidean distance to improve the detection accuracy of outliers. Then, Gaussian kernel density estimation was carried out for each data object based on the concept of natural neighbors. At the same time, relative distance was proposed to describe the degree to which a data object deviates from its neighborhood and to improve the ability of the algorithm to detect outliers in low-density regions. Finally, the entropy weight distance and relative density outlier factor was proposed to describe the degree to which a data object is an outlier. Experiments on artificial and real data sets show that the proposed algorithm adapts effectively to various data distributions and to outlier detection in high-dimensional data.

Key words: data mining, outlier detection, information entropy, kernel density estimation
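
The abstract names the building blocks of the method, but this page does not give its formulas. The following Python sketch is therefore only an illustration under stated assumptions, not the authors' ERDOF algorithm: attribute weights come from the standard entropy weight method, a fixed k-nearest-neighbor set stands in for the natural-neighbor search, the kernel bandwidth is the mean neighbor distance, and the score is a simple ratio of neighbor density to own density. All function names (`entropy_weights`, `weighted_distances`, `outlier_scores`) are hypothetical.

```python
import numpy as np


def entropy_weights(X):
    """Standard entropy weight method (assumed): one weight per attribute, summing to 1.
    Needs at least two objects."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    scaled = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max scale to [0, 1]
    col_sum = scaled.sum(axis=0)
    # Each column becomes a distribution over objects; constant columns become uniform.
    P = np.where(col_sum > 0, scaled / np.where(col_sum > 0, col_sum, 1.0), 1.0 / n)
    # Shannon entropy per attribute, normalised by ln(n); 0 * log(0) is treated as 0.
    entropy = -(P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0) / np.log(n)
    diversity = np.clip(1.0 - entropy, 0.0, None)
    total = diversity.sum()
    return diversity / total if total > 0 else np.full(d, 1.0 / d)


def weighted_distances(X, w):
    """Pairwise Euclidean distances with each squared difference scaled by its weight."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))


def outlier_scores(X, k=5):
    """Relative-density score: larger means more outlying (illustrative formula)."""
    X = np.asarray(X, dtype=float)
    D = weighted_distances(X, entropy_weights(X))
    np.fill_diagonal(D, np.inf)                  # an object is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]            # k nearest neighbors (not natural neighbors)
    nn_dist = np.take_along_axis(D, nn, axis=1)
    h = max(float(nn_dist.mean()), 1e-12)        # global bandwidth (assumption)
    # Unnormalised Gaussian kernel; the constant factor cancels in the ratio below.
    density = np.exp(-nn_dist ** 2 / (2.0 * h ** 2)).mean(axis=1)
    density = np.maximum(density, np.finfo(float).tiny)
    return density[nn].mean(axis=1) / density    # neighbors' density / own density


# Usage: a 5 x 5 grid of points plus one isolated point at (20, 20).
grid = [(i, j) for i in range(5) for j in range(5)]
X = np.array(grid + [(20, 20)], dtype=float)
scores = outlier_scores(X, k=4)
print(int(np.argmax(scores)))  # index 25, the isolated point
```

In this sketch the weights down-weight attributes whose values are spread evenly across objects, and the density ratio is close to 1 for objects that are about as dense as their neighbors. The distance matrix is built in memory for all pairs, so the sketch only suits small data sets.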

