Big Data


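Neighborhood conditional mutual information entropy attribute reduction algorithm for hybrid data

LAN Haibo

  1. CMA Public Meteorological Service Center, Beijing 100081, China

  • About the author: LAN Haibo (born 1979), male, bachelor's degree, senior engineer. His main research interests are big data processing, natural language processing, database technology, and the research and application of key technologies for meteorological service information systems.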

Abstract:

Attribute reduction is an important research topic in rough set theory. Its main purpose is to eliminate irrelevant attributes from an information system, reducing data dimensionality and improving knowledge discovery performance. However, most rough-set-based attribute reduction methods do not consider the dependence between attributes, so the final reduct still contains some redundant attributes. To address this, this paper proposes an attribute reduction algorithm based on neighborhood conditional mutual information entropy. First, building on traditional neighborhood entropy, a hybrid neighborhood mutual information entropy model and a hybrid neighborhood conditional mutual information entropy model are proposed for hybrid data. Then the two entropy models are used to evaluate attribute dependence and to guide heuristic attribute search in hybrid information systems, and an attribute reduction algorithm is designed on this basis. Finally, experimental analysis on UCI data sets shows that the proposed algorithm achieves relatively high attribute reduction performance.

Key words:

Rough set, Attribute reduction, Neighborhood, Mutual information entropy, Conditional mutual information entropy.
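
The abstract does not give the paper's formulas, so the following is only a minimal sketch of how a neighborhood conditional mutual information measure could drive a greedy attribute search on hybrid data. It rests on several assumptions that are not taken from the paper: nominal values (strings) are neighbors only when equal, numeric values are neighbors when they differ by at most a radius `delta`, the conditional measure is the neighborhood analogue of I(a; D | S), and the search stops when the best gain falls to `eps` or below. The names `neighborhoods`, `cond_mutual_info`, `greedy_reduct`, `delta`, and `eps` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Union

Value = Union[float, str]


def _close(u: Value, v: Value, delta: float) -> bool:
    # Assumption: nominal values match only when equal, numeric values within delta.
    if isinstance(u, str) or isinstance(v, str):
        return u == v
    return abs(u - v) <= delta


def neighborhoods(rows: Sequence[Sequence[Value]], attrs: Sequence[int],
                  delta: float) -> List[Set[int]]:
    # Neighborhood of sample i: all samples close to it on every attribute in attrs.
    # With no attributes, every sample is a neighbor of every other sample.
    n = len(rows)
    return [
        {j for j in range(n)
         if all(_close(rows[i][a], rows[j][a], delta) for a in attrs)}
        for i in range(n)
    ]


def cond_mutual_info(rows: Sequence[Sequence[Value]], labels: Sequence[str],
                     a: int, selected: Sequence[int], delta: float) -> float:
    # Neighborhood analogue of I(a; D | S), averaged over samples (an assumed form).
    n = len(rows)
    nb_s = neighborhoods(rows, selected, delta)
    nb_sa = neighborhoods(rows, list(selected) + [a], delta)
    nb_d = [{j for j in range(n) if labels[j] == labels[i]} for i in range(n)]
    total = 0.0
    for i in range(n):
        s_d = nb_s[i] & nb_d[i]    # neighbors under S that share the label
        sa_d = nb_sa[i] & nb_d[i]  # neighbors under S and a that share the label
        total += math.log2(len(nb_sa[i]) * len(s_d) / (len(sa_d) * len(nb_s[i])))
    return -total / n


def greedy_reduct(rows: Sequence[Sequence[Value]], labels: Sequence[str],
                  delta: float = 0.2, eps: float = 1e-6) -> List[int]:
    # Forward search: add the attribute with the largest conditional gain
    # until no remaining attribute adds more than eps.
    remaining = list(range(len(rows[0])))
    selected: List[int] = []
    while remaining:
        gains = {a: cond_mutual_info(rows, labels, a, selected, delta)
                 for a in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= eps:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy hybrid table: attribute 0 is numeric and separates the classes,
# attribute 1 is nominal and constant, attribute 2 duplicates attribute 0.
rows = [
    [0.10, "a", 0.10],
    [0.15, "a", 0.15],
    [0.80, "a", 0.80],
    [0.90, "a", 0.90],
]
labels = ["n", "n", "y", "y"]
print(greedy_reduct(rows, labels))
```

On this toy table the search keeps only attribute 0: the constant nominal attribute carries no information about the labels, and the duplicated numeric attribute adds nothing once attribute 0 is selected. Conditioning on the already selected attributes is what lets the sketch drop the duplicate, which mirrors the abstract's point about redundancy left behind when dependence between attributes is ignored.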
