电信科学 ›› 2015, Vol. 31 ›› Issue (4): 77-85.doi: 10.11959/j.issn.1000-0801.2015100

• 研究与开发 • 上一篇    下一篇

大数据中效用挖掘的快速单阶段算法

刘君强1,周青峰1,王文慧2,时磊1   

  1. 1 浙江工商大学 杭州 310018
    2 浙江水利水电学院 杭州 310018
  • 出版日期:2015-04-15 发布日期:2015-04-15
  • 基金资助:
    国家自然科学基金资助项目;浙江省自然科学基金资助项目

Fast Single Pbase Algoritbm for Utility Mining in Big Data

Junqiang Liu1,Qingfeng Zhou1,Wenhui Wang2,Lei Shi1   

  1. 1 Zhejiang Gongshang University,Hangzhou 310018,China
    2 Zhejiang University of Water Resources and Electric Power,Hangzhou 310018,China
  • Online:2015-04-15 Published:2015-04-15
  • Supported by:
    The National Natural Science Foundation of China;The Zhejiang Provincial Natural Science Foundation of China

摘要:

现有数据挖掘算法的缺点是在挖掘大数据时会出现大量候选模式,从而造成可伸缩性瓶颈,个别算法虽然不生成候选模式,但是计算代价高昂,缺乏有效剪裁,运行效率存在瓶颈。为此,提出一个全新的单阶段不生成候选模式的数据挖掘算法,其创新性有3点:一是基于前缀生长的模式枚举和基于效用上限值评估的剪裁策略;二是基于稀疏矩阵和虚拟投影的效用信息表达;三是节省存储空间的深度优先搜索方法。大量实验表明,新算法的时间效率比现有算法高5倍以上,并且内存使用量比现有算法少20%~60%,可伸缩性高。

关键词: 大数据, 效用挖掘, 高效用模式, 频繁模式

Abstract:

Most of the latest works on utility mining generates a huge number of candidates in dealing with big data,which suffers from the scalability issue.Some work does not generate candidates,but suffers from the efficiency issue due to lack of strong pruning and high computation overhead.A novel algorithm that finds high utility patterns in a single phase without generating candidates was proposed.The novelties lie in a prefix growth strategy with strong pruning,and a sparse matrix based representation of transactions with pseudo projection.The proposed algorithm works in a depth first manner and does not materialize high utility patterns in memory,which further improves the scalability.Extensive experiments on synthetic and rea1-world data show that the proposed algorithm outperforms the latest works in terms of running time,memory overhead,and scalability.

Key words: big data, utility mining, high utility pattern, frequent pattern

No Suggested Reading articles found!