Telecommunications Science ›› 2013, Vol. 29 ›› Issue (10): 31-37. doi: 10.3969/j.issn.1000-0801.2013.10.007

• Cloud Computing Column •

CC-MRSJ: Cache Conscious Star Join Algorithm on Hadoop Platform

Guoliang Zhou1,2, Yongli Zhu1, Guilan Wang1

  1. College of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
  2. Skill Training Center, State Grid Jibei Electric Power Company Limited, Baoding 071051, China
  • Online: 2013-10-15 Published: 2017-06-19
  • Supported by:
    Fundamental Research Funds for the Central Universities; Scientific Research Foundation of Higher Education Institutions of Hebei Province

Abstract:

A cache-conscious MapReduce star join algorithm, CC-MRSJ, is presented. Each column of the fact table is stored separately, and each dimension table is divided into several column families according to its dimension hierarchy. Each foreign key column of the fact table is co-located with its corresponding dimension table, which reduces data movement during the join. CC-MRSJ consists of two phases: first, each foreign key column is joined with its corresponding dimension table; then the intermediate results are joined and the measure columns are accessed randomly to produce the final result. Because CC-MRSJ reads only the data it needs, cache utilization is high, which gives the algorithm its cache-conscious behavior. It also exploits late materialization to avoid unnecessary data access and movement. Comparative tests against Hive on the SSB (Star Schema Benchmark) datasets show that CC-MRSJ achieves higher execution efficiency.

Key words: star join, MapReduce, cache conscious, storage model
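
The two phases described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation and it does not use MapReduce or Hadoop; it only mimics the data flow under simplifying assumptions: the fact table is held as one Python list per column, each dimension table is a dictionary keyed by its primary key, and the final step is a sum grouped by dimension attributes. The function names (`phase1_join`, `phase2_join`) and the sample columns (`custkey`, `datekey`, `revenue`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def phase1_join(fk_column, dim_table, predicate, group_attr=None):
    """Phase 1: join ONE foreign key column with its dimension table.

    Returns {row_position: group_value} for the fact rows whose
    dimension row satisfies the predicate. Only the foreign key
    column is read; no measure column is touched here.
    """
    result = {}
    for pos, key in enumerate(fk_column):
        row = dim_table.get(key)
        if row is not None and predicate(row):
            result[pos] = row[group_attr] if group_attr else None
    return result


def phase2_join(intermediates, measure_column):
    """Phase 2: join the intermediate results on row position, then
    fetch measure values by position (late materialization) and sum
    them per group."""
    positions = set(intermediates[0])
    for inter in intermediates[1:]:
        positions.intersection_update(inter.keys())

    totals = defaultdict(int)
    for pos in sorted(positions):
        # Group key is built from the attributes carried by phase 1.
        group = tuple(i[pos] for i in intermediates if i[pos] is not None)
        # Random access into the measure column, only for surviving rows.
        totals[group] += measure_column[pos]
    return dict(totals)


# Illustrative column-wise fact table and two tiny dimension tables.
fact = {
    "custkey": [1, 2, 1, 3],
    "datekey": [10, 10, 11, 11],
    "revenue": [100, 200, 300, 400],
}
customer = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}, 3: {"region": "ASIA"}}
date = {10: {"year": 1997}, 11: {"year": 1998}}

by_customer = phase1_join(fact["custkey"], customer,
                          lambda r: r["region"] == "ASIA")
by_date = phase1_join(fact["datekey"], date,
                      lambda r: True, group_attr="year")
result = phase2_join([by_customer, by_date], fact["revenue"])
print(result)
```

In this sketch the `revenue` column is read only for row positions that survive every phase-1 join, which is the effect that late materialization is meant to achieve; in the distributed setting the abstract describes, each phase would presumably run as its own MapReduce job over co-located data.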
