Telecommunications Science, 2013, Vol. 29, Issue (10): 31-37. doi: 10.3969/j.issn.1000-0801.2013.10.007

• Cloud computing column •

CC-MRSJ: Cache Conscious Star Join Algorithm on Hadoop Platform

Guoliang Zhou¹,², Yongli Zhu¹, Guilan Wang¹

  1. College of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
  2. Skill Training Center, State Grid Jibei Electric Power Company Limited, Baoding 071051, China
  • Online: 2013-10-15  Published: 2017-06-19

Abstract:

A cache-conscious MapReduce star join algorithm, CC-MRSJ, is presented. Each column of the fact table is stored separately, and each dimension table is divided into several column families according to its dimension hierarchy. Each fact table foreign key column is co-located with its corresponding dimension table, which reduces data movement during the join. CC-MRSJ consists of two phases: first, each foreign key column is joined with its corresponding dimension table; then the intermediate results are joined and the measure columns are read by random access to produce the final result. Because CC-MRSJ reads only the data it needs, cache utilization is high, which makes the algorithm cache conscious. It also takes advantage of late materialization, avoiding unnecessary data access and movement. In experiments on SSB (Star Schema Benchmark) datasets, CC-MRSJ achieves higher performance than Hive.
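The two-phase structure described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' Hadoop implementation: it runs in a single process, and the table contents, column names (fk_date, fk_cust, revenue), predicates, and the helper functions phase1 and phase2 are all hypothetical, chosen only to show how per-column joins followed by late materialization might fit together.

```python
from functools import reduce

# Hypothetical column-wise fact table: one list per column, aligned by row position.
fact = {
    "fk_date": [1, 2, 1, 3],
    "fk_cust": [10, 11, 11, 12],
    "revenue": [100, 200, 300, 400],
}

# Hypothetical dimension tables: primary key -> attribute row.
dim_date = {1: {"year": 1997}, 2: {"year": 1998}, 3: {"year": 1997}}
dim_cust = {10: {"region": "ASIA"}, 11: {"region": "EUROPE"}, 12: {"region": "ASIA"}}


def phase1(fk_column, dim_table, predicate):
    """Join one foreign key column with its dimension table.

    Only the foreign key column and the dimension table are touched.
    Returns the set of fact row positions whose dimension row passes the predicate.
    """
    passing_keys = {key for key, row in dim_table.items() if predicate(row)}
    return {pos for pos, fk in enumerate(fk_column) if fk in passing_keys}


def phase2(position_sets, measure_column):
    """Join the intermediate results, then fetch measures by position.

    The measure column is read only for rows that survived every dimension
    filter (late materialization).
    """
    surviving = sorted(reduce(set.intersection, position_sets))
    return [(pos, measure_column[pos]) for pos in surviving]


# Phase 1: one independent join per foreign key column.
date_hits = phase1(fact["fk_date"], dim_date, lambda row: row["year"] == 1997)
cust_hits = phase1(fact["fk_cust"], dim_cust, lambda row: row["region"] == "ASIA")

# Phase 2: combine the position sets and access the measure column.
result = phase2([date_hits, cust_hits], fact["revenue"])
total_revenue = sum(value for _, value in result)
```

In this sketch the first phase reads only one foreign key column and one dimension table at a time, and the revenue column is accessed only at the surviving row positions, which mirrors the "read only the data needed" property the abstract attributes to CC-MRSJ.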

Key words: star join, MapReduce, cache conscious, storage model
