Telecommunications Science ›› 2013, Vol. 29 ›› Issue (12): 1-8. doi: 10.3969/j.issn.1000-0801.2013.12.001

• Research and Development •

A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework

Bin Wu, Xinguang Liu

  1. Telecommunication and Software Engineering Center, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online: 2013-12-20 Published: 2017-07-04

Abstract:

Related work on parallel ETL and common methods for handling multiple MapReduce jobs are introduced. An improved chain-MapReduce framework is then presented, and a parallel ETL tool is designed on top of this framework. Several ETL optimization rules are presented that let an ETL process generate fewer MapReduce jobs, avoiding unnecessary I/O and network cost. The ETL tool was evaluated on real queries and real large datasets. Compared with Hive, the tool reduces running time by 10% to 20% on average.

Key words: improved chain-MapReduce, ETL, optimization rule
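
The job-reduction idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' framework or their optimization rules; it only assumes the chaining pattern known from Hadoop's ChainMapper and ChainReducer, in which one job may run one or more map steps, a single reduce step, and then further map steps. The step names and the functions `plan_naive` and `plan_chained` are hypothetical.

```python
# Each ETL step is (name, kind), where kind is "map" (record-level work such
# as filtering or projection) or "reduce" (work that needs a shuffle, such as
# grouping or aggregation). The step names are illustrative only.

def plan_naive(steps):
    """One MapReduce job per step: every step writes its output to storage."""
    return [[step] for step in steps]

def plan_chained(steps):
    """Pack steps greedily into jobs shaped like MAP* REDUCE MAP*.

    A job is closed only when a second reduce step arrives, because a single
    job can contain at most one shuffle. Map steps that follow a reduce are
    attached to that reduce's job instead of starting a new one.
    """
    jobs, current, has_reduce = [], [], False
    for name, kind in steps:
        if kind == "reduce" and has_reduce:
            jobs.append(current)          # close the job before a new shuffle
            current, has_reduce = [], False
        current.append((name, kind))
        if kind == "reduce":
            has_reduce = True
    if current:
        jobs.append(current)
    return jobs

steps = [
    ("filter", "map"),
    ("project", "map"),
    ("group_by", "reduce"),
    ("derive", "map"),
    ("aggregate", "reduce"),
    ("format", "map"),
]

naive_jobs = plan_naive(steps)
chained_jobs = plan_chained(steps)
print(len(naive_jobs), "jobs without chaining")
print(len(chained_jobs), "jobs with chaining")
```

In this sketch every job boundary stands for an intermediate result written to and read back from distributed storage, so fewer jobs means less I/O and network traffic. A job that begins with a reduce step would need an identity map stage in a real MapReduce system.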
