Telecommunications Science, 2017, Vol. 33, Issue (1): 135-142. doi: 10.11959/j.issn.1000-0801.2017010

• Operation Technology Perspectives •

Research and implementation on acquisition scheme of telecom big data based on Hadoop

Baoyou WANG1, Jing QIAN1, Shijin YUAN2

  1 Shanghai Branch of China United Network Communication Co., Ltd., Shanghai 200050, China
  2 School of Software Engineering, Tongji University, Shanghai 201804, China
  • Revised: 2017-01-03  Online: 2017-01-01  Published: 2017-06-04
  • About the authors: Baoyou WANG (born 1968), male, Ph.D., senior engineer at the Shanghai Branch of China United Network Communication Co., Ltd.; research interests: data science, data mining, and data signatures. | Jing QIAN (born 1970), female, engineer at the Shanghai Branch of China United Network Communication Co., Ltd.; research interests: data science, mobile Internet, and communication network planning. | Shijin YUAN (born 1975), female, Ph.D., associate professor at the School of Software Engineering, Tongji University; research interests: big data and high-performance computing.

Abstract:

ETL is a very important step in implementing a data warehouse, so designing an ETL flow that processes big data effectively, and thereby improves the acquisition efficiency of the operation platform, is of real practical value. First, the main data content collected by a telecom operator's big data platform is briefly introduced. Second, to improve the efficiency of massive data acquisition, a mashup architecture combining Hadoop and Oracle is proposed. Next, a dynamically triggered ETL scheduling flow and algorithm is proposed. Compared with ETL flows started by a timer, it effectively shortens the excessively long waiting time of some flows and avoids congestion caused by resource contention. Finally, based on the system run logs of Hadoop and Oracle, the relationship between acquisition efficiency and data volume on the two platforms is compared and analyzed. Practice shows that the two platforms in the mashup architecture complement each other, effectively improving the timeliness of data acquisition and achieving good results in application.

Key words: big data, ETL, Hadoop, scheduling process, mashup architecture
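The abstract contrasts timer-started ETL flows with a dynamically triggered scheduler but does not spell out the algorithm itself. As a rough illustration only, the Python sketch below compares the two policies using a generic dependency-triggered scheduler; it is not the authors' algorithm. The job names, durations, data-arrival times, the 30-minute timer period and the `max_parallel` concurrency cap are all hypothetical. Under the timer policy a job idles until the next tick after its inputs are ready; under the triggered policy it starts as soon as its data has arrived and its upstream jobs have finished, while the cap makes surplus ready jobs queue rather than contend for resources.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    duration: float                           # run time in minutes (hypothetical)
    deps: list = field(default_factory=list)  # names of upstream jobs
    arrival: float = 0.0                      # minute at which the job's source data lands


def run_timer(jobs, period):
    """Timer baseline: a job may start only on a tick (a multiple of `period`),
    so it idles until the first tick at or after its inputs are ready.
    Assumes `jobs` lists every upstream job before the jobs that depend on it."""
    finish, wait = {}, {}
    for j in jobs:
        ready = max([j.arrival] + [finish[d] for d in j.deps])
        start = math.ceil(ready / period) * period
        finish[j.name] = start + j.duration
        wait[j.name] = start - ready
    return finish, wait


def run_triggered(jobs, max_parallel):
    """Dependency-triggered scheduling: a job is launched once its data has arrived
    and all upstream jobs have finished, with at most `max_parallel` jobs running,
    so surplus ready jobs queue instead of contending for resources."""
    by_name = {j.name: j for j in jobs}
    pending = {j.name: len(j.deps) for j in jobs}  # unfinished upstream jobs
    children = {j.name: [] for j in jobs}
    for j in jobs:
        for d in j.deps:
            children[d].append(j.name)

    ready = [(j.arrival, j.name) for j in jobs if not j.deps]  # (ready time, name)
    heapq.heapify(ready)
    running = []  # min-heap of (finish time, name)
    finish, wait = {}, {}
    now = 0.0
    while ready or running:
        slot_free = len(running) < max_parallel
        if ready and slot_free and (not running or ready[0][0] <= running[0][0]):
            # Launch the earliest ready job: at its ready time, or now if it was queued.
            t_ready, name = heapq.heappop(ready)
            now = max(now, t_ready)
            wait[name] = now - t_ready
            heapq.heappush(running, (now + by_name[name].duration, name))
        else:
            # Otherwise advance to the next completion and release its dependants.
            now, name = heapq.heappop(running)
            finish[name] = now
            for c in children[name]:
                pending[c] -= 1
                if pending[c] == 0:  # the last upstream job just finished
                    heapq.heappush(ready, (max(now, by_name[c].arrival), c))
    return finish, wait


# Hypothetical workload: two loads feeding one merge job.
jobs = [
    Job("load_cdr", duration=20, arrival=5),
    Job("load_billing", duration=10, arrival=12),
    Job("merge", duration=15, deps=["load_cdr", "load_billing"]),
]
tf, tw = run_timer(jobs, period=30)
df, dw = run_triggered(jobs, max_parallel=2)
print(tf["merge"], sum(tw.values()))  # → 75 53
print(df["merge"], sum(dw.values()))  # → 40 0
```

With these invented numbers the triggered policy finishes `merge` at minute 40 instead of 75 and removes the idle time entirely. With `max_parallel=1`, `load_billing` instead queues for 13 minutes behind `load_cdr`, which is how a concurrency cap trades a short wait for avoiding resource contention.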

