Telecommunications Science ›› 2016, Vol. 32 ›› Issue (4): 169-174. doi: 10.11959/j.issn.1000-0801.2016104

• Special column on electric power informatization •

Practice and application of distributed data quality management system in power enterprise

Yuanning LI, Sen LIU, Shijun ZHANG, Feng CHEN, Zhiying WANG

  1. Information Department of China Southern Power Grid Co., Ltd., Guangzhou 510623, China
  • Online: 2016-04-20 Published: 2016-04-28

Abstract:

As enterprises raise their level of informatization and their requirements for fine-grained management, their demand for data management grows accordingly, and improving enterprise data quality has become a key problem to be solved. To address the challenges that power enterprises face in data quality management, a distributed data quality management solution is proposed. To overcome the performance bottleneck of the centralized data quality system, the characteristics of data quality systems were studied and domestic and international big data solutions were taken as reference, leading to a solution based on the Hadoop distributed processing framework. With a Hadoop cluster, defect data can be moved out of Oracle and stored across multiple servers in the cluster, which effectively improves disk I/O performance and data analysis performance.

Key words: data quality management, distributed, Hadoop
