Big Data Research, 2020, Vol. 6, Issue (2): 41-56. doi: 10.11959/j.issn.2096-0271.2020013
Weigang WU¹, Liang CHANG², Jiangtao REN¹, Tianlong GU²
Online: 2020-03-15
Published: 2020-03-21
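About the authors:
Weigang WU (1976- ), male, Ph.D., is a professor and doctoral supervisor at the School of Data and Computer Science, Sun Yat-sen University, deputy director of the Guangdong Engineering Technology Research Center for Medical Big Data, and deputy director of the Guangzhou Key Laboratory of Supercomputing and Big Data. His main research interests include cloud computing and edge computing, big data and deep learning, and distributed consensus and blockchain.
Liang CHANG (1980- ), male, Ph.D., is dean of the School of Computer Science and Information Security, Guilin University of Electronic Technology, and a senior member of the China Computer Federation. His main research interests include data and knowledge engineering, formal methods, and intelligent systems.
Jiangtao REN (1975- ), male, Ph.D., is an associate professor at the School of Data and Computer Science, Sun Yat-sen University, and a member of the China Computer Federation. His main research interests include data mining, machine learning, and natural language processing.
Tianlong GU (1964- ), male, Ph.D., is a professor and doctoral supervisor at the School of Computer Science and Information Security, Guilin University of Electronic Technology. He is a selectee of the National Hundred, Thousand and Ten Thousand Talents Project, vice chair of the Ministry of Education's Teaching Steering Committee for Computer Science Programs in Higher Education, chair of the Discrete Intelligent Computing Technical Committee and vice chair of the Artificial Intelligence Education Working Committee of the Chinese Association for Artificial Intelligence, and director of the National and Local Joint Engineering Research Center for Satellite Navigation, Positioning and Location Services. His main research interests include knowledge engineering and big data, AI ethics, and formal methods.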
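Abstract:
Big data processing systems are part of the infrastructure of future society. Big data processing tasks in government governance scenarios are characterized by multi-domain heterogeneous data and multiple stakeholders, and therefore call for targeted research and design. Starting from application requirements, this paper analyzes the challenges that various government governance scenarios pose to big data processing technology and reviews the key technologies for distributed and parallel big data processing, including data storage management, computing platforms, and key algorithms. It surveys and summarizes the state of research on these technologies, proposes a technical framework for high performance computing systems for government governance big data, and discusses the strengths and weaknesses of different technical routes. Finally, it looks ahead to future development trends in the related technologies.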
Cite this article:
Weigang WU, Liang CHANG, Jiangtao REN, Tianlong GU. High performance big data computing systems for government governance[J]. Big Data Research, 2020, 6(2): 41-56.