Big Data Research ›› 2015, Vol. 1 ›› Issue (1): 89-103. doi: 10.11959/j.issn.2096-0271.2015.01.008
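Weining Qian, Fan Xia, Minqi Zhou, Cheqing Jin, Aoying Zhou
Revised:
2015-05-07
Online:
2015-05-20
Published:
2020-09-28
About the authors:
Weining Qian, male, is a professor and doctoral supervisor at the Institute for Data Science and Engineering, East China Normal University. His research interests include data management in Internet environments, benchmarks for big data management systems, social media data analysis, and knowledge graph construction and application.
Fan Xia, male, is a doctoral student at the Institute for Data Science and Engineering, East China Normal University. His research interests include distributed query processing, benchmarking of social media data, and social media data management.
Minqi Zhou, male, is an associate professor and master's supervisor at the Institute for Data Science and Engineering, East China Normal University. His research interests mainly include in-memory transaction processing systems, in-memory analytical processing systems, and computational advertising.
Cheqing Jin, male, is a professor and doctoral supervisor at the Institute for Data Science and Engineering, East China Normal University. His research interests mainly include location-based services, data stream management, uncertain data management, and data benchmarking.
Aoying Zhou, male, is a Changjiang Scholar Distinguished Professor at East China Normal University and dean of the Institute for Data Science and Engineering. His research interests mainly include Web data management, data-intensive computing, in-memory cluster computing, distributed transaction processing, big data benchmarking, and performance optimization.
Supported by: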
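Abstract:
Database benchmarks have played an irreplaceable role in the history of database development, but traditional benchmarks are inadequate for big data environments. Starting from the three elements of a benchmark, namely data, workload, and metrics, research on big data management system benchmarks that are highly realistic, adaptable, and measurable is therefore essential both to the development of big data management systems and to system selection for applications. On this basis, after a brief analysis of the basic elements of benchmarks and of the evolution of big data management systems, this paper focuses on the requirements and challenges of benchmarking big data management systems. It then uses BSMA, a benchmark for analytical queries over social media data, to discuss the design and implementation of application-oriented benchmarks for big data management systems.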
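The three benchmark elements named in the abstract (data, workload, metrics) can be illustrated with a minimal sketch. This is not BSMA or any published benchmark: the post schema `(user_id, timestamp)`, the query `top_k_posters`, and the reported metrics are hypothetical choices made only to show how the three elements fit together.

```python
import random
import time
from collections import Counter


def generate_data(num_users, num_posts, seed=0):
    """Data element: synthetic (user_id, timestamp) posts (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return [(rng.randrange(num_users), t) for t in range(num_posts)]


def top_k_posters(posts, start, end, k):
    """Workload element: one analytical query, the k most active users in [start, end)."""
    counts = Counter(user for user, ts in posts if start <= ts < end)
    return counts.most_common(k)


def run_benchmark(posts, queries):
    """Metrics element: per-query latency and overall throughput."""
    latencies = []
    for start, end, k in queries:
        t0 = time.perf_counter()
        top_k_posters(posts, start, end, k)
        latencies.append(time.perf_counter() - t0)
    total = sum(latencies)
    return {
        "queries": len(latencies),
        "mean_latency_s": total / len(latencies),
        "max_latency_s": max(latencies),
        "throughput_qps": len(latencies) / total if total > 0 else float("inf"),
    }


posts = generate_data(num_users=100, num_posts=10_000)
queries = [(0, 5_000, 10), (2_500, 7_500, 5), (0, 10_000, 3)]
report = run_benchmark(posts, queries)
```

Keeping the three elements in separate functions means the data scale, the query mix, and the reported metrics can each be varied independently, which is one simple way to think about the adaptability and measurability the abstract asks for.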
CLC Number:
Weining Qian, Fan Xia, Minqi Zhou, Cheqing Jin, Aoying Zhou. Challenges and Progress of Big Data Management System Benchmarks[J]. Big Data Research, 2015, 1(1): 89-103.