Journal on Communications ›› 2016, Vol. 37 ›› Issue (1): 61-75. doi: 10.11959/j.issn.1000-436x.2016008
Bin LIAO1, Tao ZHANG2, Jiong YU3, Ji LIU1, Lutong YIN3, Gang GUO3
Online:
2016-01-25
Published:
2016-01-27
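Abstract:
Existing MapReduce task schedulers such as FIFO, Fair, Capacity, LATE, and Deadline Constraint differ mainly in their queue and job selection strategies. Their task selection strategies are essentially the same: all treat data locality as the primary selection factor and ignore the current temperature of the TaskTracker. Experiments show that when a TaskTracker runs at high temperature, CPU utilization rises, which increases node energy consumption, and task processing slows, which lengthens task completion time. In addition, a hot node is prone to crashing, which causes tasks to fail outright, and the speculative execution mechanism tends to force running tasks to be killed. A temperature-aware energy-efficient task scheduling strategy is therefore proposed, which adds node CPU temperature to the information used in scheduling decisions so that a small number of high-temperature task execution nodes do not hold back the overall progress of a job. Experimental results show that the algorithm avoids assigning tasks to high-temperature nodes, thereby effectively shortening job completion time, reducing job energy consumption, and improving system stability.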
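The central idea of the abstract, adding node CPU temperature to the task-assignment decision, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the threshold value, the `Task` and `Tracker` types, and the function `pick_task` are hypothetical names introduced here, and the rule shown (skip a tracker that is too hot, otherwise prefer a data-local task) is only one plausible reading of the strategy.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold in degrees Celsius; the abstract does not state a value.
TEMP_THRESHOLD_C = 70.0


@dataclass
class Task:
    task_id: str
    # Hosts that store a replica of this task's input split.
    input_hosts: List[str] = field(default_factory=list)


@dataclass
class Tracker:
    host: str
    # Assumed to be reported by the node together with its heartbeat.
    cpu_temp_c: float


def pick_task(tracker: Tracker, pending: List[Task]) -> Optional[Task]:
    """Choose a task for the requesting tracker, or None to assign nothing."""
    # Temperature check first: a hot node receives no new work.
    if tracker.cpu_temp_c >= TEMP_THRESHOLD_C:
        return None
    # Among pending tasks, prefer one whose input is local to this node.
    for task in pending:
        if tracker.host in task.input_hosts:
            return task
    # Otherwise fall back to the first pending task, if there is one.
    return pending[0] if pending else None
```

Checking temperature before locality means a hot node is skipped even when it holds the input data, which trades a remote read for avoiding the slowdown and failure risk the abstract describes.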
Bin LIAO, Tao ZHANG, Jiong YU, Ji LIU, Lutong YIN, Gang GUO. Temperature aware energy-efficient task scheduling strategies for MapReduce[J]. Journal on Communications, 2016, 37(1): 61-75.
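Table 1
Overall experimental environment
Item | Description |
Operating system | Debian 7.0 |
Java version | 1.6 for Linux |
Hadoop | 1.0.4 |
Energy measurement | Beidian (北电) power monitor (USB smart edition), conforming to GB/T17215-2003; power error ±0.01~0.1 W; sampling interval 1.5~3 s; energy reported in kWh |
Energy data collection | Power monitor electricity monitoring and management system V1.0.1 |
Energy-related units | Power/W, energy/J |
Data sampling frequency | One sample per second |
Node CPU | Intel Core 2 Duo E8400 3.00 GHz |
Node memory | 2 GB DDR2 800 MHz |
Node hard disk | Hitachi HDP725032GLA380 (320 GB, 7 200 r/min) |
Network card | Realtek RTL8168/8111 PCI-E Gigabit Ethernet NIC, 100 Mbit/s |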
Table 2
Job type descriptions
Name | Parameter configuration |
WordCount | Total data size 976.5 MB; 1 Map task; 1 Reduce task |
TeraSort | Total data size 954.8 MB; 1 Map task; 1 Reduce task |
NutchIndex | 100 000 pages; 8 Map tasks; 1 Reduce task |
K-means | 5 clusters; 4 000 000 samples; 4 000 000 samples per input file; dimensions = 20; maximum iterations = 1; 1 Map task; 1 Reduce task |
PageRank | 300 000 pages; 3 iterations; parameters Block and Block_width set to 0 and 16, respectively; 1 Map task; 1 Reduce task |
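Table 3
CPU utilization and power of task execution nodes at high and normal temperature
Job | Normal-temp. Map-phase average CPU utilization/% | Normal-temp. Map task average power/W | High-temp. Map-phase average CPU utilization/% | High-temp. Map task average power/W | Normal-temp. Reduce-phase average CPU utilization/% | Normal-temp. Reduce task average power/W | High-temp. Reduce-phase average CPU utilization/% | High-temp. Reduce task average power/W |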
WordCount | 59.608 | 89.526 | 63.12 | 100.49 | 19.05 | 71.2 | 22.14 | 78.94 |
TeraSort | 60.24 | 83.4 | 76.47 | 92.9 | 59.66 | 82.75 | 77.85 | 95.71 |
NutchIndex | 70.54 | 89 | 72.77 | 103.5 | 65.56 | 90.5 | 68.7 | 102.2 |
K-means_Cluster Iterator | 55.2 | 77.46 | 77.88 | 103.75 | 54.47 | 77.47 | 83.17 | 108.66 |
K-means_Cluster Classification | 57.3 | 81.2 | 58.4 | 89.9 | N/A | N/A | N/A | N/A |
PageRank_Stage1 | 56.24 | 83.5 | 71.17 | 102.7 | 50.4 | 80 | 77.4 | 107.5 |
PageRank_Stage2 | 57.78 | 84 | 66.2 | 101.26 | 47.6 | 89.42 | 55.53 | 101.83 |
Table 5
Task completion time and computing capacity of task execution nodes at high and normal temperature
Job | Normal-temp. job runtime/s | High-temp. job runtime/s | Normal-temp. Map computing capacity | Normal-temp. Map runtime/s | High-temp. Map computing capacity | High-temp. Map runtime/s | Normal-temp. Reduce computing capacity | Normal-temp. Reduce runtime/s | High-temp. Reduce computing capacity | High-temp. Reduce runtime/s |
WordCount | 754 | 772 | 1.349 8 MB/s | 723 | 1.306 4 MB/s | 747 | 81.323 3 MB/s | 12 | 54.215 5 MB/s | 18 |
TeraSort | 259 | 295 | 5.96 MB/s | 160 | 5.046 MB/s | 189 | 5.36 MB/s | 178 | 4.94 MB/s | 193 |
NutchIndex | 489 | 553 | 436.68 page/s | 229 | 386.1 page/s | 259 | 233.1 page/s | 429 | 207.9 page/s | 481 |
K-means_Cluster Iterator | 662 | 715 | 5 797.1 sample/s | 573 | 6 980.8 sample/s | 690 | 7 130.1 sample/s | 447 | 8 948.55 sample/s | 561 |
K-means_Cluster Classification | 674 | 678 | 7 619 sample/s | 525 | 6 042 sample/s | 662 | N/A | N/A | N/A | N/A |
PageRank_Stage1 | 246 | 277 | 3 333.333 3 page/s | 87 | 3 448.28 page/s | 90 | 1 685.393 page/s | 153 | 1 960.8 page/s | 178 |
PageRank_Stage2 | 143 | 161 | 3 333.333 3 page/s | 90 | 3 225.8 page/s | 93 | 7 692.3 page/s | 39 | 5 882.35 page/s | 51 |
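Table 3 gives average power and Table 5 gives runtime, so the two can be combined into a rough per-phase energy estimate. The sketch below does this for the Map phase of three jobs. It assumes that energy is well approximated by average power multiplied by runtime, which the source does not state; the names `MAP_PHASE`, `energy_j`, and `high_temp_overhead_pct` are introduced here for illustration only.

```python
# (average power in W, runtime in s) for the Map phase,
# copied from Table 3 (power) and Table 5 (runtime).
MAP_PHASE = {
    "WordCount": {"normal": (89.526, 723), "high": (100.49, 747)},
    "TeraSort": {"normal": (83.4, 160), "high": (92.9, 189)},
    "NutchIndex": {"normal": (89.0, 229), "high": (103.5, 259)},
}


def energy_j(power_w: float, runtime_s: float) -> float:
    """Energy in joules, assuming power stays at its average value."""
    return power_w * runtime_s


def high_temp_overhead_pct(job: str) -> float:
    """Extra Map-phase energy on a hot node, as a percentage of the normal case."""
    normal = energy_j(*MAP_PHASE[job]["normal"])
    high = energy_j(*MAP_PHASE[job]["high"])
    return (high - normal) / normal * 100.0


if __name__ == "__main__":
    for job in MAP_PHASE:
        print(f"{job}: {high_temp_overhead_pct(job):.1f}% more Map-phase energy")
```

Under this approximation the hot node pays twice, once through higher power draw and once through longer runtime, so the energy overhead is larger than either effect alone (roughly 16% for the WordCount Map phase).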
Table 6
Job type descriptions
Name | Parameter configuration |
WordCount | Total data size 9 759.6 MB; 8 Map tasks; 8 Reduce tasks |
TeraSort | Total data size 9 536.7 MB; 8 Map tasks; 8 Reduce tasks |
NutchIndex | 1 000 000 pages; 80 Map tasks; 8 Reduce tasks |
K-means | 5 clusters; 40 000 000 samples; 4 000 000 samples per input file; dimensions = 20; maximum iterations = 1; 10 Map tasks; 1 Reduce task |
Bayes | 50 000 pages; 100 classes; parameter ngrams = 3; 10 Map tasks; 1 Reduce task |
PageRank | 3 000 000 pages; 3 iterations; parameters Block and Block_width set to 0 and 16, respectively; 10 Map tasks; 1 Reduce task |
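Table 7
Comparison of job completion time and energy consumption
Job | TempA-FIFO runtime/s | Org-FIFO runtime/s | TempA-Capacity runtime/s | Org-Capacity runtime/s | TempA-FIFO total job energy/J | Org-FIFO total job energy/J | TempA-Capacity total job energy/J | Org-Capacity total job energy/J |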
WordCount | 891 | 921 | 821 | 872 | 675 876 | 704 566 | 612 785 | 671 478 |
TeraSort | 605 | 627 | 577 | 595 | 491 634 | 520 982 | 465 566 | 507 383 |
NutchIndex | 784 | 834 | 781 | 829 | 604 152 | 644 872 | 547 017 | 636 855 |
K-means | 432 | 449 | 405 | 424 | 321 056 | 338 647 | 316 653 | 336 508 |
PageRank | 544 | 626 | 551 | 612 | 424 276 | 468 362 | 404 434 | 439 103 |
Bayes | 482 | 559 | 450 | 528 | 373 424 | 433 798 | 345 252 | 415 854 |
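The relative savings implied by Table 7 can be computed directly from its columns. The sketch below does so for the FIFO scheduler, with values copied from the table; `saving_pct` and `summarize` are hypothetical helper names, and expressing the saving relative to the original (Org) scheduler is a choice made here rather than something taken from the source.

```python
# (temperature-aware, original) pairs for the FIFO scheduler, copied from Table 7.
FIFO_RUNTIME_S = {
    "WordCount": (891, 921),
    "TeraSort": (605, 627),
    "NutchIndex": (784, 834),
    "K-means": (432, 449),
    "PageRank": (544, 626),
    "Bayes": (482, 559),
}
FIFO_ENERGY_J = {
    "WordCount": (675876, 704566),
    "TeraSort": (491634, 520982),
    "NutchIndex": (604152, 644872),
    "K-means": (321056, 338647),
    "PageRank": (424276, 468362),
    "Bayes": (373424, 433798),
}


def saving_pct(temp_aware: float, original: float) -> float:
    """Reduction achieved by the temperature-aware variant, in percent of the original."""
    return (original - temp_aware) / original * 100.0


def summarize(table: dict) -> dict:
    """Map each job name to its percentage saving, rounded to two decimals."""
    return {job: round(saving_pct(t, o), 2) for job, (t, o) in table.items()}


if __name__ == "__main__":
    runtime = summarize(FIFO_RUNTIME_S)
    energy = summarize(FIFO_ENERGY_J)
    for job in FIFO_RUNTIME_S:
        print(f"{job}: runtime -{runtime[job]}%, energy -{energy[job]}%")
```

The same two functions apply unchanged to the Capacity columns if those pairs are supplied in the same (temperature-aware, original) order.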