Journal on Communications ›› 2020, Vol. 41 ›› Issue (10): 92-108. doi: 10.11959/j.issn.1000-436x.2020195
Ziyang LI1,2, Jiong YU1,2, Yuefei WANG3, Chen BIAN4, Yonglin PU2, Yitian ZHANG1, Yu LIU1
Revised: 2020-07-20
Online: 2020-10-25
Published: 2020-11-05
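About the authors:
Ziyang LI (1993- ), male, born in Urumqi, Xinjiang, is a Ph.D. candidate at Xinjiang University. His main research interests are distributed systems, in-memory computing, and stream computing.
Jiong YU (1964- ), male, born in Beijing, Ph.D., is a professor and doctoral supervisor at Xinjiang University. His main research interests are grid computing, parallel computing, and distributed systems.
Yuefei WANG (1991- ), male, born in Urumqi, Xinjiang, Ph.D., is a lecturer at Chengdu University. His main research interests are data mining and machine learning.
Chen BIAN (1981- ), male, born in Nanjing, Jiangsu, Ph.D., is an associate professor at Guangdong University of Finance. His main research interests are distributed systems, in-memory computing, and green computing.
Yonglin PU (1991- ), male, born in Zibo, Shandong, is a Ph.D. candidate at Xinjiang University. His main research interests are in-memory computing, stream computing, and green computing.
Yitian ZHANG (1995- ), male, born in Shangqiu, Henan, is a master's student at Xinjiang University. His main research interests are cloud computing, real-time computing, and distributed computing.
Yu LIU (1996- ), male, born in Karamay, Xinjiang, is a master's student at Xinjiang University. His main research interests are cloud computing and distributed computing.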
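Abstract:
To address the problem that the computing load on big data stream computing platforms fluctuates sharply while the cluster runs into performance bottlenecks because of insufficient resources, a load prediction based elastic resource scheduling strategy in Flink (LPERS-Flink) is proposed. First, a load prediction model is established, and a load prediction algorithm built on it predicts the trend of the cluster load. Second, a resource determination model is established to identify resource bottlenecks and resource surpluses in the cluster; on this basis an elastic resource scheduling algorithm produces an elastic resource scheduling plan. Finally, an online load migration algorithm executes the plan and achieves efficient load migration between nodes. Experimental results show that the strategy performs well in scenarios with sharply fluctuating load: cluster size and resource configuration respond to load changes in time, and the communication overhead of load migration is reduced.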
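The abstract describes a three-step pipeline: predict the load, decide whether the cluster faces a bottleneck or a surplus, then produce a scheduling plan. The sketch below illustrates only the shape of the first two steps and is not the paper's method. The predictor is plain exponential smoothing standing in for the paper's load prediction model, and the watermark thresholds, `capacity_per_node`, and all function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


def predict_next_load(history, alpha=0.5):
    """Forecast the next load sample with simple exponential smoothing.

    Stand-in predictor (assumption): the paper's own prediction model
    is not given in this section.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    level = history[0]
    for sample in history[1:]:
        # Blend the newest sample with the running smoothed level.
        level = alpha * sample + (1 - alpha) * level
    return level


@dataclass
class ScalingPlan:
    action: str        # "scale_out", "scale_in", or "hold"
    target_nodes: int  # cluster size after the plan is applied


def plan_scaling(predicted_load, current_nodes, capacity_per_node,
                 high_watermark=0.8, low_watermark=0.3):
    """Classify the predicted state as bottleneck, surplus, or normal.

    The watermarks are hypothetical thresholds, not values from the paper.
    """
    utilization = predicted_load / (current_nodes * capacity_per_node)
    # Smallest node count that keeps utilization at or below the high watermark.
    needed = max(1, math.ceil(predicted_load / (capacity_per_node * high_watermark)))
    if utilization > high_watermark:
        return ScalingPlan("scale_out", needed)
    if utilization < low_watermark and current_nodes > 1:
        return ScalingPlan("scale_in", needed)
    return ScalingPlan("hold", current_nodes)
```

Because the decision uses the predicted load rather than the measured one, a plan can be issued before the load change arrives, which is the point the abstract makes about timely response. The online load migration step is omitted here because the section gives no detail about it.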
Ziyang LI, Jiong YU, Yuefei WANG, Chen BIAN, Yonglin PU, Yitian ZHANG, Yu LIU. Load prediction based elastic resource scheduling strategy in Flink[J]. Journal on Communications, 2020, 41(10): 92-108.
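Table 4 Performance parameter configuration
Configuration item | Value | Description |
JobManager.heap.size/MB | 2048 | Master node memory |
TaskManager.heap.size/MB | 2048 | Worker node memory |
TaskManager.numberOfTaskSlots | 2 | Number of threads (task slots) per node |
high-availability | Zookeeper | Enables HA mode |
state.backend | rocksdb | State data storage |
state.backend.incremental | true | Incremental snapshots |
TaskManager.network.memory.fraction | 0.2 | Network buffer size (fraction of memory) |
TaskManager.network.memory.max/MB | 500 | Upper limit of the network buffer |
TaskManager.memory.segment-size | 32768 | Memory segment size |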
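As a rough illustration, the settings in Table 4 could be rendered as `flink-conf.yaml` entries as sketched below. The table capitalizes the `JobManager` and `TaskManager` prefixes, whereas the keys Flink reads are lowercase. The spellings used here are assumed to be those of Flink releases before the 1.10 memory-configuration changes, since this section does not state the Flink version. The helper name `render_flink_conf` is hypothetical.

```python
# Table 4 values keyed by the lowercase names Flink reads from
# flink-conf.yaml (assumption: a pre-1.10 Flink release).
TABLE_4 = {
    "jobmanager.heap.size": "2048m",
    "taskmanager.heap.size": "2048m",
    "taskmanager.numberOfTaskSlots": 2,
    "high-availability": "zookeeper",
    "state.backend": "rocksdb",
    "state.backend.incremental": True,
    "taskmanager.network.memory.fraction": 0.2,
    "taskmanager.network.memory.max": "500m",
    "taskmanager.memory.segment-size": 32768,
}


def render_flink_conf(settings):
    """Render a dict as 'key: value' lines in flink-conf.yaml style."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # YAML booleans are lowercase, unlike Python's True/False.
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_flink_conf(TABLE_4), end="")
```

A working ZooKeeper HA setup needs further entries, such as the quorum address and a storage directory, which Table 4 does not list and which are therefore left out here.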
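Table 7 Comparison of experimental results
System | Scheduling strategy | Advantages | Disadvantages | Applicable scenarios |
Original system | Default scheduling strategy | Supports exactly-once stateful stream processing | No elastic resource scheduling strategy | Computing load is stable or fluctuates only slightly |
EN | Uses a mathematical model to compute a reasonable parallelism for each operator and dynamically adds computing resources | Computing tasks keep running during data migration | Data migration has a high time overhead | Load rises continuously and by a large margin, and the state data is small |
FAR-Flink | First distributes the rising load sensibly, then uses a flow network model to detect operators whose parallelism must be increased and dynamically adds computing resources | Allocates computing resources accurately and effectively reduces the time overhead of data migration | Tasks stall very briefly during data migration (about 2 to 3 s) | Load rises continuously and by a large margin, and the state data is large |
LPERS-Flink | Performs elastic resource scheduling in advance based on load prediction results and migrates computing tasks and state data online | Responds to load fluctuations in advance and avoids scheduling lag; the scheduling process does not affect cluster performance | Node resource utilization fluctuates slightly | Load fluctuates sharply and real-time requirements are high |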