Telecommunications Science ›› 2023, Vol. 39 ›› Issue (6): 1-21. doi: 10.11959/j.issn.1000-0801.2023125
• Review •
Kaihui GAO, Dan LI
Revised: 2023-06-06
Online: 2023-06-20
Published: 2023-06-01
About the author:
Kaihui GAO (1996- ), male, is a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University. His main research interests are data center networks and network intelligence.
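Abstract:
Data center networks (DCNs) are a critical information infrastructure that supports many distributed applications, such as artificial intelligence training and cloud storage. These applications transfer large volumes of data over the network and therefore place high demands on the stability of DCN performance. In recent years, research on performance guarantees for DCNs has attracted wide attention from both academia and industry. This survey first analyzes the main challenges in achieving performance guarantees in DCNs and proposes research directions for improving performance stability. It then summarizes the three properties that a DCN with performance guarantees must have (high availability, bandwidth guarantee, and bounded latency), systematically reviews the related work on each of these three aspects, and compares that work from multiple perspectives. Finally, it discusses future trends in research on DCN performance guarantees.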
Cite this article:
Kaihui GAO, Dan LI. Data center networks with performance guarantee: a survey[J]. Telecommunications Science, 2023, 39(6): 1-21.
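Table 1
Classification of research on failure detection and rerouting
Domain | Type | Representative work | Response time | Overhead | Network requirement |
Failure detection | Host side | Pingmesh[ | Milliseconds to seconds | High bandwidth overhead | — |
Failure detection | Host side | NetPoirot[ | Seconds | High CPU overhead | — |
Failure detection | Host side | Trumpet[ | Seconds | High CPU overhead | FPGA NIC |
Failure detection | Network side | BasisDetect[ | Minutes | — | Network-wide data collection |
Failure detection | Network side | F10[ | Milliseconds | High bandwidth overhead | Programmable switches |
Failure detection | Network side | Everflow[ | Seconds | High bandwidth and CPU overhead | Programmable switches |
Rerouting | Topology engineering | F10[ | Milliseconds | — | Topology changes |
Rerouting | In-network rerouting | IPFRR[ | Milliseconds | — | — |
Rerouting | In-network rerouting | DDC[ | Milliseconds | — | Programmable switches |
Rerouting | Source routing | SR-FCP[ | Milliseconds | Bandwidth overhead | Source routing support |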
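Table 2
Classification of research on bandwidth guarantees in data center networks
Type | Representative work | Tenant-level bandwidth guarantee | Efficient resource utilization | Network requirement |
Bandwidth reservation | SecondNet[ | √ | × | Commodity switches |
Bandwidth reservation | Proteus[ | √ | √ | Commodity switches |
Bandwidth reservation | Hadrian[ | √ | √ | Custom switches |
Queue isolation | FairCloud (PS-P)[ | √ | √ | Per-tenant queues |
Queue isolation | Trinity | √ | √ | Multiple priority queues |
Weighted congestion control | Seawall[ | × | √ | ECN support |
Weighted congestion control | ElasticSwitch[ | √ | √ | Load-balanced network |
End-host admission control | Gatekeeper[ | √ | Partial | Congestion-free network |
End-network coordination | μFab[ | √ | √ | Programmable network |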
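Table 3
Classification of research on latency optimization in data center networks
Type | Representative work | Low latency | Bounded latency | Network requirement |
Congestion control | DCTCP[ | √ | × | ECN support |
Congestion control | TIMELY[ | √ | × | Precise RTT measurement |
Congestion control | HPCC[ | √ | × | INT support |
Congestion control | μFab[ | √ | √ | Programmable network |
Centralized scheduling | Fastpass[ | × | √ | Source routing |
Centralized scheduling | Chameleon[ | × | √ | Programmable switches |
Deadline scheduling | D3[ | × | Partial | Custom switches |
Priority scheduling | pFabric[ | Partial | × | Flow information known |
Priority scheduling | PIAS[ | Partial | × | Flow-information agnostic |
Priority scheduling | PASE[ | √ | × | Commodity switches |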
[1] | LI D , CHEN G H , REN F Y ,et al. Data center network research progress and trends[J]. Chinese Journal of Computers, 2014,37(2): 259-274. (in Chinese) |
[2] | YOUNG J , BARTH T . Akamai online retail performance report:milliseconds are critical[R]. 2017. |
[3] | WANG S , LI D , ZHANG J S ,et al. CEFS:compute-efficient flow scheduling for iterative synchronous applications[C]// Proceedings of the 16th International Conference on Emerging Networking Experiments and Technologies. New York:ACM Press, 2020: 136-148. |
[4] | CNCF. Cloud native computing foundation[EB]. 2021. |
[5] | GOUK D , LEE S , KWON M ,et al. Direct access,high-performance memory disaggregation with DirectCXL[C]// 2022 USENIX Annual Technical Conference. Berkeley:USENIX Association, 2022: 287-294. |
[6] | ZHANG X C , WANG T Y . Elastic and reliable bandwidth reservation based on distributed traffic monitoring and control[J]. IEEE Transactions on Parallel and Distributed Systems, 2022,33(12): 4563-4580. |
[7] | XIA W F , ZHAO P , WEN Y G ,et al. A survey on data center networking (DCN):infrastructure and operations[J]. IEEE Communications Surveys & Tutorials, 2017,19(1): 640-656. |
[8] | ZENG G X , HU S H , ZHANG J X ,et al. Transport protocols for data center networks:a survey[J]. Journal of Computer Research and Development, 2020,57(1): 74-84. (in Chinese) |
[9] | JIANG W , SUO L , JIN L Y ,et al. Overview of virtual network embedding in data centers[J]. Electric Power Information and Communication Technology, 2021,19(4): 9-17. (in Chinese) |
[10] | WU J , HE L L . A summary of research on energy optimization of cloud computing data center[J]. Software Guide, 2019,18(8): 4-7. (in Chinese) |
[11] | KUMAR P . Toward predictable networks[D]. Ithaca:Cornell University, 2021. |
[12] | JANARDHAN S . Update about the October 4th outage[EB]. 2021. |
[13] | LI W X , QI H , XU R H ,et al. Data center network flow scheduling progress and trends[J]. Chinese Journal of Computers, 2020,43(4): 600-617. (in Chinese) |
[14] | CLARK D . The design philosophy of the DARPA Internet protocols[C]// Symposium Proceedings on Communications Architectures and Protocols (SIGCOMM’88). New York:ACM Press, 1988: 106-114. |
[15] | ARYAL A , LIAO Y Y , NATTUTHURAI P ,et al. The emerging big data analytics and IoT in supply chain management:a systematic review[J]. Supply Chain Management-An International Journal, 2020,25(2): 141-156. |
[16] | SINGH A , ONG J , AGARWAL A ,et al. Jupiter rising:a decade of clos topologies and centralized control in Google’s datacenter network[J]. ACM SIGCOMM Computer Communication Review, 2015,45(4): 183-197. |
[17] | AMODEI D , HERNANDEZ D . AI and compute[EB]. 2018. |
[18] | DOBRESCU M , ARGYRAKI K , RATNASAMY S . Toward predictable performance in software packet-processing platforms[C]// Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2012: 141-154. |
[19] | MOORE G E . Cramming more components onto integrated circuits[J]. Proceedings of the IEEE, 1998,86(1): 82-85. |
[20] | ZHAO S Z , CAO P R , WANG X B . Understanding the performance guarantee of physical topology design for optical circuit switched data centers[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2021,5(3): 1-24. |
[21] | MCKEOWN N , ANDERSON T , BALAKRISHNAN H ,et al. OpenFlow:enabling innovation in campus networks[J]. ACM SIGCOMM Computer Communication Review, 2008,38(2): 69-74. |
[22] | CARIA M , JUKAN A , HOFFMANN M . SDN partitioning:a centralized control plane for distributed routing protocols[J]. IEEE Transactions on Network and Service Management, 2016,13(3): 381-393. |
[23] | FOSTER N , MCKEOWN N , REXFORD J ,et al. Using deep programmability to put network owners in control[J]. ACM SIGCOMM Computer Communication Review, 2020,50(4): 82-88. |
[24] | SHIRMARZ A , GHAFFARI A . Performance issues and solutions in SDN-based data center:a survey[J]. The Journal of Supercomputing, 2020,76(10): 7545-7593. |
[25] | GUO C X , YUAN L H , XIANG D ,et al. Pingmesh:a large-scale system for data center network latency measurement and analysis[J]. ACM SIGCOMM Computer Communication Review, 2015,45(4): 139-152. |
[26] | PENG Y H , YANG J , WU C ,et al. Detector:a topology-aware monitoring system for data center networks[C]// Proceedings of the 2017 USENIX Conference on USENIX Annual Technical Conference. Berkeley:USENIX Association, 2017: 55-68. |
[27] | TAN C , JIN Z , GUO C X ,et al. NetBouncer:active device and link failure localization in data center networks[C]// Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2019: 599-614. |
[28] | ARZANI B , CIRACI S , LOO B T ,et al. Taking the blame game out of data centers operations with NetPoirot[C]// Proceedings of the 2016 ACM SIGCOMM Conference. New York:ACM Press, 2016: 440-453. |
[29] | ROY A , ZENG H Y , BAGGA J ,et al. Passive realtime datacenter fault detection and localization[C]// Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2017: 595-612. |
[30] | ARZANI B , CIRACI S , CHAMON L ,et al. 007:democratically finding the cause of packet drops[C]// Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2018: 419-435. |
[31] | MOSHREF M , YU M L , GOVINDAN R ,et al. Trumpet:timely and precise triggers in data centers[C]// Proceedings of the 2016 ACM SIGCOMM Conference. New York:ACM Press, 2016: 129-143. |
[32] | GENG Y , LIU S , YIN Z ,et al. SIMON:a simple and scalable method for sensing,inference and measurement in data center networks[C]// Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2019: 549-564. |
[33] | ERIKSSON B , BARFORD P , BOWDEN R ,et al. BasisDetect:a model-based network event detection framework[C]// Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. New York:ACM Press, 2010: 451-464. |
[34] | LIU V , HALPERIN D , KRISHNAMURTHY A ,et al. F10:a fault-tolerant engineered network[C]// Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2013: 399-412. |
[35] | ZHU Y B , KANG N X , CAO J X ,et al. Packet-level telemetry in large datacenter networks[C]// Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. New York:ACM Press, 2015: 479-491. |
[36] | LI Y L , MIAO R , KIM C ,et al. LossRadar:fast detection of lost packets in data center networks[C]// Proceedings of the 12th International on Conference on Emerging Networking Experiments and Technologies. New York:ACM Press, 2016: 481-495. |
[37] | ZHOU Y , SUN C , LIU H H ,et al. Flow event telemetry on programmable data plane[C]// Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications,Technologies,Architectures,and Protocols for Computer Communication. New York:ACM Press, 2020: 76-89. |
[38] | MOLERO E C , VISSICCHIO S , VANBEVER L . FAst in-network GraY failure detection for ISPs[C]// Proceedings of the ACM SIGCOMM 2022 Conference. New York:ACM Press, 2022: 677-692. |
[39] | WU D M , XIA Y T , SUN X S ,et al. Masking failures from application performance in data center networks with shareable backup[C]// Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2018: 176-190. |
[40] | PAPÁN J , SEGEČ P , MORAVČÍK M ,et al. Overview of IP fast reroute solutions[C]// Proceedings of 2018 16th International Conference on Emerging eLearning Technologies and Applications (ICETA). Piscataway:IEEE Press, 2018: 417-424. |
[41] | LEMESHKO O , YEVDOKYMENKO M , YEREMENKO O ,et al. Design of the fast ReRoute QoS protection scheme for bandwidth and probability of packet loss in software-defined WAN[C]// Proceedings of 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM). Piscataway:IEEE Press, 2019: 1-5. |
[42] | KAMISIŃSKI A . Evolution of IP fast-reroute strategies[C]// Proceedings of 2018 10th International Workshop on Resilient Networks Design and Modeling (RNDM). Piscataway:IEEE Press, 2018: 1-6. |
[43] | LEMESHKO O , YEREMENKO O , YEVDOKYMENKO M . MPLS traffic engineering solution of multipath fast ReRoute with local and bandwidth protection[C]// Proceedings of 2020 International Conference on Computer Science,Engineering and Education Applications. Cham:Springer, 2020: 113-125. |
[44] | ADRICHEM N L M , ASTEN B J , KUIPERS F A . Fast recovery in software-defined networks[C]// Proceedings of 2014 Third European Workshop on Software Defined Networks. Piscataway:IEEE Press, 2014: 61-66. |
[45] | KUŹNIAR M , PEREŠÍNI P , VASIĆ N ,et al. Automatic failure recovery for software-defined networks[C]// Proceedings of the 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking. New York:ACM Press, 2013: 159-160. |
[46] | Cisco. BGP PIC edge for IP and MPLS-VPN[EB]. 2014. |
[47] | LIU J , PANDA A , SINGLA A ,et al. Ensuring connectivity via data plane mechanisms[C]// Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2013: 113-126. |
[48] | AL-FARES M , LOUKISSAS A , VAHDAT A . A scalable,commodity data center network architecture[C]// Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication. New York:ACM Press, 2008: 63-74. |
[49] | CHIESA M , SEDAR R , ANTICHI G ,et al. PURR:a primitive for reconfigurable fast reroute:hope for the best and program for the worst[C]// Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies. New York:ACM Press, 2019: 1-14. |
[50] | LAKSHMINARAYANAN K , CAESAR M , RANGAN M ,et al. Achieving convergence-free routing using failure-carrying packets[J]. ACM SIGCOMM Computer Communication Review, 2007,37(4): 241-252. |
[51] | RAMOS R M , MARTINELLO M , ROTHENBERG C E . Slickflow:resilient source routing in data center networks unlocked by OpenFlow[C]// Proceedings of 38th Annual IEEE Conference on Local Computer Networks. Piscataway:IEEE Press, 2014: 606-613. |
[52] | GUO C X , LU G H , WANG H J ,et al. SecondNet:a data center network virtualization architecture with bandwidth guarantees[C]// Proceedings of the 6th International Conference on Emerging Networking Experiments and Technologies. New York:ACM Press, 2010. |
[53] | PRESUHN R . Management information base (MIB) for the simple network management protocol (SNMP)[J]. RFC, 2002,3418: 1-26. |
[54] | LONVICK C . The BSD syslog protocol[R]. 2001. |
[55] | KATZ D , WARD D . Bidirectional forwarding detection (BFD)[R]. 2010. |
[56] | HUANG P , GUO C X , ZHOU L D ,et al. Gray failure:the achilles’ heel of cloud-scale systems[C]// Proceedings of the 16th Workshop on Hot Topics in Operating Systems. New York:ACM Press, 2017: 150-155. |
[57] | ZHUO D Y , GHOBADI M , MAHAJAN R ,et al. Understanding and mitigating packet corruption in data center networks[C]// Proceedings of the Conference of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2017: 362-375. |
[58] | ZHU Y B , ERAN H , FIRESTONE D ,et al. Congestion control for large-scale RDMA deployments[J]. ACM SIGCOMM Computer Communication Review, 2015,45(4): 523-536. |
[59] | CARDWELL N , CHENG Y , GUNN C S ,et al. BBR:congestion-based congestion control[J]. Communications of the ACM, 2017,60(2): 58-66. |
[60] | BOSSHART P , DALY D , IZZARD M ,et al. Programming protocol-independent packet processors[J]. ACM SIGCOMM Computer Communication Review, 2014,44(3): 87-95. |
[61] | MOY J . OSPF version 2[R]. 1997. |
[62] | ISO. Intermediate system-to-intermediate system (IS-IS) routing protocol[R]. 2002. |
[63] | ALIZADEH M , GREENBERG A , MALTZ D ,et al. Data center TCP (DCTCP)[C]// Proceedings of the ACM SIGCOMM 2010 Conference. New York:ACM Press, 2010: 63-74. |
[64] | BASIT Z , TABASSUM M , SHARMA T ,et al. Performance analysis of OSPF and EIGRP convergence through IP sec tunnel using multi-homing BGP connection[J]. Materials Today:Proceedings, 2022(62): 4853-4861. |
[65] | GILL P , JAIN N , NAGAPPAN N . Understanding network failures in data centers[J]. ACM SIGCOMM Computer Communication Review, 2011,41(4): 350-361. |
[66] | CHEN Y R , REZAPOUR A , TZENG W G ,et al. RL-routing:an SDN routing algorithm based on deep reinforcement learning[J]. IEEE Transactions on Network Science and Engineering, 2020,7(4): 3185-3199. |
[67] | OpenFlow switch specification 1.3.1[EB]. 2013. |
[68] | NIRANJAN MYSORE R , PAMBORIS A , FARRINGTON N ,et al. PortLand:a scalable fault-tolerant layer 2 data center network fabric[J]. ACM SIGCOMM Computer Communication Review, 2009,39(4): 39-50. |
[69] | GAFNI E , BERTSEKAS D . Distributed algorithms for generating loop-free routes in networks with frequently changing topology[J]. IEEE Transactions on Communications, 1981,29(1): 11-18. |
[70] | BALLANI H , COSTA P , KARAGIANNIS T ,et al. Towards predictable datacenter networks[C]// Proceedings of the ACM SIGCOMM 2011 Conference. New York:ACM Press, 2011: 242-253. |
[71] | LEE J , TURNER Y , LEE M ,et al. Application-driven bandwidth guarantees in datacenters[C]// Proceedings of the 2014 ACM Conference on SIGCOMM. New York:ACM Press, 2014: 467-478. |
[72] | JANG K , SHERRY J , BALLANI H ,et al. Silo:predictable message latency in the cloud[C]// Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. New York:ACM Press, 2015: 435-448. |
[73] | XIE D , DING N , HU Y C ,et al. The only constant is change:incorporating time-varying network reservations in data centers[J]. ACM SIGCOMM Computer Communication Review, 2012,42(4): 199-210. |
[74] | CHOWDHURY M , LIU Z H , GHODSI A ,et al. HUG:multi-resource fairness for correlated and elastic demands[C]// Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2016: 407-424. |
[75] | BALLANI H , JANG K , KARAGIANNIS T ,et al. Chatty tenants and the cloud network sharing problem[C]// Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2013: 171-184. |
[76] | POPA L , KUMAR G , CHOWDHURY M ,et al. FairCloud:sharing the network in cloud computing[C]// Proceedings of the 10th ACM Workshop on Hot Topics in Networks. New York:ACM Press, 2011: 1-6. |
[77] | LAM V T , RADHAKRISHNAN S , PAN R ,et al. NetShare and stochastic NetShare:predictable bandwidth allocation for data centers[J]. ACM SIGCOMM Computer Communication Review, 2012,42(3): 6-11. |
[78] | HU S H , BAI W , CHEN K ,et al. Providing bandwidth guarantees,work conservation and low latency simultaneously in the cloud[C]// Proceedings of IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. Piscataway:IEEE Press, 2016: 1-9. |
[79] | SHIEH A , KANDULA S , GREENBERG A ,et al. Sharing the data center network[C]// Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2011: 309-322. |
[80] | POPA L , YALAGANDULA P , BANERJEE S ,et al. ElasticSwitch:practical work-conserving bandwidth guarantees for cloud computing[C]// Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. New York:ACM Press, 2013: 351-362. |
[81] | RODRIGUES H , SANTOS J R , TURNER Y ,et al. Gatekeeper:supporting bandwidth guarantees for multi-tenant datacenter networks[C]// Proceedings of the 3rd Conference on I/O Virtualization. Berkeley:USENIX Association, 2011. |
[82] | JEYAKUMAR V , ALIZADEH M , MAZIÈRES D ,et al. EyeQ:practical network performance isolation at the edge[C]// Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2013: 297-312. |
[83] | ANGEL S , BALLANI H , KARAGIANNIS T ,et al. End-to-end performance isolation through virtual datacenters[C]// Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. Berkeley:USENIX Association, 2014: 233-248. |
[84] | KUMAR P , DUKKIPATI N , LEWIS N ,et al. PicNIC:predictable virtualized NIC[C]// Proceedings of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2019: 351-366. |
[85] | ZHU J , LI D , WU J P ,et al. Towards bandwidth guarantee in multi-tenancy cloud computing networks[C]// Proceedings of 2012 20th IEEE International Conference on Network Protocols (ICNP). Piscataway:IEEE Press, 2013: 1-10. |
[86] | DUFFIELD N G , GOYAL P , GREENBERG A ,et al. A flexible model for resource management in virtual private networks[J]. ACM SIGCOMM Computer Communication Review, 1999,29(4): 95-108. |
[87] | BOUDEC J Y , THIRAN P . Network calculus a theory of deterministic queuing systems for the Internet[M]. Heidelberg: Springer, 2004. |
[88] | ALIZADEH M , YANG S , SHARIF M ,et al. pFabric:minimal near-optimal datacenter transport[C]// Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. New York:ACM Press, 2013: 435-446. |
[89] | KIM C , SIVARAMAN A , KATTA N ,et al. In-band network telemetry via programmable dataplanes[C]// Proceedings of the 2015 ACM Conference on SIGCOMM. New York:ACM Press, 2015. |
[90] | LI Y L , MIAO R , LIU H Q ,et al. HPCC:high precision congestion control[C]// Proceedings of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2019: 44-58. |
[91] | KELLY F P , RAINA G , VOICE T . Stability and fairness of explicit congestion control with small buffers[J]. ACM SIGCOMM Computer Communication Review, 2008,38(3): 51-62. |
[92] | WANG S , GAO K H , QIAN K ,et al. Predictable vFabric on informative data plane[C]// Proceedings of the ACM SIGCOMM 2022 Conference. New York:ACM Press, 2022: 615-632. |
[93] | MITTAL R , LAM V T , DUKKIPATI N ,et al. TIMELY:RTT-based congestion control for the datacenter[C]// Proceedings of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2015: 537-550. |
[94] | KUMAR G , DUKKIPATI N , JANG K ,et al. Swift:delay is simple and effective for congestion control in the datacenter[C]// Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications,Technologies,Architectures,and Protocols for Computer Communication. New York:ACM Press, 2020: 514-528. |
[95] | PERRY J , OUSTERHOUT A , BALAKRISHNAN H ,et al. Fastpass:a centralized “zero-queue” datacenter network[C]// Proceedings of the 2014 ACM Conference on SIGCOMM. New York:ACM Press, 2014: 307-318. |
[96] | BEMTEN A , ĐERIĆ N , VARASTEH A ,et al. Chameleon:predictable latency and high utilization with queue-aware and adaptive source routing[C]// Proceedings of the 16th International Conference on Emerging Networking Experiments and Technologies. New York:ACM Press, 2020: 451-465. |
[97] | LE Y , MYSORE R N , SURESH L ,et al. PL2:towards predictable low latency in rack-scale networks[J]. arXiv preprint, 2021,arXiv:2101.06537. |
[98] | WILSON C , BALLANI H , KARAGIANNIS T ,et al. Better never than late:meeting deadlines in datacenter networks[C]// Proceedings of the ACM SIGCOMM 2011 Conference. New York:ACM Press, 2011: 50-61. |
[99] | HONG C , CAESAR M , GODFREY P B . Finishing flows quickly with preemptive scheduling[J]. ACM SIGCOMM Computer Communication Review, 2012,42(4): 127-138. |
[100] | VAMANAN B , HASAN J , VIJAYKUMAR T N . Deadline-aware datacenter TCP (D2TCP)[J]. ACM SIGCOMM Computer Communication Review, 2012,42(4): 115-126. |
[101] | MUNIR A , QAZI I A , UZMI Z A ,et al. Minimizing flow completion times in data centers[C]// 2013 Proceedings IEEE INFOCOM. Piscataway:IEEE Press, 2013: 2157-2165. |
[102] | GROSVENOR M P , SCHWARZKOPF M , GOG I ,et al. Queues don’t matter when you can JUMP them![C]// Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2015: 1-14. |
[103] | CHEN L , CHEN K , BAI W ,et al. Scheduling mix-flows in commodity datacenters with Karuna[C]// Proceedings of the 2016 ACM SIGCOMM Conference. New York:ACM Press, 2016: 174-187. |
[104] | ZHANG Y W , KUMAR G , DUKKIPATI N ,et al. Aequitas:admission control for performance-critical RPCs in datacenters[C]// Proceedings of the ACM SIGCOMM 2022 Conference. New York:ACM Press, 2022: 1-18. |
[105] | BAI W , CHEN L , CHEN K ,et al. Information-agnostic flow scheduling for commodity data centers[C]// Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2015: 455-468. |
[106] | CHEN L , LINGYS J , CHEN K ,et al. AuTO:scaling deep reinforcement learning for datacenter-scale automatic traffic optimization[C]// Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. New York:ACM Press, 2018: 191-205. |
[107] | MUNIR A , BAIG G , IRTEZA S ,et al. Friends,not foes:synthesizing existing transport strategies for data center networks[J]. ACM SIGCOMM Computer Communication Review, 2014,44(4): 491-502. |
[108] | IEEE. Congestion notification:IEEE 802.1Qau[S]. 2010. |
[109] | GIBSON D , HARIHARAN H , LANCE E ,et al. Aquila:a unified,low-latency fabric for datacenter networks[C]// Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation. Berkeley:USENIX Association, 2022: 1249-1266. |