Reinforcement learning-based real-time video streaming control and on-device training research

doi:10.11959/j.issn.2096-3750.2022.00306

Abstract

Service platforms centered on the Internet of Things and the mobile Internet are developing at an accelerating pace. Hundreds of millions of end users communicate through real-time network video services, which have become an indispensable core tool in people's digital lives. However, the Internet is becoming increasingly dynamic and heterogeneous, which imposes stringent requirements on real-time video streaming control, and the quality of experience (QoE) of real-time video remains unsatisfactory. An adaptive, reinforcement learning-based intelligent video transmission algorithm was designed to cope with heterogeneous network environments. An effective end-to-end on-device training framework was then designed to reduce server overhead, and the design and structure of the neural network were evaluated and analyzed in detail. Experimental results show that the proposed algorithm predicts the bandwidth of heterogeneous networks effectively: compared with a representative streaming control algorithm (GCC), it reduces the bandwidth prediction error by 48.48%. The more accurate bandwidth prediction further improves user QoE, reducing the video stall rate by 60.65% and improving video quality by 16.52%. In addition, the analysis provides empirical insights for further research and may help advance intelligent video applications.
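
The abstract describes the controller only as reinforcement learning-based; the evaluation table further down labels it the "PPO algorithm". For readers unfamiliar with proximal policy optimization (Schulman et al., 2017), the minimal sketch below computes its standard clipped surrogate loss in plain Python. The function name, the ε = 0.2 default and the toy inputs are illustrative assumptions, not the authors' implementation, and the paper's state, action and advantage definitions are not given here.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not taken from the paper
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a minibatch.

    ratio_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
    L_clip  = mean_t min(ratio_t * A_t, clip(ratio_t, 1 - eps, 1 + eps) * A_t)
    The sign is flipped so that a gradient-descent optimizer can minimize it.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio recovered from log-probs
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)
```

When the new and old policies coincide, every ratio is 1 and the loss reduces to the negative mean advantage; for instance, `ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])` returns `-2.0`. The clipping removes any incentive to push the ratio outside [1 − ε, 1 + ε] within a single update.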

Key words: real-time video, adaptive streaming control, quality-of-experience, reinforcement learning, on-device training

CLC Number: TP393

Huanhuan ZHANG, Anfu ZHOU, Huadong MA. Reinforcement learning-based real-time video streaming control and on-device training research[J]. Chinese Journal on Internet of Things, 2022, 6(4): 1-13.

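Property	Metric	Traditional streaming video (e.g., video on demand)	Emerging real-time video (e.g., live streaming)
Transmission characteristics	Video content	Generated in advance and stored on resource servers	Generated and watched in real time
	Transmission granularity	Chunk-level transmission	Fine-grained frame/packet-level transmission
	Buffer size	2–10 s	100–300 ms
User-experience requirements	Stalling requirement	Lower	Higher
	Video quality requirement	Higher	Relatively lower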
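
The buffer-size row explains why stalling is a much stricter constraint for real-time video. Under a simple fluid model (an assumption for illustration, not a model from the paper), a receiver holding B seconds of video whose available bandwidth r falls below the video bitrate R loses 1 − r/R seconds of buffer per second, so playback stalls after B / (1 − r/R) seconds. The sketch below applies this to the buffer sizes in the table; the sudden 50% bandwidth drop is a hypothetical scenario.

```python
def time_to_stall(buffer_s: float, bandwidth_mbps: float, bitrate_mbps: float) -> float:
    """Seconds until the playout buffer empties under a fluid model.

    Each wall-clock second the buffer gains bandwidth/bitrate seconds of video
    and loses 1 s to playback, so it drains at (1 - bandwidth/bitrate) s/s.
    Returns infinity when the bandwidth keeps up with the bitrate.
    """
    if bandwidth_mbps >= bitrate_mbps:
        return float("inf")
    drain_rate = 1.0 - bandwidth_mbps / bitrate_mbps
    return buffer_s / drain_rate


# Hypothetical scenario: available bandwidth suddenly falls to half the bitrate.
bitrate = 1.0
bandwidth = 0.5
for label, buffer_s in [("real-time, 100 ms", 0.1), ("real-time, 300 ms", 0.3),
                        ("on-demand, 2 s", 2.0), ("on-demand, 10 s", 10.0)]:
    print(f"{label}: stall after {time_to_stall(buffer_s, bandwidth, bitrate):.1f} s")
```

In this scenario the 100 to 300 ms buffers of real-time video stall after 0.2 to 0.6 s, while the 2 to 10 s buffers of on-demand streaming last 4 to 20 s, leaving the rate controller far more time to react.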

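Parameter	Description	Optimal setting
Network architecture	Type of neural network structure	Fully connected layers
Fusion manner	How input states are fused	Fuse first (early fusion)
Activation function	Type of activation function	LeakyReLU
Batch size	Training batch size	32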
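
The tuned settings in the table (fully connected layers, fusing the input states before the first layer, LeakyReLU activations, batch size 32) can be illustrated with a plain-Python forward pass over one minibatch. Everything else here is a hypothetical choice for illustration: the three state histories (throughput, delay, loss rate), the history length of 8, the two hidden layers of width 64, the 0.01 negative slope, the He-style initialization and the single scalar output standing in for the policy output.

```python
import random
from typing import List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """LeakyReLU: identity for positive inputs, small linear slope otherwise."""
    return x if x > 0 else slope * x


class Dense:
    """Fully connected layer y = W x + b, optionally followed by LeakyReLU."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random, activate: bool = True) -> None:
        scale = (2.0 / n_in) ** 0.5  # He-style init, a common choice for ReLU-family units
        self.w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.activate = activate

    def __call__(self, x: Vector) -> Vector:
        out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]
        return [leaky_relu(v) for v in out] if self.activate else out


def early_fusion(throughput: Vector, delay: Vector, loss: Vector) -> Vector:
    """'Fuse first': concatenate all state histories before any layer sees them."""
    return throughput + delay + loss


def forward(net: List[Dense], state: Vector) -> Vector:
    """Run one fused state vector through the stacked dense layers."""
    h = state
    for layer in net:
        h = layer(h)
    return h


HISTORY = 8      # hypothetical history length per state feature
BATCH_SIZE = 32  # batch size from the table
rng = random.Random(0)
net = [Dense(3 * HISTORY, 64, rng), Dense(64, 64, rng), Dense(64, 1, rng, activate=False)]

# Hypothetical minibatch: random histories stand in for real network measurements.
batch = [
    early_fusion([rng.random() for _ in range(HISTORY)],
                 [rng.random() for _ in range(HISTORY)],
                 [rng.random() for _ in range(HISTORY)])
    for _ in range(BATCH_SIZE)
]
outputs = [forward(net, state)[0] for state in batch]
print(len(outputs), "outputs")
```

Early fusion means the first dense layer sees all state features jointly, so it can combine them (e.g., rising delay together with falling throughput) from the first layer onward; a late-fusion alternative would process each history in its own branch and concatenate the results afterwards.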

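Framework	Developer	Release year	Main functions	Characteristics
Core ML	Apple	2017	Easy to use; supports on-device inference only; commonly used for supervised learning tasks	Better suited to Apple devices; on-device inference only; performance on Android devices unknown
ML Kit	Google	2018	Supports both Apple and Android devices, with the same API on both platforms	Provides 6 basic APIs and is easy to use, but supports only supervised learning tasks, not reinforcement learning training
Paddle-mobile	Baidu	2019	Flexible deployment; supports multiple hardware platforms	Supports only on-device model inference, not on-device model training
Caffe2	Facebook	2017	General-purpose framework covering both training and inference; supports training deep neural networks in the cloud	Supports only on-device model inference, not on-device model training
TensorFlow Lite	Google	2018	TensorFlow's platform for running deep learning algorithms on mobile devices; low memory footprint	Supports only lightweight on-device model inference, not on-device model training
TensorFlow.js	Google	2018	TensorFlow's JavaScript platform; flexible and integrates well with the Web; supports both Apple and Android devices; supports on-device training for reinforcement learning tasks	Supports both inference and training on mobile devices; requires rewriting the underlying machine-learning code in JavaScript
MNN	Alibaba	2019	Lightweight on-device deep learning inference engine; supports both Apple and Android devices	Supports only on-device model inference, not on-device model training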

Metric	PPO algorithm	GCC algorithm	Relative change
Video stall rate	0.353%	0.897%	-60.650%
Video quality/(Mbit·s^-1)	1.340	1.150	+16.520%
Bandwidth estimation error/(Mbit·s^-1)	0.152	0.297	-48.480%
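
The last column is the relative change of the PPO result against the GCC baseline, (PPO − GCC) / GCC × 100%. Recomputing it from the rounded values printed in the table is a quick consistency check:

```python
def percent_change(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (PPO, GCC) pairs copied from the table above.
rows = {
    "stall rate (%)": (0.353, 0.897),
    "video quality (Mbit/s)": (1.340, 1.150),
    "bandwidth estimation error (Mbit/s)": (0.152, 0.297),
}
for name, (ppo, gcc) in rows.items():
    print(f"{name}: {percent_change(ppo, gcc):+.2f}%")
```

The first two rows reproduce the reported −60.65% and +16.52%. For the bandwidth estimation error, the rounded table values give about −48.82%, slightly different from the −48.48% reported in the table and the abstract.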

Metric	Fully connected layers	Convolutional layers
Video latency/ms	9.11	9.37
Video quality/(Mbit·s^-1)	1.22	1.20
Video smoothness/(Mbit·s^-1)	0.089	0.081
