Telecommunications Science ›› 2024, Vol. 40 ›› Issue (1): 35-47. doi: 10.11959/j.issn.1000-0801.2024015

• Research and Development •

Deep learning-based prediction of multi-level just noticeable distortion

Haifeng XU1, Hongkui WANG1, Haibing YIN1, Chuqiao CHEN2

  1. 1 College of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
    2 College of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China
  • Revised: 2024-01-12 Online: 2024-01-01 Published: 2024-01-01
  • About the authors: Haifeng XU (1999- ), male, master's student at the College of Communication Engineering, Hangzhou Dianzi University; main research interest: perceptual video coding
    Hongkui WANG (1990- ), male, Ph.D., lecturer at the College of Communication Engineering, Hangzhou Dianzi University; main research interest: perceptual video coding
    Haibing YIN (1974- ), male, Ph.D., professor at the College of Communication Engineering, Hangzhou Dianzi University; main research interest: digital video encoding and decoding
    Chuqiao CHEN (1993- ), female, Ph.D., lecturer at the College of Cyberspace Security, Hangzhou Dianzi University; main research interest: intelligent information processing
  • Supported by:
    The National Natural Science Foundation of China (62202134); The National Natural Science Foundation of China (62031009); The National Natural Science Foundation of China (61972123); Ministry of Science and Technology Key Research and Development Project Funding Program (2023YFB4502800); Zhejiang Provincial "Pioneer" and "Leading Goose" Research and Development Project (2023C01149); Zhejiang Provincial "Pioneer" and "Leading Goose" Research and Development Project (2022C01068); The Natural Science Foundation of Zhejiang Province (LDT23F01014F01); The Natural Science Foundation of Zhejiang Province (LDT23F01011F01)

Abstract:

Visual just noticeable distortion (JND) directly reflects how sensitive the human visual system is to noise in visual signals, and it is widely used in image and video processing. To address multi-level prediction of video JND thresholds, the problem is recast as prediction of the satisfied user ratio (SUR) curve, and a feature fusion-based SUR curve prediction model is proposed. The model consists of three main modules: key frame selection, feature extraction and fusion, and SUR score regression. In the key frame selection module, a spatial-temporal perceptual complexity measure is proposed on the basis of visual perception mechanisms and used as the criterion for choosing video key frames. In the feature extraction and fusion module, a multi-scale dense residual network built on the dense residual block (RDB) performs image feature extraction and multi-scale fusion. Experimental results show that the proposed SUR curve prediction model is overall more accurate than existing models in JND threshold prediction and reduces running time by 8.1% on average. The model can also predict JND thresholds at other levels and can be applied directly to multi-level perceptual video coding optimization.

Key words: just noticeable distortion, deep learning, quality evaluation
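
The abstract recasts multi-level JND prediction as SUR curve prediction. As a rough illustration of why a single predicted curve can yield thresholds at several levels, the Python sketch below reads a threshold off a sampled SUR curve by linear interpolation. It is not the authors' model: the function name `jnd_from_sur`, the sample values, the distortion axis, and the target ratios (0.75, 0.5, 0.25) are all assumptions made for illustration, and the curve is assumed to be non-increasing as distortion grows.

```python
from typing import Sequence


def jnd_from_sur(levels: Sequence[float], sur: Sequence[float],
                 target: float = 0.75) -> float:
    """Return the distortion level at which the SUR curve first falls to `target`.

    Assumptions (not taken from the paper): `levels` is sorted in increasing
    order of distortion, and `sur` holds the satisfied user ratio in [0, 1]
    at each level.
    """
    if len(levels) != len(sur) or len(levels) < 2:
        raise ValueError("levels and sur must have equal length of at least 2")

    # The curve already starts at or below the target: clamp to the first level.
    if sur[0] <= target:
        return float(levels[0])

    for i in range(1, len(levels)):
        if sur[i] <= target:
            # Here sur[i - 1] > target >= sur[i], so the denominator is positive.
            x0, x1 = levels[i - 1], levels[i]
            y0, y1 = sur[i - 1], sur[i]
            return x0 + (y0 - target) * (x1 - x0) / (y0 - y1)

    # The curve never drops to the target: clamp to the last level.
    return float(levels[-1])


if __name__ == "__main__":
    # Hypothetical SUR samples, for illustration only.
    levels = [20, 30, 40, 50]
    sur = [1.0, 0.9, 0.6, 0.2]
    for target in (0.75, 0.5, 0.25):
        print(target, jnd_from_sur(levels, sur, target))
```

Calling the function with different `target` values on the same curve gives one threshold per level, which is the sense in which curve prediction covers the multi-level case.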
