Journal on Communications ›› 2014, Vol. 35 ›› Issue (12): 196-202. doi: 10.3969/j.issn.1000-436x.2014.12.023

• Academic Communications •

Large-scale duplicate image retrieval technical research for the internet

Shu-peng WANG (王树鹏)¹, Ming CHEN (陈明)², Guang-jun WU (吴广君)¹

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
  • Online: 2014-12-25 Published: 2017-06-17
  • Supported by:
    The National Natural Science Foundation of China; the National High Technology Research and Development Program of China (863 Program); Beijing Municipal Science and Technology Project

Abstract:

For typical social media applications on the Internet, a large-scale distributed approach to duplicate image retrieval based on random projection and block DCT coefficients is proposed. Built on a Hadoop cluster, the approach first generates image signatures by random projection mapping, then uses these signatures to query an HBase table efficiently and obtain a high-recall set of candidate images, and finally filters the candidates with block DCT coefficients to improve retrieval precision. Experiments on 12 million microblog images show that, with H = 2 and T = 150, the approach reaches a recall of 98% and a precision of 93.2%, with an average retrieval time of 6.7 s.

Key words: social media, random projection mapping, image signature, block DCT coefficients, Hadoop cluster
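
The two-stage idea in the abstract (signature lookup for recall, then block DCT filtering for precision) can be illustrated with a minimal single-machine sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the projected feature (mean-centred grayscale pixels), the 64-bit signature length, the three low-frequency coefficients kept per 8×8 block, the hypothetical thresholds `max_hamming` and `max_dct_dist` (not claimed to correspond to the paper's H and T), and the plain Python dict that stands in for the HBase table and is scanned linearly.

```python
import math
import random

def make_planes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(img, planes):
    """Pack the signs of the random projections of the mean-centred pixels into an int."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    sig = 0
    for plane in planes:
        dot = sum(w * (p - mean) for w, p in zip(plane, pixels))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig

def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")

def dct_coeff(block, u, v):
    """One orthonormal 2-D DCT-II coefficient (u vertical, v horizontal) of a square block."""
    n = len(block)
    cu = math.sqrt((1 if u == 0 else 2) / n)
    cv = math.sqrt((1 if v == 0 else 2) / n)
    total = 0.0
    for y in range(n):
        for x in range(n):
            total += (block[y][x]
                      * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
    return cu * cv * total

def block_dct_features(img, block=8):
    """Three lowest-frequency DCT coefficients of every block x block tile.
    Assumes the image height and width are multiples of `block`."""
    feats = []
    for top in range(0, len(img), block):
        for left in range(0, len(img[0]), block):
            tile = [row[left:left + block] for row in img[top:top + block]]
            feats += [dct_coeff(tile, u, v) for u, v in ((0, 0), (0, 1), (1, 0))]
    return feats

def build_index(images, planes):
    """Map signature -> [(image_id, dct_features)]; a dict standing in for an HBase table."""
    index = {}
    for image_id, img in images.items():
        entry = (image_id, block_dct_features(img))
        index.setdefault(signature(img, planes), []).append(entry)
    return index

def search(query, index, planes, max_hamming=2, max_dct_dist=50.0):
    """Stage 1: keep entries whose signature is close. Stage 2: filter by DCT distance."""
    q_sig = signature(query, planes)
    q_feats = block_dct_features(query)
    hits = []
    for sig, entries in index.items():
        if hamming(q_sig, sig) > max_hamming:
            continue
        for image_id, feats in entries:
            if math.dist(q_feats, feats) <= max_dct_dist:
                hits.append(image_id)
    return sorted(hits)

# Toy data: a 16x16 gradient, a slightly brightened copy, and its inversion.
original = [[8 * x + 4 * y for x in range(16)] for y in range(16)]
brighter = [[p + 1 for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

planes = make_planes(64, 16 * 16)
index = build_index({"original": original, "inverted": inverted}, planes)
print(search(brighter, index, planes))  # ['original']
```

In this toy run the brightened copy has the same signature as the original because the pixels are mean-centred before projection, and its block DCT features differ only in the DC terms, so it passes both stages; the inverted image is rejected. A distributed deployment would replace the linear scan with row-key lookups against the signature table, which this sketch does not attempt to model.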
