A ranking hashing algorithm based on listwise supervision

doi:10.11959/j.issn.1000-0801.2019072

Telecommunications Science, 2019, Vol. 35, Issue 5: 78-85.

Anbang YANG, Jiangbo QIAN, Yihong DONG, Huahui CHEN

College of Information Science and Engineering, Ningbo University, Ningbo 315211, Zhejiang, China

Revised: 2019-04-12 Online: 2019-05-20 Published: 2019-05-21
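About the authors:
Anbang YANG (born 1992), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests are big data and data mining.
Jiangbo QIAN (born 1974), male, Ph.D., is a professor at the College of Information Science and Engineering, Ningbo University. His main research interests are data processing and mining, multidimensional indexing and query optimization, and machine learning.
Yihong DONG (born 1969), male, Ph.D., is a professor at the College of Information Science and Engineering, Ningbo University. His main research interests are big data, data mining, and artificial intelligence.
Huahui CHEN (born 1964), male, Ph.D., is a professor at the College of Information Science and Engineering, Ningbo University. His main research interests are data processing and mining, and cloud computing.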
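Supported by:
The National Natural Science Foundation of China (61472194); The National Natural Science Foundation of China (61572266); The Natural Science Foundation of Zhejiang Province of China (LY16F020003)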

Abstract

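Learning-to-hash techniques are now widely used for similarity search over large-scale data: by converting data into binary codes, they both speed up search and reduce storage cost. Most current ranking hashing algorithms construct their loss function by comparing how consistently data are ranked in the Euclidean space and in the Hamming space. However, because the Hamming distance is a discrete integer, several data points may share the same Hamming distance during ranking in the Hamming space, so they cannot be ranked accurately. To address this problem, the encoded data is split into several subspaces of equal length, and each subspace is assigned a different weight; at comparison time, the Hamming distance is computed according to these subspace weights. Experimental results show that, compared with other learning-to-hash algorithms, the proposed algorithm ranks data in the Hamming space effectively and improves query accuracy.

Keywords: learning to hash, similarity search, ranking hashing, subspace weights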
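
As a rough illustration of the idea in the abstract, the sketch below splits two binary codes into equal-length subspaces and sums the per-subspace Hamming distances, each scaled by a subspace weight. The function name `weighted_hamming`, the example codes, and the weight values are assumptions made for illustration only; the abstract does not state how the paper chooses or learns its weights.

```python
from typing import Sequence


def weighted_hamming(a: Sequence[int], b: Sequence[int],
                     weights: Sequence[float]) -> float:
    """Weighted Hamming distance over equal-length subspaces of two binary codes.

    The codes are cut into len(weights) contiguous subspaces of equal length;
    mismatches inside subspace k are scaled by weights[k].
    """
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    if not weights or len(a) % len(weights) != 0:
        raise ValueError("code length must be a multiple of the number of subspaces")
    sub_len = len(a) // len(weights)
    total = 0.0
    for k, w in enumerate(weights):
        start = k * sub_len
        end = start + sub_len
        # Plain Hamming distance restricted to subspace k.
        mismatches = sum(x != y for x, y in zip(a[start:end], b[start:end]))
        total += w * mismatches
    return total


# Hypothetical 8-bit codes split into two 4-bit subspaces.
query = [0, 0, 0, 0, 0, 0, 0, 0]
p = [1, 0, 0, 0, 0, 0, 0, 0]  # differs from the query in subspace 0
q = [0, 0, 0, 0, 0, 0, 0, 1]  # differs from the query in subspace 1
weights = [0.75, 0.25]        # illustrative values, not taken from the paper

# Unit weights reproduce the ordinary Hamming distance.
plain_p = weighted_hamming(query, p, [1.0, 1.0])
plain_q = weighted_hamming(query, q, [1.0, 1.0])

# Subspace weights separate the two points.
weighted_p = weighted_hamming(query, p, weights)
weighted_q = weighted_hamming(query, q, weights)
```

With unit weights both points tie at distance 1, so their relative order is undefined. With the illustrative weights the distances become 0.75 and 0.25, which is the kind of tie-breaking the abstract describes.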

CLC number: TP391

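Anbang YANG, Jiangbo QIAN, Yihong DONG, Huahui CHEN. A ranking hashing algorithm based on listwise supervision[J]. Telecommunications Science, 2019, 35(5): 78-85.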
