通信学报 ›› 2017, Vol. 38 ›› Issue (Z1): 127-134.doi: 10.11959/j.issn.1000-436x.2017245

• 学术论文 • 上一篇    下一篇

FGBC-iDistance:细粒度位码过滤的高维索引

袁鑫攀1,汪灿飞1,龙军2,章成源2,满君丰1   

  1. 1 湖南工业大学计算机学院智能信息感知及处理技术湖南省重点实验室,湖南 株洲 412007
    2 中南大学信息科学与工程学院,湖南 长沙 410083
  • 出版日期:2017-10-01 发布日期:2018-06-07
  • 作者简介:袁鑫攀(1982-),男,湖南株洲人,博士,湖南工业大学讲师,主要研究方向为相似性度量和索引。|汪汕飞(1992-),男,安徽安庆人,湖南工业大学硕士生,主要研究方向为信息检索。|龙军(1972-),男,安徽安庆人,中南大学教授、博士生导师,主要研究方向为网络资源管理与可信评估。|章成源(1986-),男,湖南邵阳人,中南大学讲师,主要研究方向为空间数据库。|满君丰(1976-),男,黑龙江海伦人,博士,湖南工业大学教授,主要研究方向为可信软件。
  • 基金资助:
    国家自然科学基金资助项目(61402165);国家自然科学基金资助项目(S1651002);湖南省重点研发计划基金资助项目(2016JC2018);2016年湖南工业大学研究生校级创新基金资助项目(CX1606)

FGBC-iDistance:fine-grained bit-code filter based high-dimensional index

Xin-pan YUAN1,Can-fei WANG1,Jun LONG2,Cheng-yuan ZHANG2,Jun-feng MAN1   

  1. 1 Key Laboratory of Intelligent Information Perception and Processing Technology,School of Computer Science ,Hunan University of Technology,Zhuzhou 412007 ,China
    2 School of Information Science and Engineering,Central South University,Changsha 410083,China
  • Online:2017-10-01 Published:2018-06-07
  • Supported by:
    The National Natural Science Foundation of China(61402165);The National Natural Science Foundation of China(S1651002);The Key Research and Devel-opment Plan of Hunan Province(2016JC2018);The Graduate Student Innovation Plan of Hunan University of Technology(CX1606)

摘要:

在高维向量检索中,距离计算是很耗时的操作,当前科研趋势是采用分治法来减少距离计算。iDistance通过锚点将向量空间划分为聚类子空间,BC-iDistance通过BC码将聚类子空间每维划分成2个区域。提出一种更加细粒度的区域划分方法和索引结构,每个区域对应一个细粒度位码FGBC(fine grained bit code),通过FGBC码实现了对候选集更精准的过滤。FGBC-iDistance的距离计算次数最好能减少到iDistance的 1 2 2d ,在距离计算次数上,有 FGBC-iDistance≤BC-iDistance≤iDistance。实验结果表明当范围查询半径为 0.08 时,FGBC-iDistance的距离计算次数约为20 000次,远小于其他算法,运行时间也相应减少。

关键词: 距离计算, 细粒度, iDistance, 范围查询

Abstract:

In the high-dimensional vector retrieval,distance computation is a very time-consuming operation,the current research trend is to reduce the distance computation using divide and conquer algorithm.iDistance algorithm divides the vector space into subspace of clustering by pivot,BC-iDistance algorithm divides the subspace into 2 partitions in each dimension.A more fine-grained partition algorithm and index structure was proposed,each part corresponded with a unique FGBC code (fine-grained bit code),which realized the candidate sets filtered more precisely.The distance computation times of FGBC-iDistance can be reduced to 1 2 2d of iDistance.The distance computation frequency comparison:FGBC-iDistance≤BC-iDistance≤iDistance.The experiment results show that when the scope radius of the range query is 0.08,FGBC-iDistance distance computation times is about 20,000,which is far less than other algorithms,the running time is also reduced.

Key words: distance computation, more fine-grained, iDistance, range query

中图分类号: 

No Suggested Reading articles found!