电信科学 ›› 2015, Vol. 31 ›› Issue (7): 63-74.doi: 10.11959/j.issn.1000-0801.2015190

• 研究与开发 • 上一篇    下一篇

一种自适应子空间相似性搜索方法

任建新,陈华辉   

  1. 宁波大学信息科学与工程学院 宁波 315211
  • 出版日期:2015-08-21 发布日期:2015-08-21

An Adaptive Subspace Similarity Search Approach

Jianxin Ren,Huahui Chen   

  1. College of Information Science and Engineering,Ningbo University,Ningbo 315211,China
  • Online:2015-08-21 Published:2015-08-21

摘要:

近年来,在多媒体信息检索、相似性连接和时间序列匹配等数据库领域的相似搜索研究备受关注。绝大部分工作都是在欧式空间条件下,使用度量距离函数计算最近邻(如 kNN、kNNJ)来解决搜索目标集合问题。但已有研究表明,此条件下的搜索结果准确性很容易受到高差异维度的影响,且对应的解决方案尚缺乏灵活性和顽健性。首先提出了单机环境下动态子空间(部分维度)下相似搜索问题及解决方案。随着数据规模的扩大,单机算法不能很好地扩展,随之又提出了Hadoop框架下的分布式算法。实验证实,在不影响准确率的情况下,分布式算法的性能要优于集中式算法。

关键词: 自适应子空间, 相似性搜索, 非度量距离方法, MapReduce分布式计算框架

Abstract:

In recent years,such database fields as multimedia information retrieval,similarity join and time series matching,where similarity search has attracted much attention.Existing researches mostly compute nearest neighbor to solve problems about search target set,such as kNN and kNNJ,by metric distance functions in the Euclidean space.But some studies showed that high dissimilarity dimensions had got great effect on the accuracy of answer and flexibility and robustness still were lacked in corresponding solutions.Thus centralized dynamic subspace or partial dimensions similarity search problem and algorithms were proposed at first.Furthermore,with the emerge of very large dataset,centralized algorithms can,t extend very well.Finally,the distributed ones under hadoop framework were proposed.Experiments prove that distributed algorithms outperform centralized ones in the performance without accuracy loss.

Key words: adaptive subspace, similarity search, non-metric distance method, MapReduce distributed computing framework

No Suggested Reading articles found!