电信科学 ›› 2015, Vol. 31 ›› Issue (7): 52-62.doi: 10.11959/j.issn.1000-0801.2015171

• 研究与开发 • 上一篇    下一篇

一种高维大数据全k近邻查询算法

王忠伟,陈叶芳,肖四友,钱江波   

  1. 宁波大学信息科学与工程学院 宁波 315211
  • 出版日期:2015-08-21 发布日期:2015-08-21
  • 基金资助:
    国家自然科学基金资助项目;浙江省自然科学基金资助项目;宁波市自然科学基金资助项目;宁波市自然科学基金资助项目;“信息与通信工程”浙江省重中之重学科开放基金资助项目

An AkNN Algorithm for High-Dimensional Big Data

Zhongwei Wang,Yefang Chen,Siyou Xiao,Jiangbo Qian   

  1. Faculty of Electrical and Computer Science,Ningbo University,Ningbo 315211,China
  • Online:2015-08-21 Published:2015-08-21
  • Supported by:
    The National Natural Science Foundation of China;Zhejiang Provincial Natural Science Foundation;Ningbo Natural Science Foundation;Ningbo Natural Science Foundation;Zhejiang Key Discipline Fund of Information and Communication Engineering

摘要:

全k近邻(all k-nearest neighbor,AkNN)查询,是k近邻查询的一个变型,旨在在一个查询过程中为给定数据集的每个对象确定k个最近邻。提出了一种在Hadoop分布式平台下处理高维大数据的AkNN查询算法。首先使用行条化思想结合p-stable LSH算法将高维数据对象降维,然后结合空间填充曲线Z-order的优良特性,把降维后的数据嵌入一维空间中,接着进行范围查询。整个过程使用MapReduce框架分布式并行处理。实验结果表明,所提出的算法可以高效处理高维大数据的AkNN查询。

关键词: 高维, AkNN, MapReduce, 行条化, 局部敏感散列, Z-order

Abstract:

A new variant of k nearest neighbor queries,which called as all k-nearest neighbor queries(AkNN),is a process to search the k nearest neighbors of each object in a data set.An AkNN query algorithm for high-dimensional big data on the Hadoop system was proposed.Using the banding technique and the p-stable LSH algorithm,dimensionality reduction was performed,then the data was embeded in a Z-order curve.The preprocessed data were continued to be treated on a MapReduce framework in a distributed parallel manner.Experimental results show that the proposed algorithm can efficiently handle AkNN queries for large-scale high-dimensional data.

Key words: high-dimensional, AkNN, MapReduce, banding, locality sensitive hashing, Z-order

No Suggested Reading articles found!