Big Data Research ›› 2024, Vol. 10 ›› Issue (1): 17-34.doi: 10.11959/j.issn.2096-0271.2024010

• STUDY • Previous Articles    

Approximate nearest neighbor hybrid query algorithm based on tolerance factor

Guangfu HE1,2, Yuanhai XUE1, Cuiting CHEN1,2, Xiaoming YU1,2, Xinran LIU1,3, Xueqi CHENG1,2   

  1. 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2 University of Chinese Academy of Sciences, Beijing 101408, China
    3 Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online:2024-01-01 Published:2024-01-01
  • Supported by:
    The National Natural Science Foundation of China(U21B2046)

Abstract:

Approximate nearest neighbor search (ANNS) is an important technique in the field of computer science for efficient similarity search, enabling fast information retrieval in large-scale datasets.With the increasing demand for high-precision information retrieval, there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying, which has broad application prospects.However, filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries, ultimately damaging search accuracy.This article proposes a tolerance factor based filtered greedy algorithm, which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor, maintaining the connectivity of the original nearest neighbor graph without changing the index structure, and overcoming the negative impact of structural constraints on retrieval accuracy.The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints, while maintaining retrieval efficiency.This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios, providing an effective solution for fast hybrid query information retrieval in large-scale datasets.

Key words: hybrid query, vector search, nearest neighbor search, filtered search

CLC Number: 

No Suggested Reading articles found!