Big Data Research ›› 2021, Vol. 7 ›› Issue (6): 67-77. doi: 10.11959/j.issn.2096-0271.2021061

• STUDY •

Algorithm of locality sensitive hashing bit selection based on feature selection

Wenhua ZHOU, Huawen LIU, Enhui LI   

  1. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321001, China
  • Online: 2021-11-15 Published: 2021-11-01
  • Supported by:
    The National Natural Science Foundation of China (61976195)

Abstract:

Locality sensitive hashing is one of the most popular information retrieval methods, and it needs to generate long hash codes to meet retrieval requirements. However, long hash codes require huge storage space and contain many redundant hashing bits. To solve this problem, ten simple and efficient selection algorithms from feature engineering were adopted to extract the most informative hashing bits from the long codes generated by locality sensitive hashing, removing the redundant and useless bits. These ten algorithms evaluate each hashing bit by its individual performance or by its correlation with other bits, using measures such as variance and Hamming distance. During selection, useless or highly correlated hashing bits were removed, and the selected hashing bits were then compared with the original long hashing bits. Experimental results on four common datasets show that the selected hashing bits perform as well as the original hashing bits, with reduction ratios ranging from 30% to 70%.
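
The abstract does not spell out the ten selection algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes random-hyperplane LSH as the code generator, ranks bits by variance, and treats two bits as redundant when their normalized Hamming distance (or that of the complement) falls below a threshold. The function names and the `min_var` and `min_dist` parameters are hypothetical.

```python
import random


def lsh_bits(points, n_bits, seed=0):
    """Random-hyperplane LSH (an assumed generator): one sign bit per direction."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        [1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0 for plane in planes]
        for p in points
    ]


def bit_variance(codes, j):
    """Variance of bit j over the dataset: p * (1 - p) for a 0/1 variable."""
    p = sum(code[j] for code in codes) / len(codes)
    return p * (1.0 - p)


def bit_distance(codes, i, j):
    """Normalized Hamming distance between bit columns i and j."""
    return sum(code[i] != code[j] for code in codes) / len(codes)


def select_bits(codes, k, min_var=0.0, min_dist=0.1):
    """Greedy filter: visit bits by decreasing variance, skip near-duplicates."""
    n_bits = len(codes[0])
    var = [bit_variance(codes, j) for j in range(n_bits)]
    order = sorted(range(n_bits), key=lambda j: var[j], reverse=True)
    chosen = []
    for j in order:
        if len(chosen) == k or var[j] <= min_var:
            break  # enough bits, or only uninformative (near-constant) bits remain
        # A bit is redundant if it, or its complement, nearly copies a chosen bit.
        redundant = any(
            min(d, 1.0 - d) < min_dist
            for d in (bit_distance(codes, j, c) for c in chosen)
        )
        if not redundant:
            chosen.append(j)
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    codes = lsh_bits(data, n_bits=64)
    kept = select_bits(codes, k=32)
    short_codes = [[code[j] for j in kept] for code in codes]
    print(len(kept), "of 64 bits kept")
```

Retrieval with the shortened codes would then compare Hamming distances over the kept bit positions only. Whether accuracy is preserved depends on the dataset and on the thresholds chosen here.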

Key words: approximate nearest neighbor search, hashing learning, hashing bit selection, feature selection, dimensionality reduction

