Online Nearest Neighbor Search Using Hamming...

Online Nearest Neighbor Search Using Hamming Weight Trees

Abstract

Nearest neighbor search is a basic and recurring proximity problem that has been studied for several decades. The goal is to preprocess a dataset of points so that we can quickly report the closet point(s) to any query point. Many recent applications of NNS involve datasets that are very large and dynamic, that is items of data items become available gradually. In this study, we propose a data structure for solving NNS for dynamic binary data where both query and dataset items are represented as binary strings. The proposed tree data structure, called the Hamming Weight Tree, is simple and as the names suggests, is based on partitioning the feature space of binary strings by exploiting the Hamming weights of the binary codes and their substrings. Given a Hamming Weight Tree of binary codes, we propose two search algorithms that accommodate nearest neighbor search for two different distance functions, the Hamming distance and the angular distance. Our empirical results show significant speedup in comparison with the best known large-scale solutions.