Ben Claydon

h-index4
2papers

2 Papers

9.5CGMay 22
Dynamic Query Modification for Binary Locality Sensitive Hashing

Ben Claydon, Richard Connor, Alan Dearle

Our context of interest is how binary locality sensitive hash (LSH) functions can be used to solve the approximate near neighbour (ANN) problem, which seeks to find the k closest elements of some dataset X to some further point q presented as a query. Binary locality sensitive function families H are sets of functions each which accept a point and return a binary value. A function is locality sensitive if and only if the output of the function is more likely to be equal (a 'hash collision') if two close vectors are used as input than if two far vectors are used. A data structure can be built by generating binary hash codes for each member of X, which are generated by drawing and applying one or more functions from H. When q is presented as a query, the same set of functions is applied to it and those elements of X with equal binary hash codes are retrieved. In this paper we introduce dynamic query modification. This process changes q at query time to form a new value c, which by theoretical and experimental analysis we prove has two significant advantages. Firstly, the hash output of c collides with near neighbours with a greater probability than q. Secondly, we show there is little chance of c failing to collide with any near neighbours; a property which we demonstrate is not true for q. To demonstrate the efficacy of the technique, we define a novel structure MQ-Forest, a modified version of RP-Forest. Both are binary LSH-based ANN mechanisms, but MQ-Forest dynamically estimates a value for c during the query process. We show that MQ-Forest reduces both build and query times by up to 40% when measured over several large, high-dimensional benchmark datasets.

LGMay 31, 2025
Ultra-Quantisation: Efficient Embedding Search via 1.58-bit Encodings

Richard Connor, Alan Dearle, Ben Claydon

Many modern search domains comprise high-dimensional vectors of floating point numbers derived from neural networks, in the form of embeddings. Typical embeddings range in size from hundreds to thousands of dimensions, making the size of the embeddings, and the speed of comparison, a significant issue. Quantisation is a class of mechanism which replaces the floating point values with a smaller representation, for example a short integer. This gives an approximation of the embedding space in return for a smaller data representation and a faster comparison function. Here we take this idea almost to its extreme: we show how vectors of arbitrary-precision floating point values can be replaced by vectors whose elements are drawn from the set {-1,0,1}. This yields very significant savings in space and metric evaluation cost, while maintaining a strong correlation for similarity measurements. This is achieved by way of a class of convex polytopes which exist in the high-dimensional space. In this article we give an outline description of these objects, and show how they can be used for the basis of such radical quantisation while maintaining a surprising degree of accuracy.