ProHD: Projection-Based Hausdorff Distance Approximation
This enables practical HD calculations in applications like large vector databases and streaming data, providing a faster and more accurate alternative to existing approximations.
The paper tackled the problem of efficiently computing the Hausdorff distance (HD) for large, high-dimensional datasets by proposing ProHD, a projection-guided approximation algorithm that runs 10–100× faster than exact methods while maintaining accuracy within a few percent of the true value.
The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose \textbf{ProHD}, a projection-guided approximation algorithm that dramatically accelerates HD computation while maintaining high accuracy. ProHD identifies a small subset of candidate "extreme" points by projecting the data onto a few informative directions (such as the centroid axis and top principal components) and computing the HD on this subset. This approach guarantees an underestimate of the true HD with a bounded additive error and typically achieves results within a few percent of the exact value. In extensive experiments on image, physics, and synthetic datasets (up to two million points in $D=256$), ProHD runs 10--100$\times$ faster than exact algorithms while attaining 5--20$\times$ lower error than random sampling-based approximations. Our method enables practical HD calculations in scenarios like large vector databases and streaming data, where quick and reliable set distance estimation is needed.