Fast kernel half-space depth for data with non-convex supports
This work addresses the problem of data depth for non-convex, multimodal distributions in statistics and machine learning, offering a faster and more flexible method for tasks like anomaly detection and testing, though it is incremental as it builds on existing halfspace depth concepts.
The paper tackles the limitations of halfspace depth, which assumes convex data supports and is computationally expensive, by extending it to a Reproducing Kernel Hilbert Space (RKHS) to handle non-convex, multimodal distributions. The result is a depth function that is intuitive, consistent with provable bounds, and can be computed orders of magnitude faster than halfspace depth, as demonstrated in simulations and real-world applications like anomaly detection and homogeneity testing.
Data depth is a statistical function that generalizes order and quantiles to the multivariate setting and beyond, with applications spanning over descriptive and visual statistics, anomaly detection, testing, etc. The celebrated halfspace depth exploits data geometry via an optimization program to deliver properties of invariances, robustness, and non-parametricity. Nevertheless, it implicitly assumes convex data supports and requires exponential computational cost. To tackle distribution's multimodality, we extend the halfspace depth in a Reproducing Kernel Hilbert Space (RKHS). We show that the obtained depth is intuitive and establish its consistency with provable concentration bounds that allow for homogeneity testing. The proposed depth can be computed using manifold gradient making faster than halfspace depth by several orders of magnitude. The performance of our depth is demonstrated through numerical simulations as well as applications such as anomaly detection on real data and homogeneity testing.