Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection
This work provides a novel method for outlier detection, potentially benefiting machine learning practitioners, but it is incremental as it builds on existing concepts from algebraic topology.
The paper recasts the weighting vector from metric space magnitude as a kernelized SVM solution, applying it to outlier detection and achieving competitive or superior performance on benchmark datasets. It also demonstrates that the weighting vector can be efficiently approximated in linear time under mild assumptions.
Metric space magnitude, an active field of research in algebraic topology, is a scalar quantity that summarizes the effective number of distinct points that live in a general metric space. The {\em weighting vector} is a closely related concept that captures, in a nontrivial way, much of the underlying geometry of the original metric space. Recent work has demonstrated that when the metric space is Euclidean, the weighting vector serves as an effective tool for boundary detection. We recast this result and show that the weighting vector may be viewed as a solution to a kernelized SVM. As one consequence, we apply this new insight to the task of outlier detection, and we demonstrate performance that is competitive with or exceeds that of state-of-the-art techniques on benchmark data sets. Under mild assumptions, we show that the weighting vector, whose exact computation has the cost of a matrix inversion, can be efficiently approximated in linear time. We also show how nearest neighbor methods can approximate solutions to the minimization problems defined by SVMs.
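As a rough illustration of the objects named above, the following sketch computes a weighting vector for a small Euclidean point set using the standard definition from the magnitude literature: with similarity matrix Z_ij = exp(-t d(x_i, x_j)), the weighting vector w solves Z w = 1, and the magnitude is the sum of its entries. The function name `weighting_vector`, the `scale` parameter (playing the role of t), and the toy data are assumptions made for this example; it uses the exact dense solve, not the linear-time approximation or the SVM formulation described in the abstract.

```python
import numpy as np


def weighting_vector(points, scale=1.0):
    """Solve Z w = 1 with Z_ij = exp(-scale * ||x_i - x_j||).

    Hypothetical helper for illustration only; this is the exact dense
    solve, which costs as much as a matrix inversion.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Similarity matrix of the finite metric space.
    Z = np.exp(-scale * dists)
    return np.linalg.solve(Z, np.ones(len(points)))


# Toy data (assumed): 11 equally spaced points on the unit interval.
line = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
w = weighting_vector(line, scale=5.0)
magnitude = w.sum()

print("weights:", np.round(w, 3))
print("magnitude:", round(float(magnitude), 3))
```

On this toy set the two endpoints receive larger weights than the interior points, which is the boundary-detecting behavior the abstract refers to in the Euclidean case.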