Distribution-Informed Adaptation for kNN Graph Construction
This work addresses a domain-specific problem for machine learning practitioners who use kNN graphs, offering an incremental improvement by adapting the k-value of each sample to the data distribution.
Conventional kNN graphs use a single k-value for every sample, which hurts performance on ambiguous samples along decision boundaries. The paper proposes DaNNG, a distribution-informed adaptive method that improves accuracy and generalization; in evaluations on benchmark datasets it outperforms state-of-the-art algorithms.
Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks due to their simplicity and effectiveness. However, because real-world data often follow complex distributions, the conventional kNN graph's reliance on a single k-value shared by all samples can hinder its performance. A crucial factor behind this challenge is the presence of ambiguous samples along decision boundaries, which are inevitably more prone to misclassification. To address this, we propose the Distribution-Informed adaptive kNN Graph (DaNNG), which combines adaptive kNN with distribution-aware graph construction. By incorporating an approximation of the distribution with customized k-adaptation criteria, DaNNG can significantly improve performance on ambiguous samples, and hence enhance overall accuracy and generalization capability. In rigorous evaluations on diverse benchmark datasets, DaNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.
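
To make the contrast between a single shared k-value and a per-sample k-value concrete, the following minimal sketch builds both kinds of graph on a toy dataset. It is not DaNNG itself: the abstract does not specify the distribution approximation or the k-adaptation criteria, so the ambiguity score (the fraction of a sample's nearest candidates that carry a different label) and the linear rule that shrinks k for ambiguous samples are hypothetical stand-ins. The names `neighbor_order`, `uniform_knn_graph`, `adaptive_knn_graph`, `k_min`, and `k_max` are likewise assumptions made for illustration.

```python
import numpy as np


def neighbor_order(X):
    """For each sample, return the indices of all other samples, nearest first."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    # The infinite self-distance sorts last, so dropping the last column removes it.
    return np.argsort(dist, axis=1, kind="stable")[:, :-1]


def uniform_knn_graph(X, k):
    """Conventional kNN graph: every sample is linked to its k nearest neighbors."""
    return [row[:k].tolist() for row in neighbor_order(X)]


def adaptive_knn_graph(X, y, k_min=1, k_max=3):
    """Per-sample k, chosen by a hypothetical ambiguity rule (not the paper's criteria)."""
    graph = []
    for i, row in enumerate(neighbor_order(X)):
        candidates = row[:k_max]
        # Assumed ambiguity score: share of candidates whose label differs from y[i].
        ambiguity = float(np.mean(y[candidates] != y[i]))
        # Assumed rule: the more ambiguous the sample, the fewer neighbors it keeps.
        k_i = k_max - int(round(ambiguity * (k_max - k_min)))
        graph.append(candidates[:k_i].tolist())
    return graph


# Toy data: two well-separated classes plus one sample (index 3) near the boundary.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [6.5, 0.0],
              [10.0, 0.0], [11.0, 0.0], [12.5, 0.0], [13.2, 0.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

uniform = uniform_knn_graph(X, k=3)
adaptive = adaptive_knn_graph(X, y)

for i in range(len(X)):
    print(i, "uniform:", uniform[i], "adaptive:", adaptive[i])
```

In this toy setup, the boundary sample at index 3 keeps only two of its three candidate neighbors under the adaptive rule, while samples deep inside a class keep all three. Whether ambiguous samples should receive a smaller or a larger k, and how ambiguity is estimated from the distribution, are exactly the design choices that the k-adaptation criteria would settle.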