LG | IT | Dec 4, 2023

Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors

arXiv:2312.01991v4 | 4 citations | h-index: 4 | MLSP
Originality: Incremental advance
AI Analysis

This work addresses a limitation of a widely used classification and regression algorithm, offering a way to improve performance under noise, class imbalance, and skewed distributions. It is an incremental advance because it modifies an existing approach rather than introducing a new one.

The paper tackles the problem of K-Nearest Neighbors treating all samples equally by proposing Information-Modified KNN (IM-KNN), which uses Mutual Information and Shapley values to assign weighted values to neighbors, resulting in average improvements of 16.80% in accuracy, 17.08% in precision, and 16.98% in recall across 12 benchmark datasets.

The K-Nearest Neighbors (KNN) algorithm is widely used for classification and regression; however, it suffers from limitations, including the equal treatment of all samples. We propose Information-Modified KNN (IM-KNN), a novel approach that leverages Mutual Information ($I$) and Shapley values to assign weighted values to neighbors, thereby bridging the gap in treating all samples with the same value and weight. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80%, 17.08%, and 16.98%, respectively, across 12 benchmark datasets. Experiments on four large-scale datasets further highlight IM-KNN's robustness to noise, imbalanced data, and skewed distributions.
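
The contrast between equal and value-weighted treatment of neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: the function `knn_predict` and the `values` list are hypothetical, and the per-sample values are hand-picked placeholders standing in for whatever weights a Mutual Information and Shapley-based valuation would produce.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, k, sample_weights=None):
    """Predict a label by (optionally weighted) vote among the k nearest neighbors."""
    if sample_weights is None:
        # Plain KNN: every neighbor counts equally.
        sample_weights = [1.0] * len(train_X)
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = defaultdict(float)
    for i in order[:k]:
        # Each neighbor contributes its own value instead of a flat vote of 1.
        votes[train_y[i]] += sample_weights[i]
    return max(votes, key=votes.get)


# Toy data: the two points closest to the query carry label "b".
train_X = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0), (0.0, 0.4), (0.4, 0.4)]
train_y = ["b", "b", "a", "a", "a"]
query = (0.05, 0.0)

# Hypothetical per-sample values (assumption): the two "b" points are treated
# as low-value, e.g. because a valuation step flagged them as noisy.
values = [0.1, 0.1, 1.0, 1.0, 1.0]

uniform = knn_predict(train_X, train_y, query, k=3)
weighted = knn_predict(train_X, train_y, query, k=3, sample_weights=values)
print(uniform, weighted)
```

With uniform votes the two nearby "b" points outvote the single "a" neighbor; once they are down-weighted, the same three neighbors yield "a". How the values are actually computed is the substance of the paper and is not reproduced here.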
