LG · CR · DB · Oct 19, 2016

K-Nearest Neighbor Classification Using Anatomized Data

arXiv:1610.06048v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses privacy-preserving machine learning for data analysts. The advance is incremental because it builds on existing anonymization techniques.

The paper tackles k-nearest neighbor classification with anatomized data. It shows that learning from such anonymized data approaches the performance of learning from unprotected data, though it requires larger training sets, and that it outperforms generalization-based anonymization methods.

This paper analyzes k nearest neighbor classification with training data anonymized using anatomy. Anatomy preserves all data values, but introduces uncertainty in the mapping between identifying and sensitive values. We first study the theoretical effect of the anatomized training data on the k nearest neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian error. We then validate the derived bounds empirically. We show that 1) learning from anatomized data approaches the limits of learning through the unprotected data (although requiring larger training data), and 2) nearest neighbor using anatomized data outperforms nearest neighbor on generalization-based anonymization.
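To make the idea of anatomy concrete, here is a minimal Python sketch of how a table might be split so that all values survive but the link between identifying and sensitive values becomes uncertain. It is an illustration under stated assumptions, not the paper's algorithm: the function name `anatomize`, the fixed `group_size` parameter, the random grouping, and the toy records are all hypothetical, and the sketch does not enforce the diversity constraints that a real anatomy scheme would place on each group.

```python
import random
from collections import Counter


def anatomize(records, group_size, seed=0):
    """Split (quasi_identifier, sensitive) records into two tables.

    Hypothetical simplification: records are grouped at random into
    fixed-size groups. The two output tables share only a group id, so
    within a group any identifying row could map to any sensitive value.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    qi_table = []         # rows: (quasi_identifier, group_id)
    sensitive_table = []  # rows: (group_id, sensitive_value, count)
    for start in range(0, len(shuffled), group_size):
        group_id = start // group_size
        group = shuffled[start:start + group_size]
        for quasi_identifier, _ in group:
            qi_table.append((quasi_identifier, group_id))
        # Publish only per-group counts of each sensitive value.
        counts = Counter(sensitive for _, sensitive in group)
        for value, count in sorted(counts.items()):
            sensitive_table.append((group_id, value, count))
    return qi_table, sensitive_table


# Toy data (made up for illustration): ((age, sex), diagnosis)
records = [
    ((23, "M"), "flu"),
    ((27, "F"), "asthma"),
    ((35, "M"), "flu"),
    ((41, "F"), "diabetes"),
    ((52, "M"), "asthma"),
    ((58, "F"), "flu"),
]

qi_table, sensitive_table = anatomize(records, group_size=3)
print(qi_table)
print(sensitive_table)
```

Every original value appears unchanged in one of the two tables, which is the contrast with generalization-based anonymization, where values are coarsened. What a learner loses is the exact row-level pairing inside each group.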

