Querying Kernel Methods Suffices for Reconstructing their Training Data
This work highlights privacy risks for users of kernel-based models by demonstrating a concrete attack, though its contribution is incremental because it focuses on a single model class.
The paper tackles training data memorization in over-parameterized kernel methods. It shows that queries to these models suffice to reconstruct their training data without access to model parameters, and supports the claim empirically and theoretically for methods such as kernel regression and SVMs.
Over-parameterized models have raised concerns about their potential to memorize training data, even when achieving strong generalization. The privacy implications of such memorization are generally unclear, particularly in scenarios where only model outputs are accessible. We study this question in the context of kernel methods, and demonstrate both empirically and theoretically that querying kernel models at various points suffices to reconstruct their training data, even without access to model parameters. Our results hold for a range of kernel methods, including kernel regression, support vector machines, and kernel density estimation. Our hope is that this work can illuminate potential privacy concerns for such models.
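To make the query-only setting concrete, here is a minimal toy sketch for the kernel density estimation case. It is not the paper's attack: it assumes one-dimensional data, a Gaussian kernel with a small bandwidth, and well-separated training points, so that the modes of the queried density sit close to the training points. The names `make_kde_oracle` and `reconstruct_from_queries` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def make_kde_oracle(train, bandwidth):
    """Hypothetical black box: returns a function that evaluates a Gaussian
    KDE (up to a constant factor) without exposing the training points."""
    train = np.asarray(train, dtype=float)

    def query(x):
        x = np.asarray(x, dtype=float)
        diffs = x[:, None] - train[None, :]  # shape (num_queries, num_train)
        return np.exp(-diffs**2 / (2 * bandwidth**2)).mean(axis=1)

    return query

def reconstruct_from_queries(query, lo, hi, num_queries=2001):
    """Query the model on a grid and return the locations of its prominent
    local maxima. Assumes (for this toy case) that peaks mark training points."""
    grid = np.linspace(lo, hi, num_queries)
    vals = query(grid)
    interior = vals[1:-1]
    # A grid point is a peak if it exceeds its left neighbor and is at least
    # as large as its right neighbor.
    is_peak = (interior > vals[:-2]) & (interior >= vals[2:])
    # Keep only prominent peaks to ignore numerical noise in flat regions.
    is_peak &= interior > 0.5 * vals.max()
    return grid[1:-1][is_peak]

# The attacker only ever calls `oracle`; `secret_train` is never read directly.
secret_train = np.array([-2.0, 0.5, 3.0])
oracle = make_kde_oracle(secret_train, bandwidth=0.2)
recovered = reconstruct_from_queries(oracle, lo=-5.0, hi=5.0)
print(recovered)
```

With overlapping kernels, higher dimensions, or kernel regression and SVMs, peak finding on a grid is no longer adequate; this sketch only shows why model outputs alone can leak training points in the easiest case.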