Data Reconstruction: Identifiability and Optimization with Sample Splitting

Yujie Shen, Zihan Wang, Jian Qian, Qi Lei

arXiv:2602.08723v11.4h-index: 2

Originality: Incremental advance

AI Analysis

This work provides theoretical insights and a practical method for data reconstruction, which is relevant for privacy and security in machine learning, though it appears incremental as it builds on existing reconstruction methods.

The paper tackles the problem of training data reconstruction from KKT conditions by addressing identifiability and optimization challenges, showing that sample splitting improves reconstruction performance in experiments.

Training data reconstruction from KKT conditions has shown striking empirical success, yet it remains unclear when the resulting KKT equations have unique solutions and, even in identifiable regimes, how to reliably recover those solutions by optimization. This work focuses on these two complementary questions: identifiability and optimization. On the identifiability side, we discuss sufficient conditions for the KKT system of two-layer networks with polynomial activations to uniquely determine the training data, providing a theoretical explanation of when and why reconstruction is possible. On the optimization side, we introduce sample splitting, a curvature-aware refinement step applicable to general reconstruction objectives (not limited to KKT-based formulations): it creates additional descent directions to escape poor stationary points and refine solutions. Experiments demonstrate that augmenting several existing reconstruction methods with sample splitting consistently improves reconstruction performance.
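
For orientation, the KKT system mentioned in the abstract can be sketched in the form commonly used in the KKT-based reconstruction literature. This is an illustrative formulation under stated assumptions, not necessarily the paper's exact notation: the symbols $\theta$ (trained parameters), $\Phi(\theta; x)$ (network output), $(x_i, y_i)$ (training samples with labels $y_i \in \{\pm 1\}$), and $\lambda_i$ (dual variables) are introduced here, and the network is assumed to be homogeneous and trained to a max-margin stationary point.

```latex
% Assumed KKT conditions of the max-margin problem (notation introduced here)
\begin{aligned}
\theta &= \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta \Phi(\theta; x_i)
  && \text{(stationarity)} \\
\lambda_i &\ge 0, \qquad y_i \, \Phi(\theta; x_i) \ge 1
  && \text{(dual and primal feasibility)} \\
\lambda_i \bigl( y_i \, \Phi(\theta; x_i) - 1 \bigr) &= 0
  && \text{(complementary slackness)}
\end{aligned}

% A typical reconstruction objective: treat the samples and multipliers
% as unknowns and minimize the stationarity residual for the known \theta
\min_{\{x_i\},\, \{\lambda_i \ge 0\}} \;
  \Bigl\| \theta - \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta \Phi(\theta; x_i) \Bigr\|_2^2
```

In this reading, identifiability asks when the zero-residual solutions of the stationarity equation coincide with the true training data (up to relabeling of the samples), while the optimization question concerns whether a local search over $\{x_i\}, \{\lambda_i\}$ actually reaches such a solution instead of stalling at a poor stationary point.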
