LG, AI, ML | May 18, 2025

Importance Sampling for Nonlinear Models

Prakash Palanivelu Rajmohan, Fred Roosta

arXiv:2505.12353v1 | 4.1 | h-index: 1 | Has Code | ICML

Originality: Highly original

AI Analysis

This work addresses the problem of efficient data sampling and model interpretability for nonlinear models in machine learning, offering a novel approach with theoretical and experimental support.

The paper addresses the lack of importance sampling methods for nonlinear models by introducing the adjoint operator of a nonlinear map and using it to generalize norm-based and leverage-score-based sampling. The resulting scores come with approximation guarantees analogous to linear subspace embeddings and reduce the computational cost of training.

While norm-based and leverage-score-based methods have been extensively studied for identifying "important" data points in linear models, analogous tools for nonlinear models remain significantly underdeveloped. By introducing the concept of the adjoint operator of a nonlinear map, we address this gap and generalize norm-based and leverage-score-based importance sampling to nonlinear settings. We demonstrate that sampling based on these generalized notions of norm and leverage scores provides approximation guarantees for the underlying nonlinear mapping, similar to linear subspace embeddings. As direct applications, these nonlinear scores not only reduce the computational complexity of training nonlinear models by enabling efficient sampling over large datasets but also offer a novel mechanism for model explainability and outlier detection. Our contributions are supported by both theoretical analyses and experimental results across a variety of supervised learning scenarios.
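For reference, the classical linear scores that the abstract says are being generalized can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's adjoint-based nonlinear method (whose definitions are not given on this page). It assumes a linear model with a full-column-rank data matrix `X`, uses row norms and leverage scores (squared row norms of an orthonormal basis of range(X)) as sampling weights, and rescales sampled rows by 1/sqrt(m p_i) so the sketch preserves X^T X in expectation. The function names `row_norm_scores`, `leverage_scores`, and `importance_sample` are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the classical *linear* scores only;
# the paper's nonlinear (adjoint-based) scores are NOT implemented here.

def row_norm_scores(X):
    """Norm-based importance: squared Euclidean norm of each row of X."""
    return np.einsum("ij,ij->i", X, X)

def leverage_scores(X):
    """Leverage scores: squared row norms of an orthonormal basis of range(X).

    Assumes X has full column rank, so the thin QR factor Q spans range(X)
    and the scores sum to the number of columns d.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.einsum("ij,ij->i", Q, Q)

def importance_sample(X, scores, m, rng):
    """Draw m rows i.i.d. with p_i proportional to scores_i, rescaled by 1/sqrt(m p_i).

    The rescaling makes (SX)^T (SX) an unbiased estimator of X^T X.
    """
    p = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / np.sqrt(m * p[idx])
    return X[idx] * weights[:, None], idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 5
    X = rng.standard_normal((n, d))
    X[:10] *= 50.0  # a few deliberately high-influence rows

    lev = leverage_scores(X)
    print(round(lev.sum(), 6))  # prints 5.0: leverage scores sum to rank(X) = d
    print(sorted(np.argsort(lev)[-10:]))  # the scaled rows stand out as "important"

    SX, idx = importance_sample(X, lev, m=400, rng=rng)

    # Subspace-embedding check: with X = QR, SX R^{-1} = SQ, and the eigenvalues
    # of (SQ)^T (SQ) bound the distortion of ||S X x|| relative to ||X x||.
    Q, R = np.linalg.qr(X, mode="reduced")
    SQ = SX @ np.linalg.inv(R)
    eigs = np.linalg.eigvalsh(SQ.T @ SQ)
    print("distortion range:", eigs.min(), eigs.max())
```

The eigenvalue range printed at the end should sit near 1, which is the linear subspace-embedding property the abstract refers to; the high-leverage rows also illustrate how such scores can flag outliers.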

