OC LG ST ML

Apr 4, 2025

Stochastic Optimization with Optimal Importance Sampling

Liviu Aolaritei, Bart P. G. Van Parys, Henry Lam, Michael I. Jordan

arXiv:2504.03560v1

9.42 citations

h-index: 8

Originality: Highly original

AI Analysis

This work targets a specific bottleneck in stochastic optimization that matters to researchers and practitioners in fields such as rare-event simulation: choosing an importance sampling distribution when the best choice depends on the decision being optimized.

The paper tackles the challenge of applying Importance Sampling (IS) within stochastic optimization, where the decision and the IS distribution depend on each other. It proposes an iterative gradient-based algorithm that updates both jointly, without time-scale separation between them. The method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family, and these properties are preserved under linear constraints.
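
The interdependence can be made concrete with a short worked formulation. The notation here (objective F, nominal distribution P with density p, proposal Q with density q) is assumed for illustration and is not taken from the paper; the identities are the standard importance sampling ones and presume that gradient and expectation can be interchanged.

```latex
% Illustrative notation only; symbols are assumptions, not the paper's.
\begin{align*}
  \min_{x}\; f(x) &= \mathbb{E}_{\xi \sim P}\big[F(x,\xi)\big], \\
  \nabla f(x) &= \mathbb{E}_{\xi \sim Q}\!\left[\nabla_x F(x,\xi)\,\frac{dP}{dQ}(\xi)\right]
      \quad \text{for any } Q \text{ with } P \ll Q, \\
  q^{\star}_{x}(\xi) &\propto \big\|\nabla_x F(x,\xi)\big\|\; p(\xi).
\end{align*}
```

The last line is the proposal density that minimizes the second moment of the reweighted gradient estimator at a fixed x. It depends on x, so the most useful proposal is the one tuned to the optimal decision, which is exactly the quantity the optimization is still searching for.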

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
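
The joint-update idea can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the objective f(x) = E[0.5 x^2 - x 1{xi > C}] with xi ~ N(0, 1), the mean-shifted Gaussian proposal family N(theta, 1), the step sizes, the clipping bounds, and the names `likelihood_ratio` and `joint_sgd` are all hypothetical choices. Each iteration draws one sample from the current proposal and uses it to take a gradient step on the decision x and a gradient step on theta that reduces the second moment of the gradient estimator. Both step sizes decay like 1/k, so neither update runs on a faster time scale.

```python
import math
import numpy as np

C = 2.0  # threshold (assumed); P(xi > C) is about 0.0228 under N(0, 1)


def likelihood_ratio(xi, theta):
    """dP/dQ_theta at xi, for P = N(0, 1) and Q_theta = N(theta, 1)."""
    return math.exp(-theta * xi + 0.5 * theta * theta)


def joint_sgd(n_iters=50_000, x0=0.5, theta0=0.0, seed=0):
    """Jointly update decision x and proposal mean theta from the same samples.

    Toy objective (assumed): f(x) = E_P[0.5*x^2 - x*1{xi > C}],
    whose minimizer is x* = P(xi > C).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_iters)
    x, theta = x0, theta0
    for k in range(1, n_iters + 1):
        xi = theta + z[k - 1]                   # one draw from Q_theta
        w = likelihood_ratio(xi, theta)
        grad_F = x - (1.0 if xi > C else 0.0)   # d/dx of 0.5*x^2 - x*1{xi > C}
        g = grad_F * w                          # unbiased estimate of f'(x)
        h = g * g * (theta - xi)                # unbiased estimate of d/dtheta E_Q[g^2]
        # Same-order step sizes; clipping keeps both iterates in assumed compact sets.
        x = min(max(x - g / (k + 10), 0.0), 1.0)
        theta = min(max(theta - 20.0 * h / (k + 100), 0.0), 2.0)
    return float(x), float(theta)


if __name__ == "__main__":
    x_hat, theta_hat = joint_sgd()
    p_true = 0.5 * math.erfc(C / math.sqrt(2.0))
    print(f"x = {x_hat:.5f} (target {p_true:.5f}), theta = {theta_hat:.3f}")
```

In this sketch the variance-minimizing theta depends on the current x through the factor `grad_F`, which is the circular structure the abstract describes. The simple clipping used here stands in for constraint handling and is unrelated to the dual averaging variant mentioned in the abstract.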
