Statistical Inference with Local Optima
This work addresses statistical inference for practitioners who rely on gradient-based optimization in multi-modal settings. Its contribution appears incremental, since it builds on existing methods for confidence intervals and initialization analysis.
The paper studies statistical inference when the maximum likelihood estimate is computed by gradient ascent with multiple random initializations on a multi-modal likelihood function. It derives the population target of the resulting estimator and analyzes the coverage deficiency of confidence intervals caused by using a finite number of initializations. It also shows that confidence intervals obtained by inverting the likelihood ratio, score, and Wald tests can differ substantially, and it proposes a two-sample test procedure that can be used even when the MLE is intractable.
We study the statistical properties of an estimator derived by applying a gradient ascent method with multiple initializations to a multi-modal likelihood function. We derive the population quantity that is the target of this estimator and study the properties of confidence intervals (CIs) constructed from asymptotic normality and the bootstrap approach. In particular, we analyze the coverage deficiency due to a finite number of random initializations. We also investigate the CIs obtained by inverting the likelihood ratio test, the score test, and the Wald test, and we show that the resulting CIs may be very different. We propose a two-sample test procedure that can be used even when the MLE is intractable. In addition, we analyze the performance of the EM algorithm under random initializations and derive the coverage of a CI with a finite number of initializations.
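To make the setting concrete, the following is a minimal sketch of gradient ascent with multiple random initializations on a multi-modal likelihood. It is an illustration under stated assumptions, not the paper's procedure: the Cauchy location model, the data in XS, the uniform initialization range, and all function names are hypothetical choices made here. The last function records only the elementary fact that, if a single initialization reaches the global maximum with probability q, at least one of M independent initializations does so with probability 1 - (1 - q)^M; it is not the coverage expression derived in the paper.

```python
import math
import random

# Hypothetical data: two clusters, so the Cauchy location log-likelihood
# has a local maximum near -5 and a higher (global) maximum near +5.
XS = [-5.2, -4.8, -5.0, 4.9, 5.1, 5.3, 4.7, 5.0]


def log_lik(theta, xs):
    """Cauchy location log-likelihood, up to an additive constant."""
    return -sum(math.log1p((x - theta) ** 2) for x in xs)


def score(theta, xs):
    """Derivative of log_lik with respect to theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


def gradient_ascent(theta0, xs, n_iter=5000, tol=1e-10):
    """Fixed-step gradient ascent from a single starting point.

    Each observation changes the slope of the score by at most 2, so the
    step 1 / (2 * n) is small enough for the log-likelihood to increase
    at every iteration.
    """
    step = 1.0 / (2.0 * len(xs))
    theta = theta0
    for _ in range(n_iter):
        g = score(theta, xs)
        if abs(g) < tol:
            break
        theta += step * g
    return theta


def multi_start_estimate(xs, n_init, lo, hi, rng):
    """Run gradient ascent from n_init uniform starts; keep the best one."""
    starts = [rng.uniform(lo, hi) for _ in range(n_init)]
    candidates = [gradient_ascent(s, xs) for s in starts]
    return max(candidates, key=lambda t: log_lik(t, xs))


def prob_some_start_reaches_global(q, n_init):
    """P(at least one of n_init independent starts reaches the global
    maximum), assuming each start does so with probability q."""
    return 1.0 - (1.0 - q) ** n_init


if __name__ == "__main__":
    rng = random.Random(0)
    for m in (1, 5, 50):
        est = multi_start_estimate(XS, m, -10.0, 10.0, rng)
        print(m, est, log_lik(est, XS))
```

With few initializations the returned estimate may be a local rather than the global maximizer, which is the source of the coverage deficiency discussed in the abstract; increasing n_init makes that outcome less likely but never impossible.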