Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities
This work addresses a fundamental statistical estimation problem for researchers in machine learning and statistics, providing rigorous sample complexity bounds for a widely used estimator in high-dimensional density estimation.
The paper tackles the problem of learning multivariate log-concave densities by providing the first finite sample upper bound on the sample complexity of the maximum likelihood estimator for dimensions d ≥ 4, showing it requires Õ_d((1/ε)^{(d+3)/2}) samples to achieve ε-close accuracy in squared Hellinger loss, which is near-optimal up to an Õ(1/ε) factor compared to a known lower bound of Ω_d((1/ε)^{(d+1)/2}).
We study the problem of learning multivariate log-concave densities with respect to a global loss function. We obtain the first upper bound on the sample complexity of the maximum likelihood estimator (MLE) for a log-concave density on $\mathbb{R}^d$, for all $d \geq 4$. Prior to this work, no finite sample upper bound was known for this estimator in more than $3$ dimensions. In more detail, we prove that for any $d \geq 1$ and $ε>0$, given $\tilde{O}_d((1/ε)^{(d+3)/2})$ samples drawn from an unknown log-concave density $f_0$ on $\mathbb{R}^d$, the MLE outputs a hypothesis $h$ that with high probability is $ε$-close to $f_0$, in squared Hellinger loss. A sample complexity lower bound of $Ω_d((1/ε)^{(d+1)/2})$ was previously known for any learning algorithm that achieves this guarantee. We thus establish that the sample complexity of the log-concave MLE is near-optimal, up to an $\tilde{O}(1/ε)$ factor.