OC · CC · DS · LG · ML · Apr 10, 2024

Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

arXiv:2404.06720v1 · 1 citation · h-index: 6 · FOCS
Originality: Highly original
AI Analysis

This work addresses the fundamental tradeoff between computational resources (memory and query complexity) for optimization algorithms, providing theoretical insights that are foundational for algorithm design in convex optimization and machine learning.

The paper studies the feasibility problem for memory-constrained algorithms with access to a separation oracle and establishes lower bounds on their oracle complexity. These bounds imply that gradient descent is Pareto-optimal in the tradeoff between oracle queries and memory usage. They also show that a deterministic algorithm using less than quadratic memory in d must make a number of queries polynomial in 1/ε.
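
To make the gradient-descent side of the tradeoff concrete, here is a minimal sketch of a memory-light feasibility method. It is an illustration under stated assumptions, not the paper's construction: the target set lies in the unit ball and contains a ball of radius ε, and the oracle either reports feasibility or returns a unit vector g with ⟨g, x − y⟩ ≥ 0 for every y in the set. The names `gradient_descent_feasibility` and `make_ball_oracle` are hypothetical. With step size ε, each infeasible query reduces the squared distance to the centre of the inner ball by at least ε², so about 1/ε² queries suffice while only the current iterate is stored.

```python
import math


def gradient_descent_feasibility(oracle, d, eps):
    """Search for a feasible point while storing only the current iterate.

    oracle(x) returns None if x is feasible, otherwise a unit vector g such
    that <g, x - y> >= 0 for every y in the target set (assumed convention).
    Returns (x, queries); x is None if the query budget is exhausted.
    """
    x = [0.0] * d  # start at the centre of the unit ball
    budget = math.ceil(1.0 / eps**2) + 1  # potential argument: ~1/eps^2 queries
    for queries in range(1, budget + 1):
        g = oracle(x)
        if g is None:
            return x, queries
        # Step of length eps against the separating direction.
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return None, budget


def make_ball_oracle(center, radius):
    """Hypothetical test oracle whose target set is a Euclidean ball."""
    def oracle(x):
        diff = [xi - ci for xi, ci in zip(x, center)]
        dist = math.sqrt(sum(v * v for v in diff))
        if dist <= radius:
            return None  # x is feasible
        return [v / dist for v in diff]  # unit normal separating x from the ball
    return oracle


if __name__ == "__main__":
    center = [0.5, 0.0, 0.0, 0.0, 0.0]
    point, used = gradient_descent_feasibility(make_ball_oracle(center, 0.1), 5, 0.1)
    print(point, used)
```

The sketch keeps d floating-point numbers in memory, which corresponds to the linear-memory regime; a cutting-plane method would instead maintain a d × d description of the remaining search region.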

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $ε>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $ε\geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+δ}$ bits of memory or must make at least $1/(d^{0.01δ}ε^{2\frac{1-δ}{1+1.01 δ}-o(1)})$ oracle queries, for any $δ\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+δ}$ memory or make at least $1/(d^{2δ} ε^{2(1-4δ)-o(1)})$ queries for any $δ\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/ε)$ but makes $Ω(1/ε^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/ε$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln1/ε)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/ε)$ queries.
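
As a worked reading of the bounds above (a sketch that only substitutes values of $δ$ into the stated formulas), the exponent of $1/ε$ in the deterministic lower bound is

$$
α(δ) = 2\,\frac{1-δ}{1+1.01\,δ}, \qquad α(0) = 2, \qquad α\!\left(\tfrac12\right) = \frac{1}{1.505} \approx 0.664, \qquad α(1) = 0 .
$$

At $δ=0$ (memory $d$) the bound reads $1/ε^{2-o(1)}$, which matches the $1/ε^2$ query count of gradient descent up to the $o(1)$ term. At $δ=1$ (memory $d^2$) the exponent vanishes and the bound no longer forces polynomially many queries in $1/ε$, consistent with cutting plane methods needing only $\mathcal O(d\ln 1/ε)$ queries. For randomized algorithms the corresponding exponent is

$$
β(δ) = 2(1-4δ), \qquad β(0) = 2, \qquad β\!\left(\tfrac14\right) = 0 ,
$$

so the randomized bound is informative only for memory below $d^{5/4}$.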

