Yi Zhou

10.7 ML Apr 1, 2024

Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

Qi Zhang, Yi Zhou, Ashley Prater-Bennette et al.

Distributionally robust optimization (DRO) is a powerful framework for training models that are robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm for non-convex constrained DRO and analyzes its performance. The per-iteration computational complexity of our stochastic algorithm is independent of the overall dataset size, so the algorithm is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the $χ^2$-divergence as a special case. We prove that our algorithm finds an $ε$-stationary point with a computational complexity of $\mathcal O(ε^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
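
For orientation, the constrained DRO problem and the Cressie-Read family can be written out as below. This is a sketch under the standard convention from the DRO literature, not a quotation from the paper: the symbols $P_0$ (data distribution), $ρ$ (robustness radius), $\ell$ (loss), and the relation $k_* = k/(k-1)$ are assumptions, since the abstract does not define them.

```latex
% Constrained DRO over a Cressie-Read divergence ball (assumed notation)
\[
  \min_{x}\ \sup_{Q \,:\, D_{f_k}(Q \,\|\, P_0) \le \rho}\ \mathbb{E}_{S \sim Q}\big[\ell(x; S)\big],
  \qquad
  D_{f_k}(Q \,\|\, P_0) = \int f_k\!\left(\frac{dQ}{dP_0}\right) dP_0 ,
\]
\[
  f_k(t) = \frac{t^k - k t + k - 1}{k(k-1)},
  \qquad
  k_* = \frac{k}{k-1} .
\]
% Special case k = 2 (chi-square divergence):
\[
  f_2(t) = \frac{t^2 - 2t + 1}{2} = \frac{(t-1)^2}{2},
  \qquad k_* = 2 .
\]
```

Under this assumed convention, the $χ^2$ case has $k_* = 2$, so the exponent $-3k_*-5$ in the stated bound evaluates to $-11$.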

11.1 OC Sep 24, 2018

Asynchronous decentralized accelerated stochastic gradient descent

Guanghui Lan, Yi Zhou

In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, motivated by the fact that communication and synchronization are the major bottlenecks. We establish $\mathcal{O}(1/ε)$ (resp., $\mathcal{O}(1/\sqrt{ε})$) communication complexity and $\mathcal{O}(1/ε^2)$ (resp., $\mathcal{O}(1/ε)$) sampling complexity for solving general convex (resp., strongly convex) problems.

34.4 OC Jan 14, 2017

Communication-Efficient Algorithms for Decentralized and Stochastic Optimization

Guanghui Lan, Soomin Lee, Yi Zhou

We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Considering that communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks which can significantly reduce the number of inter-node communications. We first propose a decentralized primal-dual method which can find an $ε$-solution both in terms of functional optimality gap and feasibility residual in $O(1/ε)$ inter-node communication rounds when the objective functions are convex and the local primal subproblems are solved exactly. Our major contribution is to present a new class of decentralized primal-dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can still find an $ε$-solution in $O(1/ε)$ (resp., $O(1/\sqrt{ε})$) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the $O(1/ε^2)$ (resp., $O(1/ε)$) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart for these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we can reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. The bounds on the subgradient evaluations are actually comparable to those required for centralized nonsmooth and stochastic optimization.
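
To make the two cost measures in this abstract concrete (inter-node communication rounds versus intra-node gradient evaluations), here is a minimal sketch of plain decentralized gradient descent on a ring network. It is not the DCS or SDCS method from the paper: the baseline couples one communication round to every gradient step, which is the coupling DCS is designed to break. The function names, the ring topology, the mixing weights, and the quadratic local objectives $f_i(x) = \frac{1}{2}(x - a_i)^2$ are all illustrative assumptions.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic weights on a ring: 1/2 to self, 1/4 to each neighbor.

    Hypothetical choice for illustration; any doubly stochastic matrix works.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def decentralized_gradient_descent(targets, rounds):
    """Baseline decentralized gradient descent (NOT the paper's DCS method).

    Agent i holds the local objective f_i(x) = 0.5 * (x - targets[i])**2,
    so the network-wide minimizer is the mean of the targets.
    Returns the final local iterates plus the two cost counters.
    """
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n
    comm_rounds = 0   # inter-node communication rounds
    grad_evals = 0    # intra-node gradient evaluations, summed over agents
    for t in range(rounds):
        step = 1.0 / (t + 2)  # diminishing step size
        # One communication round: each agent averages with its neighbors.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        comm_rounds += 1
        # One local gradient evaluation per agent.
        grads = [x[i] - targets[i] for i in range(n)]
        grad_evals += n
        x = [mixed[i] - step * grads[i] for i in range(n)]
    return x, comm_rounds, grad_evals


if __name__ == "__main__":
    x, comm, evals = decentralized_gradient_descent([1.0, 2.0, 3.0, 4.0], 2000)
    print("local iterates:", x)
    print("communication rounds:", comm, "gradient evaluations:", evals)
```

In this baseline the number of communication rounds always equals the number of gradient steps per agent. The abstract's claim is that DCS keeps the gradient-evaluation bound while needing far fewer communication rounds.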

3 Papers