Soroosh Shafiee

OC
h-index31
10papers
85citations
Novelty54%
AI Score48

10 Papers

OCMar 7, 2023
Nash Equilibria, Regularization and Computation in Optimal Transport-Based Distributionally Robust Optimization

Soroosh Shafiee, Liviu Aolaritei, Florian Dörfler et al.

We study optimal transport-based distributionally robust optimization problems where a fictitious adversary, often envisioned as nature, can choose the distribution of the uncertain problem parameters by reshaping a prescribed reference distribution at a finite transportation cost. In this framework, we show that robustification is intimately related to various forms of variation and Lipschitz regularization even if the transportation cost function fails to be (some power of) a metric. We also derive conditions for the existence and the computability of a Nash equilibrium between the decision-maker and nature, and we demonstrate numerically that nature's Nash strategy can be viewed as a distribution that is supported on remarkably deceptive adversarial samples. Finally, we identify practically relevant classes of optimal transport-based distributionally robust optimization problems that can be addressed with efficient gradient descent algorithms even if the loss function or the transportation cost function are nonconvex (but not both at the same time).

OCMar 31, 2023
Constrained Optimization of Rank-One Functions with Indicator Variables

Soroosh Shafiee, Fatma Kılınç-Karzan

Optimization problems involving minimization of a rank-one convex function over constraints modeling restrictions on the support of the decision variables emerge in various machine learning applications. These problems are often modeled with indicator variables for identifying the support of the continuous variables. In this paper we investigate compact extended formulations for such problems through perspective reformulation techniques. In contrast to the majority of previous work that relies on support function arguments and disjunctive programming techniques to provide convex hull results, we propose a constructive approach that exploits a hidden conic structure induced by perspective functions. To this end, we first establish a convex hull result for a general conic mixed-binary set in which each conic constraint involves a linear function of independent continuous variables and a set of binary variables. We then demonstrate that extended representations of sets associated with epigraphs of rank-one convex functions over constraints modeling indicator relations naturally admit such a conic representation. This enables us to systematically give perspective formulations for the convex hull descriptions of these sets with nonlinear separable or non-separable objective functions, sign constraints on continuous variables, and combinatorial constraints on indicator variables. We illustrate the efficacy of our results on sparse nonnegative logistic regression problems.

29.3OCApr 10
A Saddle Point Algorithm for Robust Data-Driven Factor Model Problems

Shabnam Khodakaramzadeh, Soroosh Shafiee, Gabriel de Albuquerque Gleizer et al.

We study the factor model problem, which aims to uncover low-dimensional structures in high-dimensional datasets. Adopting a robust data-driven approach, we formulate the problem as a saddle-point optimization. Our primary contribution is a first-order algorithm that solves this reformulation by leveraging a linear minimization oracle (LMO). We further develop semi-closed form solutions (up to a scalar) for three specific LMOs, corresponding to the Frobenius norm, Kullback-Leibler divergence, and Gelbrich (aka Wasserstein) distance. The analysis includes explicit quantification of these LMOs' regularity conditions, notably the Lipschitz constants of the dual function, which govern the algorithm's convergence performance. Numerical experiments confirm our method's effectiveness in high-dimensional settings, outperforming standard off-the-shelf optimization solvers.

MLJun 27, 2022
Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

Liviu Aolaritei, Soroosh Shafiee, Florian Dörfler

Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius, which controls the trade-off between robustness and accuracy. Focusing on high-dimensional settings where the dimension and the number of samples are both large and comparable in size, we employ tools from high-dimensional asymptotic statistics to precisely characterize the estimation error of the resulting estimator. Remarkably, this error can be recovered by solving a simple convex-concave optimization problem involving only four scalar variables. This characterization enables efficient selection of the radius that minimizes the estimation error. In doing so, it achieves the same effect as cross-validation, but at a fraction of the computational cost. Numerical experiments confirm that our theoretical predictions closely match empirical performance and that the optimal radius selected through our method aligns with that chosen by cross-validation, highlighting both the accuracy and the practical benefits of our approach.

LGFeb 23
Wasserstein Distributionally Robust Online Learning

Guixian Chen, Salar Fattahi, Soroosh Shafiee

We study distributionally robust online learning, where a risk-averse learner updates decisions sequentially to guard against worst-case distributions drawn from a Wasserstein ambiguity set centered at past observations. While this paradigm is well understood in the offline setting through Wasserstein Distributionally Robust Optimization (DRO), its online extension poses significant challenges in both convergence and computation. In this paper, we address these challenges. First, we formulate the problem as an online saddle-point stochastic game between a decision maker and an adversary selecting worst-case distributions, and propose a general framework that converges to a robust Nash equilibrium coinciding with the solution of the corresponding offline Wasserstein DRO problem. Second, we address the main computational bottleneck, which is the repeated solution of worst-case expectation problems. For the important class of piecewise concave loss functions, we propose a tailored algorithm that exploits problem geometry to achieve substantial speedups over state-of-the-art solvers such as Gurobi. The key insight is a novel connection between the worst-case expectation problem, an inherently infinite-dimensional optimization problem, and a classical and tractable budget allocation problem, which is of independent interest.

OCMar 1
GPU-friendly and Linearly Convergent First-order Methods for Certifying Optimal $k$-sparse GLMs

Jiachang Liu, Andrea Lodi, Soroosh Shafiee

We investigate the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through a cardinality constraint. While Branch-and-Bound (BnB) frameworks can certify optimality using perspective relaxations, existing methods for solving these relaxations are computationally intensive, limiting their scalability. To address this challenge, we reformulate the relaxations as composite optimization problems and develop a unified proximal framework that is both linearly convergent and computationally efficient. Under specific geometric regularity conditions, our analysis links primal quadratic growth to dual quadratic decay, yielding error bounds that make the Fenchel duality gap a sharp proxy for progress towards the solution set. This leads to a duality gap-based restart scheme that upgrades a broad class of sublinear proximal methods to provably linearly convergent methods, and applies beyond the sparse GLM setting. For the implicit perspective regularizer, we further derive specialized routines to evaluate the regularizer and its proximal operator exactly in log-linear time, avoiding costly generic conic solvers. The resulting iterations are dominated by matrix--vector multiplications, which enables GPU acceleration. Experiments on synthetic and real-world datasets show orders-of-magnitude faster dual-bound computations and substantially improved BnB scalability on large instances.

MLNov 9, 2023
Outlier-Robust Wasserstein DRO

Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee

Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty. Geometric uncertainty due to sampling or localized perturbations of data points is captured by Wasserstein DRO (WDRO), which seeks to learn a model that performs uniformly well over a Wasserstein ball centered around the observed data distribution. However, WDRO fails to account for non-geometric perturbations such as adversarial outliers, which can greatly distort the Wasserstein distance measurement and impede the learned model. We address this gap by proposing a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (total variation (TV)) contamination that allows an $\varepsilon$-fraction of data to be arbitrarily corrupted. We design an uncertainty set using a certain robust Wasserstein ball that accounts for both perturbation types and derive minimax optimal excess risk bounds for this procedure that explicitly capture the Wasserstein and TV risks. We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem. When the loss function depends only on low-dimensional features of the data, we eliminate certain dimension dependencies from the risk bounds that are unavoidable in the general setting. Finally, we present experiments validating our theory on standard regression and classification tasks.

LGFeb 13, 2025
Scalable First-order Method for Certifying Optimal k-Sparse GLMs

Jiachang Liu, Soroosh Shafiee, Andrea Lodi

This paper investigates the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through an $\ell_0$ cardinality constraint. While branch-and-bound (BnB) frameworks can certify optimality by pruning nodes using dual bounds, existing methods for computing these bounds are either computationally intensive or exhibit slow convergence, limiting their scalability to large-scale problems. To address this challenge, we propose a first-order proximal gradient algorithm designed to solve the perspective relaxation of the problem within a BnB framework. Specifically, we formulate the relaxed problem as a composite optimization problem and demonstrate that the proximal operator of the non-smooth component can be computed exactly in log-linear time complexity, eliminating the need to solve a computationally expensive second-order cone program. Furthermore, we introduce a simple restart strategy that enhances convergence speed while maintaining low per-iteration complexity. Extensive experiments on synthetic and real-world datasets show that our approach significantly accelerates dual bound computations and is highly effective in providing optimality certificates for large-scale problems.

OCNov 4, 2024
Distributionally Robust Optimization

Daniel Kuhn, Soroosh Shafiee, Wolfram Wiesemann

Distributionally robust optimization (DRO) studies decision problems under uncertainty where the probability distribution governing the uncertain problem parameters is itself uncertain. A key component of any DRO model is its ambiguity set, that is, a family of probability distributions consistent with any available structural or statistical information. DRO seeks decisions that perform best under the worst distribution in the ambiguity set. This worst case criterion is supported by findings in psychology and neuroscience, which indicate that many decision-makers have a low tolerance for distributional ambiguity. DRO is rooted in statistics, operations research and control theory, and recent research has uncovered its deep connections to regularization techniques and adversarial training in machine learning. This survey presents the key findings of the field in a unified and self-contained manner.

LGJun 10, 2024
Robust Distribution Learning with Local and Global Adversarial Corruptions

Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee

We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $ρ$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(Π_\# \hat{P}_n, Π_\# P)$ for all orthogonal projections $Π\in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(Π) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + ρ+ \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $ρ= \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.