Pablo A. Parrilo

OC
h-index67
10papers
329citations
Novelty61%
AI Score55

10 Papers

OCMay 29
Acceleration by Random Stepsizes: Hedging, Equalization, and the Arcsine Stepsize Schedule

Jason M. Altschuler, Pablo A. Parrilo

We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent. Specifically, using inverse stepsizes i.i.d. from the Arcsine distribution improves the convergence rate from $O(k)$ to $O(\sqrt{k})$, where $k$ is the condition number. No momentum or other algorithmic modifications are required. Our starting point is a remarkable "equalization property" of the Arcsine distribution: it yields an identical convergence rate for all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions. We interpret this equalization as an extreme form of hedging: by using this random distribution over stepsizes, Gradient Descent converges at exactly the same rate for all functions in the function class.

OCNov 7, 2011
H_2-Optimal Decentralized Control over Posets: A State-Space Solution for State-Feedback

Parikshit Shah, Pablo A. Parrilo

We develop a complete state-space solution to H_2-optimal decentralized control of poset-causal systems with state-feedback. Our solution is based on the exploitation of a key separability property of the problem, that enables an efficient computation of the optimal controller by solving a small number of uncoupled standard Riccati equations. Our approach gives important insight into the structure of optimal controllers, such as controller degree bounds that depend on the structure of the poset. A novel element in our state-space characterization of the controller is a remarkable pair of transfer functions, that belong to the incidence algebra of the poset, are inverses of each other, and are intimately related to prediction of the state along the different paths on the poset. The results are illustrated by a numerical example.

OCFeb 27, 2014
Joint Spectral Radius and Path-Complete Graph Lyapunov Functions

Amir Ali Ahmadi, Raphaël Jungers, Pablo A. Parrilo et al.

We introduce the framework of path-complete graph Lyapunov functions for approximation of the joint spectral radius. The approach is based on the analysis of the underlying switched system via inequalities imposed among multiple Lyapunov functions associated to a labeled directed graph. Inspired by concepts in automata theory and symbolic dynamics, we define a class of graphs called path-complete graphs, and show that any such graph gives rise to a method for proving stability of the switched system. This enables us to derive several asymptotically tight hierarchies of semidefinite programming relaxations that unify and generalize many existing techniques such as common quadratic, common sum of squares, and maximum/minimum-of-quadratics Lyapunov functions. We compare the quality of approximation obtained by certain classes of path-complete graphs including a family of dual graphs and all path-complete graphs with two nodes on an alphabet of two matrices. We provide approximation guarantees for several families of path-complete graphs, such as the De Bruijn graphs, establishing as a byproduct a constructive converse Lyapunov theorem for maximum/minimum-of-quadratics Lyapunov functions.

OCDec 30, 2017
Sum of squares certificates for stability of planar, homogeneous, and switched systems

Amir Ali Ahmadi, Pablo A. Parrilo

We show that existence of a global polynomial Lyapunov function for a homogeneous polynomial vector field or a planar polynomial vector field (under a mild condition) implies existence of a polynomial Lyapunov function that is a sum of squares (sos) and that the negative of its derivative is also a sum of squares. This result is extended to show that such sos-based certificates of stability are guaranteed to exist for all stable switched linear systems. For this class of systems, we further show that if the derivative inequality of the Lyapunov function has an sos certificate, then the Lyapunov function itself is automatically a sum of squares. These converse results establish cases where semidefinite programming is guaranteed to succeed in finding proofs of Lyapunov inequalities. Finally, we demonstrate some merits of replacing the sos requirement on a polynomial Lyapunov function with an sos requirement on its top homogeneous component. In particular, we show that this is a weaker algebraic requirement in addition to being cheaper to impose computationally.

OCJan 16, 2012
When is a set of LMIs a sufficient condition for stability?

Amir Ali Ahmadi, Raphael M. Jungers, Pablo A. Parrilo et al.

We study stability criteria for discrete time switching systems. We investigate the structure of sets of LMIs that are a sufficient condition for stability (i.e., such that any switching system which satisfies these LMIs is stable). We provide an exact characterization of these sets. As a corollary, we show that it is PSPACE-complete to recognize whether a particular set of LMIs implies the stability of a switching system.

LGFeb 6
Collaborative and Efficient Fine-tuning: Leveraging Task Similarity

Gagik Magakyan, Amirhossein Reisizadeh, Chanwoo Park et al.

Adaptability has been regarded as a central feature in the foundation models, enabling them to effectively acclimate to unseen downstream tasks. Parameter-efficient fine-tuning methods such as celebrated LoRA facilitate efficient adaptation of large foundation models using labeled, high-quality and generally scarce task data. To mitigate data scarcity in fine-tuning of foundation models, we propose to leverage task similarity across multiple downstream users. Intuitively, users with similar tasks must be able to assist each other in boosting the effective fine-tuning data size. We propose Collaborative Low-Rank Adaptation, or CoLoRA, which exploits task similarity to collaboratively and efficiently fine-tune personalized foundation models. The main idea in CoLoRA is to train one shared adapter capturing underlying task similarities across all tasks, and personalized adapters tailored to user-specific tasks. We theoretically study CoLoRA on heterogeneous linear regression and provide provable guarantees for ground truth recovery. We also conduct several natural language experiments with varying task similarity, which further demonstrate that when trained together with similar tasks, individual performances are significantly boosted.

AIJun 5, 2025
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar et al.

Conventional preference learning methods often prioritize opinions held more widely when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups and susceptible to strategic manipulation. To address this issue, we develop a novel preference learning framework capable of aligning aggregate opinions and policies proportionally with the true population distribution of evaluator preferences. Grounded in social choice theory, our approach infers the feasible set of evaluator population distributions directly from pairwise comparison data. Using these estimates, the algorithm constructs a policy that satisfies foundational axioms from social choice theory, namely monotonicity and Pareto efficiency, as well as our newly-introduced axioms of population-proportional alignment and population-bounded manipulability. Moreover, we propose a soft-max relaxation method that smoothly trade-offs population-proportional alignment with the selection of the Condorcet winner (which beats all other options in pairwise comparisons). Finally, we validate the effectiveness and scalability of our approach through experiments on both tabular recommendation tasks and large language model alignment.

LGJun 4, 2021
Kernel approximation on algebraic varieties

Jason M. Altschuler, Pablo A. Parrilo

Low-rank approximation of kernels is a fundamental mathematical problem with widespread algorithmic applications. Often the kernel is restricted to an algebraic variety, e.g., in problems involving sparse or low-rank data. We show that significantly better approximations are obtainable in this setting: the rank required to achieve a given error depends on the variety's dimension rather than the ambient dimension, which is typically much larger. This is true in both high-precision and high-dimensional regimes. Our results are presented for smooth isotropic kernels, the predominant class of kernels used in applications. Our main technical insight is to approximate smooth kernels by polynomial kernels, and leverage two key properties of polynomial kernels that hold when they are restricted to a variety. First, their ranks decrease exponentially in the variety's co-dimension. Second, their maximum values are governed by their values over a small set of points. Together, our results provide a general approach for exploiting (approximate) "algebraic structure" in datasets in order to efficiently solve large-scale data science problems.

OCJul 12, 2018
Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs

Murat A. Erdogdu, Asuman Ozdaglar, Pablo A. Parrilo et al.

Semidefinite programming (SDP) with diagonal constraints arise in many optimization problems, such as Max-Cut, community detection and group synchronization. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. In order to address this issue, Burer and Monteiro proposed to reduce the dimension of the problem by appealing to a low-rank factorization and solve the subsequent non-convex problem instead. In this paper, we present coordinate ascent based methods to solve this non-convex problem with provable convergence guarantees. More specifically, we prove that the block-coordinate maximization algorithm applied to the non-convex Burer-Monteiro method globally converges to a first-order stationary point with a sublinear rate without any assumptions on the problem. We further show that this algorithm converges linearly around a local maximum provided that the objective function exhibits quadratic decay. We establish that this condition generically holds when the rank of the factorization is sufficiently large. Furthermore, incorporating Lanczos method to the block-coordinate maximization, we propose an algorithm that is guaranteed to return a solution that provides $1-O(1/r)$ approximation to the original SDP without any assumptions, where $r$ is the rank of the factorization. This approximation ratio is known to be optimal (up to constants) under the unique games conjecture, and we can explicitly quantify the number of iterations to obtain such a solution.