Pablo Parrilo

OC
3papers
110citations
Novelty43%
AI Score43

3 Papers

88.7LGJun 2Code
Spectral Scaling Laws of Muon

Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar

Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton--Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be orthonormalized. In Muon, NS is applied to the momentum matrix at every step, yet little is known about how the singular value spectrum of these momentum matrices behaves during training, or how that behavior changes with model size. We present the first systematic study of this question. Tracking singular value quantiles of the momentum buffer across layers in models ranging from 77M to 2.8B parameters, we observe a consistent picture: after a short burn-in, the quantiles stabilize at a value determined by the layer type and model size. These stabilization values follow remarkably clean power laws in model size, with layer-dependent exponents. Layers up to mid-late depth scale very mildly with model size $M$ (around $M^{-0.25}$), so the standard 5-step NS configuration used at academic scale will continue to orthonormalize them at much larger scales. Some of the late layers, however, scale much more aggressively (up to $M^{-0.96}$) and will fall into the NS failure regime at frontier scale unless one uses more NS iterations or better-tuned coefficients. NS iterations are computationally expensive at scale; our laws give practitioners a principled, layer-aware recipe for choosing the minimum NS configuration that still orthonormalizes the directions that matter -- avoiding unnecessary computation without sacrificing update quality.

OCDec 27, 2021
SOSTOOLS Version 4.00 Sum of Squares Optimization Toolbox for MATLAB

Antonis Papachristodoulou, James Anderson, Giorgio Valmorbida et al.

The release of SOSTOOLS v4.00 comes as we approach the 20th anniversary of the original release of SOSTOOLS v1.00 back in April, 2002. SOSTOOLS was originally envisioned as a flexible tool for parsing and solving polynomial optimization problems, using the SOS tightening of polynomial positivity constraints, and capable of adapting to the ever-evolving fauna of applications of SOS. There are now a variety of SOS programming parsers beyond SOSTOOLS, including YALMIP, Gloptipoly, SumOfSquares, and others. We hope SOSTOOLS remains the most intuitive, robust and adaptable toolbox for SOS programming. Recent progress in Semidefinite programming has opened up new possibilities for solving large Sum of Squares programming problems, and we hope the next decade will be one where SOS methods will find wide application in different areas. In SOSTOOLS v4.00, we implement a parsing approach that reduces the computational and memory requirements of the parser below that of the SDP solver itself. We have re-developed the internal structure of our polynomial decision variables. Specifically, polynomial and SOS variable declarations made using sossosvar, sospolyvar, sosmatrixvar, etc now return a new polynomial structure, dpvar. This new polynomial structure, is documented in the enclosed dpvar guide, and isolates the scalar SDP decision variables in the SOS program from the independent variables used to construct the SOS program. As a result, the complexity of the parser scales almost linearly in the number of decision variables. As a result of these changes, almost all users will notice a significant increase in speed, with large-scaleproblems experiencing the most dramatic speedups. Parsing time is now always less than 10% of time spent in the SDP solver. Finally, SOSTOOLS now provides support for the MOSEK solver interface as well as the SeDuMi, SDPT3, CSDP, SDPNAL, SDPNAL+, and SDPA solvers.

OCNov 30, 2011
An Optimal Controller Architecture for Poset-Causal Systems

Parikshit Shah, Pablo Parrilo

We propose a novel and natural architecture for decentralized control that is applicable whenever the underlying system has the structure of a partially ordered set (poset). This controller architecture is based on the concept of Moebius inversion for posets, and enjoys simple and appealing separation properties, since the closed-loop dynamics can be analyzed in terms of decoupled subsystems. The controller structure provides rich and interesting connections between concepts from order theory such as Moebius inversion and control-theoretic concepts such as state prediction, correction, and separability. In addition, using our earlier results on H_2-optimal decentralized control for arbitrary posets, we prove that the H_2-optimal controller in fact possesses the proposed structure, thereby establishing the optimality of the new controller architecture.