SOC-PHApr 4, 2018
Attack vulnerability of power systems under an equal load redistribution modelTalha Cihad Gulcu, Vaggos Chatziafratis, Yingrui Zhang et al.
This paper studies the vulnerability of flow networks against adversarial attacks. In particular, consider a power system (or, any system carrying a physical flow) consisting of $N$ transmission lines with initial loads $L_1, \ldots , L_N$ and capacities $C_1, \ldots, C_N$, respectively; the capacity $C_i$ defines the maximum flow allowed on line $i$. Under an equal load redistribution model, where load of failed lines is redistributed equally among all remaining lines, we study the {\em optimization} problem of finding the best $k$ lines to attack so as to minimize the number of {\em alive} lines at the steady-state (i.e., when cascades stop). This is done to reveal the worst-case attack vulnerability of the system as well as to reveal its most vulnerable lines. We derive optimal attack strategies in several special cases of load-capacity distributions that are practically relevant. We then consider a modified optimization problem where the adversary is also constrained by the {\em total} load (in addition to the number) of the initial attack set, and prove that this problem is NP-Hard. Finally, we develop heuristic algorithms for selecting the attack set for both the original and modified problems. Through extensive simulations, we show that these heuristics outperform benchmark algorithms under a wide range of settings.
GTAug 3, 2022
Efficiently Computing Nash Equilibria in Adversarial Team Markov GamesFivos Kalogiannis, Ioannis Anagnostides, Ioannis Panageas et al.
Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $ε$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/ε$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97).
29.8DSApr 19
Optimal Phylogenetic Reconstruction from Sampled QuartetsDionysis Arvanitakis, Vaggos Chatziafratis, Yiyuan Luo et al.
Quartet Reconstruction, the task of recovering a phylogenetic tree from smaller trees on four species called \textit{quartets}, is a well-studied problem in theoretical computer science with far-reaching connections to statistics, graph theory and biology. Given a random sample containing $m$ noisy quartets, labeled by an unknown ground-truth tree $T$ on $n$ taxa, we want to output a tree $\widehat T$ that is \textit{close} to $T$ in terms of quartet distance and can predict unseen quartets. Unfortunately, the empirical risk minimizer corresponds to the $\mathsf{NP}$-hard problem of finding a tree that maximizes agreements with the sampled quartets, and earlier works in approximation algorithms gave $(1-\eps)$-approximation schemes (PTAS) for dense instances with $m=Θ(n^4)$ quartets, or for $m=Θ(n^2\log n)$ quartets \textit{randomly} sampled from $T$. Prior to our work, it was unknown how many samples are information-theoretically required to learn the tree, and whether there is an efficient reconstruction algorithm. We present optimal results for reconstructing an unknown phylogenetic tree $T$ from a random sample of $m=Θ(n)$ quartets, corrupted under the Random Classification Noise (RCN) model. This matches the $Ω(n)$ lower bound required for any meaningful tree reconstruction. Our contribution is twofold: first, we give a tree reconstruction algorithm that, not only achieves a $(1-\eps)$-approximation, but most importantly \textit{recovers} a tree close to $T$ in quartet distance; second, we show a new $Θ(n)$ bound on the Natarajan dimension of phylogenies (an analog of VC dimension in multiclass classification). Our analysis relies on a new \textit{Quartet-based Embedding and Detection} procedure that identifies and removes well-clustered subtrees from the (unknown) ground-truth $T$ via semidefinite programming.
LGOct 11, 2022
On Scrambling Phenomena for Randomly Initialized Recurrent NetworksVaggos Chatziafratis, Ioannis Panageas, Clayton Sanford et al.
Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics, and their sensitivity to the initialization process often renders them notoriously hard to train. Recent works have shed light on such phenomena analyzing when exploding or vanishing gradients may occur, either of which is detrimental for training dynamics. In this paper, we point to a formal connection between RNNs and chaotic dynamical systems and prove a qualitatively stronger phenomenon about RNNs than what exploding gradients seem to suggest. Our main result proves that under standard initialization (e.g., He, Xavier etc.), RNNs will exhibit \textit{Li-Yorke chaos} with \textit{constant} probability \textit{independent} of the network's width. This explains the experimentally observed phenomenon of \textit{scrambling}, under which trajectories of nearby points may appear to be arbitrarily close during some timesteps, yet will be far away in future timesteps. In stark contrast to their feedforward counterparts, we show that chaotic behavior in RNNs is preserved under small perturbations and that their expressive power remains exponential in the number of feedback iterations. Our technical arguments rely on viewing RNNs as random walks under non-linear activations, and studying the existence of certain types of higher-order fixed points called \textit{periodic points} that lead to phase transitions from order to chaos.
40.8DSMay 5
Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality MismatchDionysis Arvanitakis, Vaggos Chatziafratis, Yiyuan Luo
Embedding-based representations in Euclidean space $\mathbb{R}^d$ are a cornerstone of modern machine learning, where a major goal is to use the \emph{smallest dimension} that faithfully captures data relations. In this work, we prove sharp dimension--accuracy tradeoffs and identify a fundamental information-theoretic limitation: unless the embedding dimension $d$ is chosen close to the ground-truth dimension $D$, accuracy undergoes a sudden collapse. Our main result shows that this phenomenon arises even in standard contrastive learning settings, where supervision is limited to a set of $m$ anchor--positive--negative triplets $(i,j,k)$ encoding distance comparisons $\mathrm{dist}(i,j) < \mathrm{dist}(i,k)$. Specifically, given triplets realizable by an unknown ground-truth embedding in $D$ dimensions, we prove that there exists constant $c < 1$, such that \emph{every embedding of dimension at most $cD$ violates half of the triplets}, yielding accuracy as low as a trivial one-dimensional solution that ignores the input. We complement our information-theoretic bounds with strong computational hardness results: under the Unique Games Conjecture, even if the given triplets are nearly realizable in $D=1$ dimension, no polynomial-time algorithm -- \textit{regardless of its dimension} -- can achieve accuracy above the trivial $50\%$ baseline.
LGFeb 22, 2024
From Large to Small Datasets: Size Generalization for Clustering Algorithm SelectionVaggos Chatziafratis, Ishani Karmarkar, Ellen Vitercik
In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an unknown ground-truth clustering that we can only access through expensive oracle queries. Ideally, the clustering algorithm's output will be structurally close to the ground truth. We approach this problem by introducing a notion of size generalization for clustering algorithm accuracy. We identify conditions under which we can (1) subsample the massive clustering instance, (2) evaluate a set of candidate algorithms on the smaller instance, and (3) guarantee that the algorithm with the best accuracy on the small instance will have the best accuracy on the original big instance. We provide theoretical size generalization guarantees for three classic clustering algorithms: single-linkage, k-means++, and (a smoothed variant of) Gonzalez's k-centers heuristic. We validate our theoretical analysis with empirical results, observing that on real-world clustering instances, we can use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
LGMar 7
Latent Generative Models with Tunable Complexity for Compressed Sensing and other Inverse ProblemsSean Gunn, Jorio Cocola, Oliver De Candido et al.
Generative models have emerged as powerful priors for solving inverse problems. These models typically represent a class of natural signals using a single fixed complexity or dimensionality. This can be limiting: depending on the problem, a fixed complexity may result in high representation error if too small, or overfitting to noise if too large. We develop tunable-complexity priors for diffusion models, normalizing flows, and variational autoencoders, leveraging nested dropout. Across tasks including compressed sensing, inpainting, denoising, and phase retrieval, we show empirically that tunable priors consistently achieve lower reconstruction errors than fixed-complexity baselines. In the linear denoising setting, we provide a theoretical analysis that explicitly characterizes how the optimal tuning parameter depends on noise and model structure. This work demonstrates the potential of tunable-complexity generative priors and motivates both the development of supporting theory and their application across a wide range of inverse problems.
LGSep 21, 2025
The Complexity of Finding Local Optima in Contrastive LearningJingming Yan, Yiyuan Luo, Vaggos Chatziafratis et al.
Contrastive learning is a powerful technique for discovering meaningful data representations by optimizing objectives based on $\textit{contrastive information}$, often given as a set of weighted triplets $\{(x_i, y_i^+, z_{i}^-)\}_{i = 1}^m$ indicating that an "anchor" $x_i$ is more similar to a "positive" example $y_i$ than to a "negative" example $z_i$. The goal is to find representations (e.g., embeddings in $\mathbb{R}^d$ or a tree metric) where anchors are placed closer to positive than to negative examples. While finding $\textit{global}$ optima of contrastive objectives is $\mathsf{NP}$-hard, the complexity of finding $\textit{local}$ optima -- representations that do not improve by local search algorithms such as gradient-based methods -- remains open. Our work settles the complexity of finding local optima in various contrastive learning problems by proving $\mathsf{PLS}$-hardness in discrete settings (e.g., maximize satisfied triplets) and $\mathsf{CLS}$-hardness in continuous settings (e.g., minimize Triplet Loss), where $\mathsf{PLS}$ (Polynomial Local Search) and $\mathsf{CLS}$ (Continuous Local Search) are well-studied complexity classes capturing local search dynamics in discrete and continuous optimization, respectively. Our results imply that no polynomial time algorithm (local search or otherwise) can find a local optimum for various contrastive learning problems, unless $\mathsf{PLS}\subseteq\mathsf{P}$ (or $\mathsf{CLS}\subseteq \mathsf{P}$ for continuous problems). Even in the unlikely scenario that $\mathsf{PLS}\subseteq\mathsf{P}$ (or $\mathsf{CLS}\subseteq \mathsf{P}$), our reductions imply that there exist instances where local search algorithms need exponential time to reach a local optimum, even for $d=1$ (embeddings on a line).
LGOct 19, 2021
Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky's TheoremClayton Sanford, Vaggos Chatziafratis
Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network \textit{expressivity} from the lens of dynamical systems and provide novel ``depth-vs-width'' tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of \textit{periodic} points or \emph{cycles} in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain ``chaotic itineraries'' give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly-optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the \textit{chaotic regime} that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy.
DSOct 1, 2020
From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical ClusteringInes Chami, Albert Gu, Vaggos Chatziafratis et al.
Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We can show that after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1 + epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent have superior clustering quality than agglomerative heuristics or other gradient based algorithms. Finally, we highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
LGMar 2, 2020
Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical SystemsVaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas
The expressivity of neural networks as a function of their depth, width and type of activation units has been an important question in deep learning theory. Recently, depth separation results for ReLU networks were obtained via a new connection with dynamical systems, using a generalized notion of fixed points of a continuous map $f$, called periodic points. In this work, we strengthen the connection with dynamical systems and we improve the existing width lower bounds along several aspects. Our first main result is period-specific width lower bounds that hold under the stronger notion of $L^1$-approximation error, instead of the weaker classification error. Our second contribution is that we provide sharper width lower bounds, still yielding meaningful exponential depth-width separations, in regimes where previous results wouldn't apply. A byproduct of our results is that there exists a universal constant characterizing the depth-width trade-offs, as long as $f$ has odd periods. Technically, our results follow by unveiling a tighter connection between the following three quantities of a given function: its period, its Lipschitz constant and the growth rate of the number of oscillations arising under compositions of the function $f$ with itself.
LGDec 9, 2019
Depth-Width Trade-offs for ReLU Networks via Sharkovsky's TheoremVaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas et al.
Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute, has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky's work reveals the limitations of shallow neural networks, it does not inform us on why these functions are difficult to represent and in fact he states it as a tantalizing open question to characterize those functions that cannot be well-approximated by smaller depths. In this work, we point to a new connection between DNNs expressivity and Sharkovsky's Theorem from dynamical systems, that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky's work contain points of period 3 - a period that is special in that it implies chaotic behavior based on the celebrated result by Li-Yorke - we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is based on an eigenvalue analysis of the dynamical system associated with such functions.
DSNov 29, 2019
Adversarially Robust Low Dimensional RepresentationsPranjal Awasthi, Vaggos Chatziafratis, Xue Chen et al.
Many machine learning systems are vulnerable to small perturbations made to inputs either at test time or at training time. This has received much recent interest on the empirical front due to applications where reliability and security are critical. However, theoretical understanding of algorithms that are robust to adversarial perturbations is limited. In this work we focus on Principal Component Analysis (PCA), a ubiquitous algorithmic primitive in machine learning. We formulate a natural robust variant of PCA where the goal is to find a low dimensional subspace to represent the given data with minimum projection error, that is in addition robust to small perturbations measured in $\ell_q$ norm (say $q=\infty$). Unlike PCA which is solvable in polynomial time, our formulation is computationally intractable to optimize as it captures a variant of the well-studied sparse PCA objective as a special case. We show the following results: -Polynomial time algorithm that is constant factor competitive in the worst-case with respect to the best subspace, in terms of the projection error and the robustness criterion. -We show that our algorithmic techniques can also be made robust to adversarial training-time perturbations, in addition to yielding representations that are robust to adversarial perturbations at test time. Specifically, we design algorithms for a strong notion of training-time perturbations, where every point is adversarially perturbed up to a specified amount. -We illustrate the broad applicability of our algorithmic techniques in addressing robustness to adversarial perturbations, both at training time and test time. In particular, our adversarially robust PCA primitive leads to computationally efficient and robust algorithms for both unsupervised and supervised learning problems such as clustering and learning adversarially robust classifiers.
DSAug 7, 2018
Hierarchical Clustering better than Average-LinkageMoses Charikar, Vaggos Chatziafratis, Rad Niazadeh
Hierarchical Clustering (HC) is a widely studied problem in exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. In this paper we focus on two objectives, introduced recently to give insight into the performance of average-linkage clustering: a similarity based HC objective proposed by [Moseley and Wang, 2017] and a dissimilarity based HC objective proposed by [Cohen-Addad et al., 2018]. In both cases, we present tight counterexamples showing that average-linkage cannot obtain better than 1/3 and 2/3 approximations respectively (in the worst-case), settling an open question raised in [Moseley and Wang, 2017]. This matches the approximation ratio of a random solution, raising a natural question: can we beat average-linkage for these objectives? We answer this in the affirmative, giving two new algorithms based on semidefinite programming with provably better guarantees.
LGJul 3, 2018
On the Computational Power of Online Gradient DescentVaggos Chatziafratis, Tim Roughgarden, Joshua R. Wang
We prove that the evolution of weight vectors in online gradient descent can encode arbitrary polynomial-space computations, even in very simple learning settings. Our results imply that, under weak complexity-theoretic assumptions, it is impossible to reason efficiently about the fine-grained behavior of online gradient descent.
DSMay 24, 2018
Hierarchical Clustering with Structural ConstraintsVaggos Chatziafratis, Rad Niazadeh, Moses Charikar
Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by the set of features available to the algorithm. This gives rise to the problem of "hierarchical clustering with structural constraints". Structural constraints pose major challenges for bottom-up approaches like average/single linkage and even though they can be naturally incorporated into top-down divisive algorithms, no formal guarantees exist on the quality of their output. In this paper, we provide provable approximation guarantees for two simple top-down algorithms, using a recently introduced optimization viewpoint of hierarchical clustering with pairwise similarity information [Dasgupta, 2016]. We show how to find good solutions even in the presence of conflicting prior information, by formulating a constraint-based regularization of the objective. We further explore a variation of this objective for dissimilarity information [Cohen-Addad et al., 2018] and improve upon current techniques. Finally, we demonstrate our approach on a real dataset for the taxonomy application.