DSMay 22
Reducing the Randomness in Partition Oracles for Bounded Degree Minor-Free GraphsAkash Kumar, Abhiruk Lahiri, C. Seshadhri
Consider a bounded-degree graph $G$ that belongs to a minor-closed family (such as planar graphs). Such a graph has a hyperfinite decomposition, wherein, for a sufficiently small $\varepsilon > 0$, one can remove $\varepsilon dn$ edges to obtain connected components of size independent of $n$. (As usual, $n$ is the number of vertices and $d$ is the degree bound.) In a seminal result, Hassidim-Kelner-Nguyen-Onak (FOCS 2009) introduced the partition oracle, a procedure that provides local access to a hyperfinite decomposition. The partition oracle computes the component containing an input vertex $v$ with query complexity (to $G$) independent of $n$. Remarkably, this is done without any preprocessing on $G$. The coordination is done purely through a shared random seed. Despite a line of work on optimizing the query complexity of partition oracles, there were no attempts to bound the size of the random seed. All existing partition oracles use a random seed of size $Ω(n)$, which technically implies a linear setup time. Any blackbox derandomization would likely need $Ω(\log^2n)$ uniform random bits. A natural question is whether the random seed can also have length independent of $n$. We prove the $poly(d\varepsilon^{-1})$-query partition oracles of Kumar-Seshadhri-Stolman can be implemented with a random seed of $poly(d\varepsilon^{-1}) \cdot \log n$ length. To get a deeper understanding on the randomness complexity, we consider a more general model where the vertex labels come from the universe $[N]$, where $N \geq n$. In this setting, we prove that any partition oracle even for cycles requires $ω_N(1)$ random bits.
NIApr 16
Expanding into Reality: Random Graphs for Datacenter NetworksGiacomo Bernardi, Ratul Mahajan, C. Seshadhri et al.
We design and deploy at Amazon the first production datacenter fabrics based on random graphs. While the cost and fault-tolerance benefits of such topologies have been long known, their practical realization has been hampered by a lack of scalable routing and cabling approaches. Our design, called RNG, has a new distributed routing protocol that exploits the properties of random graphs to find a large number of edge disjoint paths between endpoint pairs. A novel passive optical device that internally shuffles cable endpoints makes Amazon's cabling complexity similar to that of fat trees. We show that RNG fabrics match or exceed the performance of fat trees for a range of traffic patterns, despite being up to 45% cheaper. At Amazon, we made RNG the default datacenter fabric for most workloads.
STFeb 22, 2025
Monotonicity Testing of High-Dimensional Distributions with Subcube ConditioningDeeparnab Chakrabarty, Xi Chen, Simeon Ristic et al.
We study monotonicity testing of high-dimensional distributions on $\{-1,1\}^n$ in the model of subcube conditioning, suggested and studied by Canonne, Ron, and Servedio~\cite{CRS15} and Bhattacharyya and Chakraborty~\cite{BC18}. Previous work shows that the \emph{sample complexity} of monotonicity testing must be exponential in $n$ (Rubinfeld, Vasilian~\cite{RV20}, and Aliakbarpour, Gouleakis, Peebles, Rubinfeld, Yodpinyanee~\cite{AGPRY19}). We show that the subcube \emph{query complexity} is $\tildeΘ(n/\varepsilon^2)$, by proving nearly matching upper and lower bounds. Our work is the first to use directed isoperimetric inequalities (developed for function monotonicity testing) for analyzing a distribution testing algorithm. Along the way, we generalize an inequality of Khot, Minzer, and Safra~\cite{KMS18} to real-valued functions on $\{-1,1\}^n$. We also study uniformity testing of distributions that are promised to be monotone, a problem introduced by Rubinfeld, Servedio~\cite{RS09} , using subcube conditioning. We show that the query complexity is $\tildeΘ(\sqrt{n}/\varepsilon^2)$. Our work proves the lower bound, which matches (up to poly-logarithmic factors) the uniformity testing upper bound for general distributions (Canonne, Chen, Kamath, Levi, Waingarten~\cite{CCKLW21}). Hence, we show that monotonicity does not help, beyond logarithmic factors, in testing uniformity of distributions with subcube conditional queries.
SIJan 20, 2022
Classic Graph Structural Features Outperform Factorization-Based Graph Embedding Methods on Community LabelingAndrew Stolman, Caleb Levy, C. Seshadhri et al.
Graph representation learning (also called graph embeddings) is a popular technique for incorporating network structure into machine learning models. Unsupervised graph embedding methods aim to capture graph structure by learning a low-dimensional vector representation (the embedding) for each node. Despite the widespread use of these embeddings for a variety of downstream transductive machine learning tasks, there is little principled analysis of the effectiveness of this approach for common tasks. In this work, we provide an empirical and theoretical analysis for the performance of a class of embeddings on the common task of pairwise community labeling. This is a binary variant of the classic community detection problem, which seeks to build a classifier to determine whether a pair of vertices participate in a community. In line with our goal of foundational understanding, we focus on a popular class of unsupervised embedding techniques that learn low rank factorizations of a vertex proximity matrix (this class includes methods like GraRep, DeepWalk, node2vec, NetMF). We perform detailed empirical analysis for community labeling over a variety of real and synthetic graphs with ground truth. In all cases we studied, the models trained from embedding features perform poorly on community labeling. In constrast, a simple logistic model with classic graph structural features handily outperforms the embedding models. For a more principled understanding, we provide a theoretical analysis for the (in)effectiveness of these embeddings in capturing the community structure. We formally prove that popular low-dimensional factorization methods either cannot produce community structure, or can only produce ``unstable" communities. These communities are inherently unstable under small perturbations.
AIApr 19, 2021
Randomized Algorithms for Scientific Computing (RASC)Aydin Buluc, Tamara G. Kolda, Stefan M. Wild et al.
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
LGMar 27, 2020
The impossibility of low rank representations for triangle-rich complex networksC. Seshadhri, Aneesh Sharma, Andrew Stolman et al.
The study of complex networks is a significant development in modern science, and has enriched the social sciences, biology, physics, and computer science. Models and algorithms for such networks are pervasive in our society, and impact human behavior via social networks, search engines, and recommender systems to name a few. A widely used algorithmic technique for modeling such complex networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to the common view, we argue that such graph embeddings do not}capture salient properties of complex networks. The two properties we focus on are low degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. We mathematically prove that any embedding (that uses dot products to measure similarity) that can successfully create these two properties must have rank nearly linear in the number of vertices. Among other implications, this establishes that popular embedding techniques such as Singular Value Decomposition and node2vec fail to capture significant structural aspects of real-world complex networks. Furthermore, we empirically study a number of different embedding techniques based on dot product, and show that they all fail to capture the triangle structure.