LGFeb 14, 2020
Generalization and Representational Limits of Graph Neural NetworksVikas K. Garg, Stefanie Jegelka, Tommi Jaakkola
We address two fundamental questions about graph neural networks (GNNs). First, we prove that several important graph properties cannot be computed by GNNs that rely entirely on local information. Such GNNs include the standard message passing models, and more powerful spatial variants that exploit local graph structure (e.g., via relative orientation of messages, or local port ordering) to distinguish neighbors of each node. Our treatment includes a novel graph-theoretic formalism. Second, we provide the first data dependent generalization bounds for message passing GNNs. This analysis explicitly accounts for the local permutation invariance of GNNs. Our bounds are much tighter than existing VC-dimension based guarantees for GNNs, and are comparable to Rademacher bounds for recurrent neural networks.
LGFeb 13, 2020
Learn to Expect the Unexpected: Probably Approximately Correct Domain GeneralizationVikas K. Garg, Adam Kalai, Katrina Ligett et al.
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings---a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting, and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.
LGAug 27, 2019
Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical StructureVikas K. Garg, Inderjit S. Dhillon, Hsiang-Fu Yu
The architecture of Transformer is based entirely on self-attention, and has been shown to outperform models that employ recurrence on sequence transduction tasks such as machine translation. The superior performance of Transformer has been attributed to propagating signals over shorter distances, between positions in the input and the output, compared to the recurrent architectures. We establish connections between the dynamics in Transformer and recurrent networks to argue that several factors including gradient flow along an ensemble of multiple weakly dependent paths play a paramount role in the success of Transformer. We then leverage the dynamics to introduce {\em Multiresolution Transformer Networks} as the first architecture that exploits hierarchical structure in data via self-attention. Our models significantly outperform state-of-the-art recurrent and hierarchical recurrent models on two real-world datasets for query suggestion, namely, \aol and \amazon. In particular, on AOL data, our model registers at least 20\% improvement on each precision score, and over 25\% improvement on the BLEU score with respect to the best performing recurrent model. We thus provide strong evidence that recurrence is not essential for modeling hierarchical structure.
LGMay 29, 2019
Strategic Prediction with Latent Aggregative GamesVikas K. Garg, Tommi Jaakkola
We introduce a new class of context dependent, incomplete information games to serve as structured prediction models for settings with significant strategic interactions. Our games map the input context to outcomes by first condensing the input into private player types that specify the utilities, weighted interactions, as well as the initial strategies for the players. The game is played over multiple rounds where players respond to weighted aggregates of their neighbors' strategies. The predicted output from the model is a mixed strategy profile (a near-Nash equilibrium) and each observation is thought of as a sample from this strategy profile. We introduce two new aggregator paradigms with provably convergent game dynamics, and characterize the conditions under which our games are identifiable from data. Our games can be parameterized in a transferable manner so that the sets of players can change from one game to another. We demonstrate empirically that our games as models can recover meaningful strategic interactions from real voting data.
LGMay 29, 2019
Solving graph compression via optimal transportVikas K. Garg, Tommi Jaakkola
We propose a new approach to graph compression by appeal to optimal transport. The transport problem is seeded with prior information about node importance, attributes, and edges in the graph. The transport formulation can be setup for either directed or undirected graphs, and its dual characterization is cast in terms of distributions over the nodes. The compression pertains to the support of node distributions and makes the problem challenging to solve directly. To this end, we introduce Boolean relaxations and specify conditions under which these relaxations are exact. The relaxations admit algorithms with provably fast convergence. Moreover, we provide an exact $O(d \log d)$ algorithm for the subproblem of projecting a $d$-dimensional vector to transformed simplex constraints. Our method outperforms state-of-the-art compression methods on graph classification.
LGOct 16, 2018
Online Markov Decoding: Lower Bounds and Near-Optimal Approximation AlgorithmsVikas K. Garg, Tamar Pichkhadze
We resolve the fundamental problem of online decoding with general $n^{th}$ order ergodic Markov chain models. Specifically, we provide deterministic and randomized algorithms whose performance is close to that of the optimal offline algorithm even when latency is small. Our algorithms admit efficient implementation via dynamic programs, and readily extend to (adversarial) non-stationary or time-varying settings. We also establish lower bounds for online methods under latency constraints in both deterministic and randomized settings, and show that no online algorithm can perform significantly better than our algorithms. Empirically, just with latency one, our algorithm outperforms the online step algorithm by over 30\% in terms of decoding agreement with the optimal algorithm on genome sequence data.
LGMar 6, 2018
Learning SMaLL PredictorsVikas K. Garg, Ofer Dekel, Lin Xiao
We present a new machine learning technique for training small resource-constrained predictors. Our algorithm, the Sparse Multiprototype Linear Learner (SMaLL), is inspired by the classic machine learning problem of learning $k$-DNF Boolean formulae. We present a formal derivation of our algorithm and demonstrate the benefits of our approach with a detailed empirical study.
AISep 14, 2017
Supervising Unsupervised LearningVikas K. Garg, Adam Kalai
We introduce a framework to leverage knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via simple agnostic bounds on unsupervised problems. In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove the outliers, and provably circumvent the Kleinberg's impossibility result. Experimental results across hundreds of problems demonstrate improved performance on unsupervised data with simple algorithms, despite the fact that our problems come from heterogeneous domains. Additionally, our framework lets us leverage deep networks to learn common features from many such small datasets, and perform zero shot learning.
LGDec 29, 2016
Meta-Unsupervised-Learning: A supervised approach to unsupervised learningVikas K. Garg, Adam Tauman Kalai
We introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on several unsupervised problems such as (a) clustering, (b) outlier detection, and (c) similarity prediction under a common umbrella of meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to establish the theoretical foundations of our framework, and show that our framing of meta-clustering circumvents Kleinberg's impossibility theorem for clustering.
LGJun 25, 2015
CRAFT: ClusteR-specific Assorted Feature selecTionVikas K. Garg, Cynthia Rudin, Tommi Jaakkola
We present a framework for clustering with cluster-specific feature selection. The framework, CRAFT, is derived from asymptotic log posterior formulations of nonparametric MAP-based clustering models. CRAFT handles assorted data, i.e., both numeric and categorical data, and the underlying objective functions are intuitively appealing. The resulting algorithm is simple to implement and scales nicely, requires minimal parameter tuning, obviates the need to specify the number of clusters a priori, and compares favorably with other methods on real datasets.
CVApr 19, 2015
DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural NetsSukrit Shankar, Vikas K. Garg, Roberto Cipolla
Most of the approaches for discovering visual attributes in images demand significant supervision, which is cumbersome to obtain. In this paper, we aim to discover visual attributes in a weakly supervised setting that is commonly encountered with contemporary image search engines. Deep Convolutional Neural Networks (CNNs) have enjoyed remarkable success in vision applications recently. However, in a weakly supervised scenario, widely used CNN training procedures do not learn a robust model for predicting multiple attribute labels simultaneously. The primary reason is that the attributes highly co-occur within the training data. To ameliorate this limitation, we propose Deep-Carving, a novel training procedure with CNNs, that helps the net efficiently carve itself for the task of multiple attribute prediction. During training, the responses of the feature maps are exploited in an ingenious way to provide the net with multiple pseudo-labels (for training images) for subsequent iterations. The process is repeated periodically after a fixed number of iterations, and enables the net carve itself iteratively for efficiently disentangling features. Additionally, we contribute a noun-adjective pairing inspired Natural Scenes Attributes Dataset to the research community, CAMIT - NSAD, containing a number of co-occurring attributes within a noun category. We describe, in detail, salient aspects of this dataset. Our experiments on CAMIT-NSAD and the SUN Attributes Dataset, with weak supervision, clearly demonstrate that the Deep-Carved CNNs consistently achieve considerable improvement in the precision of attribute prediction over popular baseline methods.