DCJan 6, 2013
Direct QR factorizations for tall-and-skinny matrices in MapReduce architecturesAustin R. Benson, David F. Gleich, James Demmel
The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny matrices," there is a numerically stable, efficient, communication-avoiding algorithm for computing the QR factorization. It has been used in traditional high performance computing and grid computing environments. For MapReduce environments, existing methods to compute the QR decomposition use a numerically unstable approach that relies on indirectly computing the Q factor. In the best case, these methods require only two passes over the data. In this paper, we describe how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than 2 passes over the data. We can compute the SVD with only a small change and no difference in performance. We present a performance comparison between our new direct TSQR method, a standard unstable implementation for MapReduce (Cholesky QR), and the classic stable algorithm implemented for MapReduce (Householder QR). We find that our new stable method has a large performance advantage over the Householder QR method. This holds both in a theoretical performance model as well as in an actual implementation.
NADec 26, 2016
The Spacey Random Walk: a Stochastic Process for Higher-order DataAustin R. Benson, David F. Gleich, Lek-Heng Lim
Random walks are a fundamental model in applied mathematics and are a common example of a Markov chain. The limiting stationary distribution of the Markov chain represents the fraction of the time spent in each state during the stochastic process. A standard way to compute this distribution for a random walk on a finite set of states is to compute the Perron vector of the associated transition matrix. There are algebraic analogues of this Perron vector in terms of transition probability tensors of higher-order Markov chains. These vectors are nonnegative, have dimension equal to the dimension of the state space, and sum to one and are derived by making an algebraic substitution in the equation for the joint-stationary distribution of a higher-order Markov chains. Here, we present the spacey random walk, a non-Markovian stochastic process whose stationary distribution is given by the tensor eigenvector. The process itself is a vertex-reinforced random walk, and its discrete dynamics are related to a continuous dynamical system. We analyze the convergence properties of these dynamics and discuss numerical methods for computing the stationary distribution. Finally, we provide several applications of the spacey random walk model in population genetics, ranking, and clustering data, and we use the process to analyze taxi trajectory data in New York. This example shows definite non-Markovian structure.
DCSep 9, 2014
A Framework for Practical Parallel Fast Matrix MultiplicationAustin R. Benson, Grey Ballard
Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes. Furthermore, we show that the best choice of fast algorithm depends not only on the size of the matrices but also the shape. We develop a code generation tool to automatically implement multiple sequential and shared-memory parallel variants of each fast algorithm, including our novel parallelization scheme. This allows us to rapidly benchmark over 20 fast algorithms on several problem sizes. Furthermore, we discuss a number of practical implementation issues for these algorithms on shared-memory machines that can direct further research on making fast algorithms practical.
SIMay 23, 2019
Network Density of StatesKun Dong, Austin R. Benson, David Bindel
Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvalues. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues - the spectral density - is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.
LGJul 22, 2022
Understanding Non-linearity in Graph Neural Networks from the Bayesian-Inference PerspectiveRongzhe Wei, Haoteng Yin, Junteng Jia et al.
Graph neural networks (GNNs) have shown superiority in many prediction tasks over graphs due to their impressive capability of capturing nonlinear relations in graph-structured data. However, for node classification tasks, often, only marginal improvement of GNNs over their linear counterparts has been observed. Previous works provide very few understandings of this phenomenon. In this work, we resort to Bayesian learning to deeply investigate the functions of non-linearity in GNNs for node classification tasks. Given a graph generated from the statistical model CSBM, we observe that the max-a-posterior estimation of a node label given its own and neighbors' attributes consists of two types of non-linearity, a possibly non-linear transformation of node attributes and a ReLU-activated feature aggregation from neighbors. The latter surprisingly matches the type of non-linearity used in many GNN models. By further imposing Gaussian assumption on node attributes, we prove that the superiority of those ReLU activations is only significant when the node attributes are far more informative than the graph structure, which nicely matches many previous empirical observations. A similar argument can be achieved when there is a distribution shift of node attributes between the training and testing datasets. Finally, we verify our theory on both synthetic and real-world networks.
LGMay 23, 2022
Graph-Based Methods for Discrete ChoiceKiran Tomlinson, Austin R. Benson
Choices made by individuals have widespread impacts--for instance, people choose between political candidates to vote for, between social media posts to share, and between brands to purchase--moreover, data on these choices are increasingly abundant. Discrete choice models are a key tool for learning individual preferences from such data. Additionally, social factors like conformity and contagion influence individual choice. Traditional methods for incorporating these factors into choice models do not account for the entire social network and require hand-crafted features. To overcome these limitations, we use graph learning to study choice in networked contexts. We identify three ways in which graph learning techniques can be used for discrete choice: learning chooser representations, regularizing choice model parameters, and directly constructing predictions from a network. We design methods in each category and test them on real-world choice datasets, including county-level 2016 US election results and Android app installation and usage data. We show that incorporating social network structure can improve the predictions of the standard econometric choice model, the multinomial logit. We provide evidence that app installations are influenced by social context, but we find no such effect on app usage among the same participants, which instead is habit-driven. In the election data, we highlight the additional insights a discrete choice framework provides over classification or regression, the typical approaches. On synthetic data, we demonstrate the sample complexity benefit of using social information in choice models.
SIJun 30, 2021Code
Edge Proposal Sets for Link PredictionAbhay Singh, Qian Huang, Sijia Linda Huang et al.
Graphs are a common model for complex relational data such as social networks and protein interactions, and such data can evolve over time (e.g., new friendships) and be noisy (e.g., unmeasured interactions). Link prediction aims to predict future edges or infer missing edges in the graph, and has diverse applications in recommender systems, experimental design, and complex systems. Even though link prediction algorithms strongly depend on the set of edges in the graph, existing approaches typically do not modify the graph topology to improve performance. Here, we demonstrate how simply adding a set of edges, which we call a \emph{proposal set}, to the graph as a pre-processing step can improve the performance of several link prediction algorithms. The underlying idea is that if the edges in the proposal set generally align with the structure of the graph, link prediction algorithms are further guided towards predicting the right edges; in other words, adding a proposal set of edges is a signal-boosting pre-processing step. We show how to use existing link prediction algorithms to generate effective proposal sets and evaluate this approach on various synthetic and empirical datasets. We find that proposal sets meaningfully improve the accuracy of link prediction algorithms based on both neighborhood heuristics and graph neural networks. Code is available at \url{https://github.com/CUAI/Edge-Proposal-Sets}.
LGOct 27, 2020Code
Combining Label Propagation and Simple Models Out-performs Graph Neural NetworksQian Huang, Horace He, Abhay Singh et al.
Graph Neural Networks (GNNs) are the predominant technique for learning over graphs. However, there is relatively little understanding of why GNNs are successful in practice and whether they are necessary for good performance. Here, we show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs by combining shallow models that ignore the graph structure with two simple post-processing steps that exploit correlation in the label structure: (i) an "error correlation" that spreads residual errors in training data to correct errors in test data and (ii) a "prediction correlation" that smooths the predictions on the test data. We call this overall procedure Correct and Smooth (C&S), and the post-processing steps are implemented via simple modifications to standard label propagation techniques from early graph-based semi-supervised learning methods. Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks, with just a small fraction of the parameters and orders of magnitude faster runtime. For instance, we exceed the best known GNN performance on the OGB-Products dataset with 137 times fewer parameters and greater than 100 times less training time. The performance of our methods highlights how directly incorporating label information into the learning algorithm (as was done in traditional techniques) yields easy and substantial performance gains. We can also incorporate our techniques into big GNN models, providing modest gains. Our code for the OGB results is at https://github.com/Chillee/CorrectAndSmooth.
LGOct 28, 2021
Approximate Decomposable Submodular Function Minimization for Cardinality-Based ComponentsNate Veldt, Austin R. Benson, Jon Kleinberg
Minimizing a sum of simple submodular functions of limited support is a special case of general submodular function minimization that has seen numerous applications in machine learning. We develop fast techniques for instances where components in the sum are cardinality-based, meaning they depend only on the size of the input set. This variant is one of the most widely applied in practice, encompassing, e.g., common energy functions arising in image segmentation and recent generalized hypergraph cut functions. We develop the first approximation algorithms for this problem, where the approximations can be quickly computed via reduction to a sparse graph cut problem, with graph sparsity controlled by the desired approximation factor. Our method relies on a new connection between sparse graph reduction techniques and piecewise linear approximations to concave functions. Our sparse reduction technique leads to significant improvements in theoretical runtimes, as well as substantial practical gains in problems ranging from benchmark image segmentation tasks to hypergraph clustering problems.
LGJun 6, 2021
Graph Belief Propagation NetworksJunteng Jia, Cenk Baykal, Vamsi K. Potluru et al.
With the wide-spread availability of complex relational data, semi-supervised node classification in graphs has become a central machine learning problem. Graph neural networks are a recent class of easy-to-train and accurate methods for this problem that map the features in the neighborhood of a node to its label, but they ignore label correlation during inference and their predictions are difficult to interpret. On the other hand, collective classification is a traditional approach based on interpretable graphical models that explicitly model label correlations. Here, we introduce a model that combines the advantages of these two approaches, where we compute the marginal probabilities in a conditional random field, similar to collective classification, and the potentials in the random field are learned through end-to-end training, akin to graph neural networks. In our model, potentials on each node only depend on that node's features, and edge potentials are learned via a coupling matrix. This structure enables simple training with interpretable parameters, scales to large networks, naturally incorporates training labels at inference, and is often more accurate than related approaches. Our approach can be viewed as either an interpretable message-passing graph neural network or a collective classification method with higher capacity and modernized training.
DSJun 2, 2021
The Generalized Mean Densest Subgraph ProblemNate Veldt, Austin R. Benson, Jon Kleinberg
Finding dense subgraphs of a large graph is a standard problem in graph mining that has been studied extensively both for its theoretical richness and its many practical applications. In this paper we introduce a new family of dense subgraph objectives, parameterized by a single parameter $p$, based on computing generalized means of degree sequences of a subgraph. Our objective captures both the standard densest subgraph problem and the maximum $k$-core as special cases, and provides a way to interpolate between and extrapolate beyond these two objectives when searching for other notions of dense subgraphs. In terms of algorithmic contributions, we first show that our objective can be minimized in polynomial time for all $p \geq 1$ using repeated submodular minimization. A major contribution of our work is analyzing the performance of different types of peeling algorithms for dense subgraphs both in theory and practice. We prove that the standard peeling algorithm can perform arbitrarily poorly on our generalized objective, but we then design a more sophisticated peeling method which for $p \geq 1$ has an approximation guarantee that is always at least $1/2$ and converges to 1 as $p \rightarrow \infty$. In practice, we show that this algorithm obtains extremely good approximations to the optimal solution, scales to large graphs, and highlights a range of different meaningful notions of density on graphs coming from numerous domains. Furthermore, it is typically able to approximate the densest subgraph problem better than the standard peeling algorithm, by better accounting for how the removal of one node affects other nodes in its neighborhood.
LGMay 17, 2021
Choice Set Confounding in Discrete ChoiceKiran Tomlinson, Johan Ugander, Austin R. Benson
Standard methods in preference learning involve estimating the parameters of discrete choice models from data of selections (choices) made by individuals from a discrete set of alternatives (the choice set). While there are many models for individual preferences, existing learning methods overlook how choice set assignment affects the data. Often, the choice set itself is influenced by an individual's preferences; for instance, a consumer choosing a product from an online retailer is often presented with options from a recommender system that depend on information about the consumer's preferences. Ignoring these assignment mechanisms can mislead choice models into making biased estimates of preferences, a phenomenon that we call choice set confounding; we demonstrate the presence of such confounding in widely-used choice datasets. To address this issue, we adapt methods from causal inference to the discrete choice setting. We use covariates of the chooser for inverse probability weighting and/or regression controls, accurately recovering individual preferences in the presence of choice set confounding under certain assumptions. When such covariates are unavailable or inadequate, we develop methods that take advantage of structured choice set assignment to improve prediction. We demonstrate the effectiveness of our methods on real-world choice data, showing, for example, that accounting for choice set confounding makes choices observed in hotel booking and commute transportation more consistent with rational utility-maximization.
LGMar 27, 2021
A nonlinear diffusion method for semi-supervised learning on hypergraphsFrancesco Tudisco, Konstantin Prokopchik, Austin R. Benson
Hypergraphs are a common model for multiway relationships in data, and hypergraph semi-supervised learning is the problem of assigning labels to all nodes in a hypergraph, given labels on just a few nodes. Diffusions and label spreading are classical techniques for semi-supervised learning in the graph setting, and there are some standard ways to extend them to hypergraphs. However, these methods are linear models, and do not offer an obvious way of incorporating node features for making predictions. Here, we develop a nonlinear diffusion process on hypergraphs that spreads both features and labels following the hypergraph structure, which can be interpreted as a hypergraph equilibrium network. Even though the process is nonlinear, we show global convergence to a unique limiting point for a broad class of nonlinearities, which is the global optimum of a interpretable, regularized semi-supervised learning loss function. The limiting point serves as a node embedding from which we make predictions with a linear model. Our approach is much more accurate than several hypergraph neural networks, and also takes less time to train.
SIJan 24, 2021
Generative hypergraph clustering: from blockmodels to modularityPhilip S. Chodrow, Nate Veldt, Austin R. Benson
Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. Many graph algorithms for this task are based on variants of the stochastic blockmodel, a random graph with flexible cluster structure. However, there are few models and algorithms for hypergraph clustering. Here, we propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), a generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that AON hypergraph Louvain is highly scalable, including as an example an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from methods based on dyadic graph projections. We use our generative model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, finding interpretable higher-order structure. We then study the behavior of our AON hypergraph Louvain algorithm, finding that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.
LGJan 19, 2021
A Unifying Generative Model for Graph Learning Algorithms: Label Propagation, Graph Convolutions, and CombinationsJunteng Jia, Austin R. Benson
Semi-supervised learning on graphs is a widely applicable problem in network science and machine learning. Two standard algorithms -- label propagation and graph neural networks -- both operate by repeatedly passing information along edges, the former by passing labels and the latter by passing node features, modulated by neural networks. These two types of algorithms have largely developed separately, and there is little understanding about the structure of network data that would make one of these approaches work particularly well compared to the other or when the approaches can be meaningfully combined. Here, we develop a Markov random field model for the data generation process of node attributes, based on correlations of attributes on and between vertices, that motivates and unifies these algorithmic approaches. We show that label propagation, a linearized graph convolutional network, and their combination can all be derived as conditional expectations under our model, when conditioning on different attributes. In addition, the data model highlights deficiencies in existing graph neural networks (while producing new algorithmic solutions), serves as a rigorous statistical framework for understanding graph learning issues such as over-smoothing, creates a testbed for evaluating inductive learning performance, and provides a way to sample graphs attributes that resemble empirical data. We also find that a new algorithm derived from our data generation model, which we call a Linear Graph Convolution, performs extremely well in practice on empirical data, and provide theoretical justification for why this is the case.
NAOct 29, 2020
Over-parametrized neural networks as under-determined linear systemsAustin R. Benson, Anil Damle, Alex Townsend
We draw connections between simple neural networks and under-determined linear systems to comprehensively explore several interesting theoretical questions in the study of neural networks. First, we emphatically show that it is unsurprising such networks can achieve zero training loss. More specifically, we provide lower bounds on the width of a single hidden layer neural network such that only training the last linear layer suffices to reach zero training loss. Our lower bounds grow more slowly with data set size than existing work that trains the hidden layer weights. Second, we show that kernels typically associated with the ReLU activation function have fundamental flaws -- there are simple data sets where it is impossible for widely studied bias-free models to achieve zero training loss irrespective of how the parameters are chosen or trained. Lastly, our analysis of gradient descent clearly illustrates how spectral properties of certain matrices impact both the early iteration and long-term training behavior. We propose new activation functions that avoid the pitfalls of ReLU in that they admit zero training loss solutions for any set of distinct data points and experimentally exhibit favorable spectral properties.
LGSep 7, 2020
Learning Interpretable Feature Context Effects in Discrete ChoiceKiran Tomlinson, Austin R. Benson
The outcomes of elections, product sales, and the structure of social connections are all determined by the choices individuals make when presented with a set of options, so understanding the factors that contribute to choice is crucial. Of particular interest are context effects, which occur when the set of available options influences a chooser's relative preferences, as they violate traditional rationality assumptions yet are widespread in practice. However, identifying these effects from observed choices is challenging, often requiring foreknowledge of the effect to be measured. In contrast, we provide a method for the automatic discovery of a broad class of context effects from observed choice data. Our models are easier to train and more flexible than existing models and also yield intuitive, interpretable, and statistically testable context effects. Using our models, we identify new context effects in widely used choice datasets and provide the first analysis of choice set context effects in social network growth.
MLSep 5, 2020
Communication-efficient distributed eigenspace estimationVasileios Charisopoulos, Austin R. Benson, Anil Damle
Distributed computing is a standard way to scale up machine learning and data science algorithms to process large amounts of data. In such settings, avoiding communication amongst machines is paramount for achieving high performance. Rather than distribute the computation of existing algorithms, a common practice for avoiding communication is to compute local solutions or parameter estimates on each machine and then combine the results; in many convex optimization problems, even simple averaging of local solutions can work well. However, these schemes do not work when the local solutions are not unique. Spectral methods are a collection of such problems, where solutions are orthonormal bases of the leading invariant subspace of an associated data matrix, which are only unique up to rotation and reflections. Here, we develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix. Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution, and only requires a single round of communication. For the important case of principal component analysis (PCA), we show that our algorithm achieves a similar error rate to that of a centralized estimator. We present numerical experiments demonstrating the efficacy of our proposed algorithm for distributed PCA, as well as other problems where solutions exhibit rotational symmetry, such as node embeddings for graph data and spectral initialization for quadratic sensing.
SIJun 15, 2020
Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius PlatformDerek Lim, Austin R. Benson
Many platforms collect crowdsourced information primarily from volunteers. As this type of knowledge curation has become widespread, contribution formats vary substantially and are driven by diverse processes across differing platforms. Thus, models for one platform are not necessarily applicable to others. Here, we study the temporal dynamics of Genius, a platform primarily designed for user-contributed annotations of song lyrics. A unique aspect of Genius is that the annotations are extremely local -- an annotated lyric may just be a few lines of a song -- but also highly related, e.g., by song, album, artist, or genre. We analyze several dynamical processes associated with lyric annotations and their edits, which differ substantially from models for other platforms. For example, expertise on song annotations follows a "U shape" where experts are both early and late contributors with non-experts contributing intermediately; we develop a user utility model that captures such behavior. We also find several contribution traits appearing early in a user's lifespan of contributions that distinguish (eventual) experts from non-experts. Combining our findings, we develop a model for early prediction of user expertise.
SIJun 10, 2020
Hypergraph Clustering for Finding Diverse and Experienced GroupsIlya Amburg, Nate Veldt, Austin R. Benson
When forming a team or group of individuals, we often seek a balance of expertise in a particular task while at the same time maintaining diversity of skills within each group. Here, we view the problem of finding diverse and experienced groups as clustering in hypergraphs with multiple edge types. The input data is a hypergraph with multiple hyperedge types -- representing information about past experiences of groups of individuals -- and the output is groups of nodes. In contrast to related problems on fair or balanced clustering, we model diversity in terms of variety of past experience (instead of, e.g., protected attributes), with a goal of forming groups that have both experience and diversity with respect to participation in edge types. In other words, both diversity and experience are measured from the types of the hyperedges. Our clustering model is based on a regularized version of an edge-based hypergraph clustering objective, and we also show how naive objectives actually have no diversity-experience tradeoff. Although our objective function is NP-hard to optimize, we design an efficient 2-approximation algorithm and also show how to compute bounds for the regularization hyperparameter that lead to meaningful diversity-experience tradeoffs. We demonstrate an application of this framework in online review platforms, where the goal is to curate sets of user reviews for a product type. In this context, "experience" corresponds to users familiar with the type of product, and "diversity" to users that have reviewed related products.
LGJun 8, 2020
Nonlinear Higher-Order Label SpreadingFrancesco Tudisco, Austin R. Benson, Konstantin Prokopchik
Label spreading is a general technique for semi-supervised learning with point cloud or network data, which can be interpreted as a diffusion of labels on a graph. While there are many variants of label spreading, nearly all of them are linear models, where the incoming information to a node is a weighted sum of information from neighboring nodes. Here, we add nonlinearity to label spreading through nonlinear functions of higher-order structure in the graph, namely triangles in the graph. For a broad class of nonlinear functions, we prove convergence of our nonlinear higher-order label spreading algorithm to the global solution of a constrained semi-supervised loss function. We demonstrate the efficiency and efficacy of our approach on a variety of point cloud and network datasets, where the nonlinear higher-order model compares favorably to classical label spreading, as well as hypergraph models and graph neural networks.
SIMar 7, 2020
Frozen Binomials on the Web: Word Ordering and Language Conventions in Online TextKatherine Van Koevering, Austin R. Benson, Jon Kleinberg
There is inherent information captured in the order in which we write words in a list. The orderings of binomials --- lists of two words separated by `and' or `or' --- has been studied for more than a century. These binomials are common across many areas of speech, in both formal and informal text. In the last century, numerous explanations have been given to describe what order people use for these binomials, from differences in semantics to differences in phonology. These rules describe primarily `frozen' binomials that exist in exactly one ordering and have lacked large-scale trials to determine efficacy. Online text provides a unique opportunity to study these lists in the context of informal text at a very large scale. In this work, we expand the view of binomials to include a large-scale analysis of both frozen and non-frozen binomials in a quantitative way. Using this data, we then demonstrate that most previously proposed rules are ineffective at predicting binomial ordering. By tracking the order of these binomials across time and communities we are able to establish additional, unexplored dimensions central to these predictions. Expanding beyond the question of individual binomials, we also explore the global structure of binomials in various communities, establishing a new model for these lists and analyzing this structure for non-frozen and frozen binomials. Additionally, novel analysis of trinomials --- lists of length three --- suggests that none of the binomials analysis applies in these cases. Finally, we demonstrate how large data sets gleaned from the web can be used in conjunction with older theories to expand and improve on old questions.
DSFeb 21, 2020
Minimizing Localized Ratio Cut Objectives in HypergraphsNate Veldt, Austin R. Benson, Jon Kleinberg
Hypergraphs are a useful abstraction for modeling multiway relationships in data, and hypergraph clustering is the task of detecting groups of closely related nodes in such data. Graph clustering has been studied extensively, and there are numerous methods for detecting small, localized clusters without having to explore an entire input graph. However, there are only a few specialized approaches for localized clustering in hypergraphs. Here we present a framework for local hypergraph clustering based on minimizing localized ratio cut objectives. Our framework takes an input set of reference nodes in a hypergraph and solves a sequence of hypergraph minimum $s$-$t$ cut problems in order to identify a nearby well-connected cluster of nodes that overlaps substantially with the input set. Our methods extend graph-based techniques but are significantly more general and have new output quality guarantees. First, our methods can minimize new generalized notions of hypergraph cuts, which depend on specific configurations of nodes within each hyperedge, rather than just on the number of cut hyperedges. Second, our framework has several attractive theoretical properties in terms of output cluster quality. Most importantly, our algorithm is strongly-local, meaning that its runtime depends only on the size of the input set, and does not need to explore the entire hypergraph to find good local clusters. We use our methodology to effectively identify clusters in hypergraphs of real-world data with millions of nodes, millions of hyperedges, and large average hyperedge size with runtimes ranging between a few seconds and a few minutes.
NAFeb 19, 2020
Entrywise convergence of iterative methods for eigenproblemsVasileios Charisopoulos, Austin R. Benson, Anil Damle
Several problems in machine learning, statistics, and other fields rely on computing eigenvectors. For large scale problems, the computation of these eigenvectors is typically performed via iterative schemes such as subspace iteration or Krylov methods. While there is classical and comprehensive analysis for subspace convergence guarantees with respect to the spectral norm, in many modern applications other notions of subspace distance are more appropriate. Recent theoretical work has focused on perturbations of subspaces measured in the $\ell_{2 \to \infty}$ norm, but does not consider the actual computation of eigenvectors. Here we address the convergence of subspace iteration when distances are measured in the $\ell_{2 \to \infty}$ norm and provide deterministic bounds. We complement our analysis with a practical stopping criterion and demonstrate its applicability via numerical experiments. Our results show that one can get comparable performance on downstream tasks while requiring fewer iterations, thereby saving substantial computational time.
LGFeb 19, 2020
Residual Correlation in Graph Neural Network RegressionJunteng Jia, Austin R. Benson
A graph neural network transforms features in each vertex's neighborhood into a vector representation of the vertex. Afterward, each vertex's representation is used independently for predicting its label. This standard pipeline implicitly assumes that vertex labels are conditionally independent given their neighborhood features. However, this is a strong assumption, and we show that it is far from true on many real-world graph datasets. Focusing on regression tasks, we find that this conditional independence assumption severely limits predictive power. This should not be that surprising, given that traditional graph-based semi-supervised learning methods such as label propagation work in the opposite fashion by explicitly modeling the correlation in predicted outcomes. Here, we address this problem with an interpretable and efficient framework that can improve any graph neural network architecture simply by exploiting correlation structure in the regression residuals. In particular, we model the joint distribution of residuals on vertices with a parameterized multivariate Gaussian, and estimate the parameters by maximizing the marginal likelihood of the observed labels. Our framework achieves substantially higher accuracy than competing baselines, and the learned parameters can be interpreted as the strength of correlation among connected vertices. Furthermore, we develop linear time algorithms for low-variance, unbiased model parameter estimates, allowing us to scale to large networks. We also provide a basic version of our method that makes stronger assumptions on correlation structure but is painless to implement, often leading to great practical performance with minimal overhead.
GTFeb 2, 2020
Choice Set Optimization Under Discrete Choice Models of Group DecisionsKiran Tomlinson, Austin R. Benson
The way that people make choices or exhibit preferences can be strongly affected by the set of available alternatives, often called the choice set. Furthermore, there are usually heterogeneous preferences, either at an individual level within small groups or within sub-populations of large groups. Given the availability of choice data, there are now many models that capture this behavior in order to make effective predictions--however, there is little work in understanding how directly changing the choice set can be used to influence the preferences of a collection of decision-makers. Here, we use discrete choice modeling to develop an optimization framework of such interventions for several problems of group influence, namely maximizing agreement or disagreement and promoting a particular choice. We show that these problems are NP-hard in general, but imposing restrictions reveals a fundamental boundary: promoting a choice can be easier than encouraging consensus or sowing discord. We design approximation algorithms for the hard problems and show that they work well on real-world choice data.
SIOct 22, 2019
Clustering in graphs and hypergraphs with categorical edge labelsIlya Amburg, Nate Veldt, Austin R. Benson
Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called "higher-order interactions" that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels --- or different interaction types --- where clusters corresponds to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.
LGMay 24, 2019
Neural Jump Stochastic Differential EquationsJunteng Jia, Austin R. Benson
Many time series are effectively generated by a combination of deterministic continuous flows along with discrete jumps sparked by stochastic events. However, we usually do not have the equation of motion describing the flows, or how they are affected by jumps. To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump. Our approach extends the framework of Neural Ordinary Differential Equations with a stochastic process term that models discrete events. We then model temporal point processes with a piecewise-continuous latent trajectory, where the discontinuities are caused by stochastic events whose conditional intensity depends on the latent state. We demonstrate the predictive capabilities of our model on a range of synthetic and real-world marked point process datasets, including classical point processes (such as Hawkes processes), awards on Stack Overflow, medical records, and earthquake monitoring.
LGMay 17, 2019
Graph-based Semi-Supervised & Active Learning for Edge FlowsJunteng Jia, Michael T. Schaub, Santiago Segarra et al.
We present a graph-based semi-supervised learning (SSL) method for learning edge flows defined on a graph. Specifically, given flow measurements on a subset of edges, we want to predict the flows on the remaining edges. To this end, we develop a computational framework that imposes certain constraints on the overall flows, such as (approximate) flow conservation. These constraints render our approach different from classical graph-based SSL for vertex labels, which posits that tightly connected nodes share similar labels and leverages the graph structure accordingly to extrapolate from a few vertex labels to the unlabeled vertices. We derive bounds for our method's reconstruction error and demonstrate its strong performance on synthetic and real-world flow networks from transportation, physical infrastructure, and the Web. Furthermore, we provide two active learning algorithms for selecting informative edges on which to measure flow, which has applications for optimal sensor deployment. The first strategy selects edges to minimize the reconstruction error bound and works well on flows that are approximately divergence-free. The second approach clusters the graph and selects bottleneck edges that cross cluster-boundaries, which works well on flows with global trends.
SIMay 14, 2019
Planted Hitting Set Recovery in HypergraphsIlya Amburg, Jon Kleinberg, Austin R. Benson
In various application areas, networked data is collected by measuring interactions involving some specific set of core nodes. This results in a network dataset containing the core nodes along with a potentially much larger set of fringe nodes that all have at least one interaction with a core node. In many settings, this type of data arises for structures that are richer than graphs, because they involve the interactions of larger sets; for example, the core nodes might be a set of individuals under surveillance, where we observe the attendees of meetings involving at least one of the core individuals. We model such scenarios using hypergraphs, and we study the problem of core recovery: if we observe the hypergraph but not the labels of core and fringe nodes, can we recover the "planted" set of core nodes in the hypergraph? We provide a theoretical framework for analyzing the recovery of such a set of core nodes and use our theory to develop a practical and scalable algorithm for core recovery. The crux of our analysis and algorithm is that the core nodes are a hitting set of the hypergraph, meaning that every hyperedge has at least one node in the set of core nodes. We demonstrate the efficacy of our algorithm on a number of real-world datasets, outperforming competitive baselines derived from network centrality and core-periphery measures.
SIFeb 6, 2019
Modeling and Analysis of Tagging Networks in Stack Exchange CommunitiesXiang Fu, Shangdi Yu, Austin R. Benson
Large Question-and-Answer (Q&A) platforms support diverse knowledge curation on the Web. While researchers have studied user behavior on the platforms in a variety of contexts, there is relatively little insight into important by-products of user behavior that also encode knowledge. Here, we analyze and model the macroscopic structure of tags applied by users to annotate and catalog questions, using a collection of 168 Stack Exchange websites. We find striking similarity in tagging structure across these Stack Exchange communities, even though each community evolves independently (albeit under similar guidelines). Using our empirical findings, we develop a simple generative model that creates random bipartite graphs of tags and questions. Our model accounts for the tag frequency distribution but does not explicitly account for co-tagging correlations. Even under these constraints, we demonstrate empirically and theoretically that our model can reproduce a number of statistical properties of the co-tagging graph that links tags appearing in the same post.
SINov 28, 2018
Link Prediction in Networks with Core-Fringe DataAustin R. Benson, Jon Kleinberg
Data collection often involves the partial measurement of a larger system. A common example arises in collecting network data: we often obtain network datasets by recording all of the interactions among a small set of core nodes, so that we end up with a measurement of the network consisting of these core nodes along with a potentially much larger set of fringe nodes that have links to the core. Given the ubiquity of this process for assembling network data, it is crucial to understand the role of such a `core-fringe' structure. Here we study how the inclusion of fringe nodes affects the standard task of network link prediction. One might initially think the inclusion of any additional data is useful, and hence that it should be beneficial to include all fringe nodes that are available. However, we find that this is not true; in fact, there is substantial variability in the value of the fringe nodes for prediction. Once an algorithm is selected, in some datasets, including any additional data from the fringe can actually hurt prediction performance; in other datasets, including some amount of fringe information is useful before prediction performance saturates or even declines; and in further cases, including the entire fringe leads to the best performance. While such variety might seem surprising, we show that these behaviors are exhibited by simple random graph models.
SIJul 25, 2018
Three hypergraph eigenvector centralitiesAustin R. Benson
Eigenvector centrality is a standard network analysis tool for determining the importance of (or ranking of) entities in a connected system that is represented by a graph. However, many complex systems and datasets have natural multi-way interactions that are more faithfully modeled by a hypergraph. Here we extend the notion of graph eigenvector centrality to uniform hypergraphs. Traditional graph eigenvector centralities are given by a positive eigenvector of the adjacency matrix, which is guaranteed to exist by the Perron-Frobenius theorem under some mild conditions. The natural representation of a hypergraph is a hypermatrix (colloquially, a tensor). Using recently established Perron-Frobenius theory for tensors, we develop three tensor eigenvectors centralities for hypergraphs, each with different interpretations. We show that these centralities can reveal different information on real-world data by analyzing hypergraphs constructed from n-gram frequencies, co-tagging on stack exchange, and drug combinations observed in patient emergency room visits.
SIMay 3, 2018
Found Graph Data and Planted Vertex CoversAustin R. Benson, Jon Kleinberg
A typical way in which network data is recorded is to measure all the interactions among a specified set of core nodes; this produces a graph containing this core together with a potentially larger set of fringe nodes that have links to the core. Interactions between pairs of nodes in the fringe, however, are not recorded by this process, and hence not present in the resulting graph data. For example, a phone service provider may only have records of calls in which at least one of the participants is a customer; this can include calls between a customer and a non-customer, but not between pairs of non-customers. Knowledge of which nodes belong to the core is an important piece of metadata that is crucial for interpreting the network dataset. But in many cases, this metadata is not available, either because it has been lost due to difficulties in data provenance, or because the network consists of found data obtained in settings such as counter-surveillance. This leads to a natural algorithmic problem, namely the recovery of the core set. Since the core set forms a vertex cover of the graph, we essentially have a planted vertex cover problem, but with an arbitrary underlying graph. We develop a theoretical framework for analyzing this planted vertex cover problem, based on results in the theory of fixed-parameter tractability, together with algorithms for recovering the core. Our algorithms are fast, simple to implement, and out-perform several methods based on network core-periphery structure on various real-world datasets.
SIFeb 20, 2018
Simplicial Closure and higher-order link predictionAustin R. Benson, Rediet Abebe, Michael T. Schaub et al.
Networks provide a powerful formalism for modeling complex systems by using a model of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once; for example, communication within a group rather than person-to person, collaboration among a team rather than a pair of coauthors, or biological interaction between a set of molecules rather than just two. Such higher-order interactions are ubiquitous, but their empirical study has received limited attention, and little is known about possible organizational principles of such structures. Here we study the temporal evolution of 19 datasets with explicit accounting for higher-order interactions. We show that there is a rich variety of structure in our datasets but datasets from the same system types have consistent patterns of higher-order structure. Furthermore, we find that tie strength and edge density are competing positive indicators of higher-order organization, and these trends are consistent across interactions involving differing numbers of nodes. To systematically further the study of theories for such higher-order structures, we propose higher-order link prediction as a benchmark problem to assess models and algorithms that predict higher-order structure. We find a fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.
SIFeb 19, 2018
Tools for higher-order network analysisAustin R. Benson
Networks are a fundamental model of complex systems throughout the sciences, and network datasets are typically analyzed through lower-order connectivity patterns described at the level of individual nodes and edges. However, higher-order connectivity patterns captured by small subgraphs, also called network motifs, describe the fundamental structures that control and mediate the behavior of many complex systems. We develop three tools for network analysis that use higher-order connectivity patterns to gain new insights into network datasets: (1) a framework to cluster nodes into modules based on joint participation in network motifs; (2) a generalization of the clustering coefficient measurement to investigate higher-order closure patterns; and (3) a definition of network motifs for temporal networks and fast algorithms for counting them. Using these tools, we analyze data from biology, ecology, economics, neuroscience, online social networks, scientific collaborations, telecommunications, transportation, and the World Wide Web.
SIApr 12, 2017
Higher-order clustering in networksHao Yin, Austin R. Benson, Jure Leskovec
A fundamental property of complex networks is the tendency for edges to cluster. The extent of the clustering is typically quantified by the clustering coefficient, which is the probability that a length-2 path is closed, i.e., induces a triangle in the network. However, higher-order cliques beyond triangles are crucial to understanding complex networks, and the clustering behavior with respect to such higher-order network structures is not well understood. Here we introduce higher-order clustering coefficients that measure the closure probability of higher-order network cliques and provide a more comprehensive view of how the edges of complex networks cluster. Our higher-order clustering coefficients are a natural generalization of the traditional clustering coefficient. We derive several properties about higher-order clustering coefficients and analyze them under common random graph models. Finally, we use higher-order clustering coefficients to gain new insights into the structure of real-world networks from several domains.
SIDec 29, 2016
Motifs in Temporal NetworksAshwin Paranjape, Austin R. Benson, Jure Leskovec
Networks are a fundamental tool for modeling complex systems in a variety of domains including social and communication networks as well as biology and neuroscience. Small subgraph patterns in networks, called network motifs, are crucial to understanding the structure and function of these systems. However, the role of network motifs in temporal networks, which contain many timestamped links between the nodes, is not yet well understood. Here we develop a notion of a temporal network motif as an elementary unit of temporal networks and provide a general methodology for counting such motifs. We define temporal network motifs as induced subgraphs on sequences of temporal edges, design fast algorithms for counting temporal motifs, and prove their runtime complexity. Our fast algorithms achieve up to 56.5x speedup compared to a baseline method. Furthermore, we use our algorithms to count temporal motifs in a variety of networks. Results show that networks from different domains have significantly different motif counts, whereas networks from the same domain tend to have similar motif counts. We also find that different motifs occur at different time scales, which provides further insights into structure and function of temporal networks.
NAJul 25, 2016
Improving the numerical stability of fast matrix multiplicationGrey Ballard, Austin R. Benson, Alex Druinsky et al.
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few fast algorithms have been efficiently implemented or used in practical applications. However, there exist many practical alternatives to Strassen's algorithm with varying performance and numerical properties. Fast algorithms are known to be numerically stable, but because their error bounds are slightly weaker than the classical algorithm, they are not used even in cases where they provide a performance benefit. We argue in this paper that the numerical sacrifice of fast algorithms, particularly for the typical use cases of practical algorithms, is not prohibitive, and we explore ways to improve the accuracy both theoretically and empirically. The numerical accuracy of fast matrix multiplication depends on properties of the algorithm and of the input matrices, and we consider both contributions independently. We generalize and tighten previous error analyses of fast algorithms and compare their properties. We discuss algorithmic techniques for improving the error guarantees from two perspectives: manipulating the algorithms, and reducing input anomalies by various forms of diagonal scaling. Finally, we benchmark performance and demonstrate our improved numerical accuracy.
LGFeb 27, 2014
Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matricesAustin R. Benson, Jason D. Lee, Bartek Rajwa et al.
Numerous algorithms are used for nonnegative matrix factorization under the assumption that the matrix is nearly separable. In this paper, we show how to make these algorithms efficient for data matrices that have many more rows than columns, so-called "tall-and-skinny matrices". One key component to these improved methods is an orthogonal matrix transformation that preserves the separability of the NMF problem. Our final methods need a single pass over the data matrix and are suitable for streaming, multi-core, and MapReduce architectures. We demonstrate the efficacy of these algorithms on terabyte-sized synthetic matrices and real-world matrices from scientific computing and bioinformatics.