MEMay 26
When prompt perturbations break your A/B test: A valid statistical test for generative surveyingHayden Helm, Carey Priebe
Generative surveying -- where collections of LLM-based personas provide feedback on messages -- has emerged as a cheap and scalable alternative to traditional market research. However, LLMs are sensitive to small variations in prompt design and conclusions drawn from generative surveys may depend on arbitrary phrasing choices. Controlling for this sensitivity requires including semantically equivalent perturbations in the analysis. In this paper, we show that standard hypothesis tests, including the sign test and Wilcoxon signed-rank test, are invalid under a statistical model for generative surveying that includes realistic perturbation structure. We propose a permutation test that is valid under this model and formally characterize the conditions under which standard tests fail. Applying our framework to a simple generative surveying problem, we estimate relevant parameters, characterize the power of the permutation test under realistic conditions, and provide practical guidance on budget allocation across personas, perturbations, and replicates. Finally, we show that both the magnitude and direction of the estimated effect are sensitive to the choice of model, even within the same model family.
CLOct 25, 2022
Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal TransportKelly Marchisio, Ali Saad-Eldin, Kevin Duh et al.
Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.
LGMay 8
Query-efficient model evaluation using cached responsesHayden Helm, Ben Johnson, Carey Priebe
Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously-evaluated models are often cached -- creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for predicting benchmark performance that leverages cached model responses based on the Data Kernel Perspective Space (DKPS), a method for quantifying the relationship between models in the black-box setting. Theoretically, we show that DKPS-based methods are query-efficient under certain conditions. Empirically, we demonstrate that DKPS-based methods achieve the same mean absolute error as baselines with a substantially decreased query budget. We conclude by proposing an offline method for selecting a set of queries that maximizes the goodness-of-fit on reference models, improving prediction accuracy over random query selection.
MAMay 11
Control Charts for Multi-agent SystemsHayden Helm, Carey Priebe, Brandon Duderstadt
Generative agents have proven to be powerful assistants in a wide variety of contexts. Given this success, users are now deploying agents with minimal restrictions in open ended, multi-agent environments. Current methods for monitoring the dynamics of open-ended multi-agent systems are limited to qualitative inspection. In this paper, we extend the process-theoretic notion of adaptive control charts to multi-agent systems to enable automated monitoring. Using simulation, we demonstrate that adaptive control charts are necessary for monitoring multi-agent systems that can learn from their environment. We further demonstrate, both empirically and theoretically, that adaptive control charts are susceptible to adversarial agents that defect sufficiently slowly. These results illustrate a fundamental tradeoff in multi-agent system control: either agents in a system cannot learn or the system is susceptible to adversaries.
CLSep 26, 2021Code
An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding SpacesKelly Marchisio, Youngser Park, Ali Saad-Eldin et al.
Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exploits new techniques from the graph matching optimization literature. These contrasting approaches have not been compared in BLI so far. In this work, we study the behavior of Euclidean versus graph-based approaches to BLI under differing data conditions and show that they complement each other when combined. We release our code at https://github.com/kellymarchisio/euc-v-graph-bli.
LGMay 8
Black-box model classification under the discriminative factorizationHayden Helm, Merrick Ohata, Carey Priebe
Access to modern generative systems is often restricted to querying an API (the ``black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative factorization} to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the estimated discriminative field reproduce the empirical ordering of oracle query sets.
DCFeb 6, 2024
Edge-Parallel Graph Encoder EmbeddingAriel Lubonja, Cencheng Shen, Carey Priebe et al.
New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted language. We refactor GEE into a parallel program in the Ligra graph engine that maps functions over the edges of the graph and uses lock-free atomic instrutions to prevent data races. On a graph with 1.8B edges, this results in a 500 times speedup over the original implementation and a 17 times speedup over a just-in-time compiled version.
MLOct 12, 2019
Spectral embedding of weighted graphsIan Gallagher, Andrew Jones, Anna Bertiger et al.
When analyzing weighted networks using spectral embedding, a judicious transformation of the edge weights may produce better results. To formalize this idea, we consider the asymptotic behavior of spectral embedding for different edge-weight representations, under a generic low rank model. We measure the quality of different embeddings -- which can be on entirely different scales -- by how easy it is to distinguish communities, in an information-theoretic sense. For common types of weighted graphs, such as count networks or p-value networks, we find that transformations such as tempering or thresholding can be highly beneficial, both in theory and in practice.
MLJun 7, 2019
Vertex Classification on Weighted NetworksHayden Helm, Joshua Vogelstein, Carey Priebe
This paper proposes a discrimination technique for vertices in a weighted network. We assume that the edge weights and adjacencies in the network are conditionally independent and that both sources of information encode class membership information. In particular, we introduce a edge weight distribution matrix to the standard K-Block Stochastic Block Model to model weighted networks. This allows us to develop simple yet powerful extensions of classification techniques using the spectral embedding of the unweighted adjacency matrix. We consider two assumptions on the edge weight distributions and propose classification procedures in both settings. We show the effectiveness of the proposed classifiers by comparing them to quadratic discriminant analysis following the spectral embedding of a transformed weighted network. Moreover, we discuss and show how the methods perform when the edge weights do not encode class membership information.
LGJun 4, 2019
Sparse Representation Classification via Screening for GraphsCencheng Shen, Li Chen, Yuexiao Dong et al.
The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption. In this paper we propose a new implementation of SRC via screening, establish its equivalence to the original SRC under regularity conditions, and prove its classification consistency for random graphs drawn from stochastic blockmodels. The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance but significantly faster.
MLFeb 14, 2018
Vertex nomination: The canonical sampling and the extended spectral nomination schemesJordan Yoder, Li Chen, Henry Pao et al.
Suppose that one particular block in a stochastic block model is of interest, but block labels are only observed for a few of the vertices in the network. Utilizing a graph realized from the model and the observed block labels, the vertex nomination task is to order the vertices with unobserved block labels into a ranked nomination list with the goal of having an abundance of interesting vertices near the top of the list. There are vertex nomination schemes in the literature, including the optimally precise canonical nomination scheme~$\mathcal{L}^C$ and the consistent spectral partitioning nomination scheme~$\mathcal{L}^P$. While the canonical nomination scheme $\mathcal{L}^C$ is provably optimally precise, it is computationally intractable, being impractical to implement even on modestly sized graphs. With this in mind, an approximation of the canonical scheme---denoted the {\it canonical sampling nomination scheme} $\mathcal{L}^{CS}$---is introduced; $\mathcal{L}^{CS}$ relies on a scalable, Markov chain Monte Carlo-based approximation of $\mathcal{L}^{C}$, and converges to $\mathcal{L}^{C}$ as the amount of sampling goes to infinity. The spectral partitioning nomination scheme is also extended to the {\it extended spectral partitioning nomination scheme}, $\mathcal{L}^{EP}$, which introduces a novel semisupervised clustering framework to improve upon the precision of $\mathcal{L}^P$. Real-data and simulation experiments are employed to illustrate the precision of these vertex nomination schemes, as well as their empirical computational complexity. Keywords: vertex nomination, Markov chain Monte Carlo, spectral partitioning, Mclust MSC[2010]: 60J22, 65C40, 62H30, 62H25
MLJun 24, 2014
Techniques for clustering interaction data as a collection of graphsNam H. Lee, Carey Priebe, Youngser Park et al.
A natural approach to analyze interaction data of form "what-connects-to-what-when" is to create a time-series (or rather a sequence) of graphs through temporal discretization (bandwidth selection) and spatial discretization (vertex contraction). Such discretization together with non-negative factorization techniques can be useful for obtaining clustering of graphs. Motivating application of performing clustering of graphs (as opposed to vertex clustering) can be found in neuroscience and in social network analysis, and it can also be used to enhance community detection (i.e., vertex clustering) by way of conditioning on the cluster labels. In this paper, we formulate a problem of clustering of graphs as a model selection problem. Our approach involves information criteria, non-negative matrix factorization and singular value thresholding, and we illustrate our techniques using real and simulated data.
MLNov 23, 2013
Robust Vertex ClassificationLi Chen, Cencheng Shen, Joshua Vogelstein et al.
For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension. In this paper, we propose a sparse representation vertex classifier which does not require information about the model dimension. This classifier represents a test vertex as a sparse combination of the vertices in the training set and uses the recovered coefficients to classify the test vertex. We prove consistency of our proposed classifier for stochastic blockmodels, and demonstrate that the sparse representation classifier can predict vertex labels with higher accuracy than adjacency spectral embedding approaches via both simulation studies and real data experiments. Our results demonstrate the robustness and effectiveness of our proposed vertex classifier when the model dimension is unknown.
MLOct 2, 2013
Perfect Clustering for Stochastic Blockmodel Graphs via Adjacency Spectral EmbeddingVince Lyzinski, Daniel Sussman, Minh Tang et al.
Vertex clustering in a stochastic blockmodel graph has wide applicability and has been the subject of extensive research. In thispaper, we provide a short proof that the adjacency spectral embedding can be used to obtain perfect clustering for the stochastic blockmodel and the degree-corrected stochastic blockmodel. We also show an analogous result for the more general random dot product graph model.