Jamie Tucker-Foltz

GT
h-index45
6papers
348citations
Novelty57%
AI Score56

6 Papers

DSMay 7
Sampling Tree-Weighted Partitions Without Sampling Trees

Sarah Cannon, Topher Pankow, Wesley Pegden et al.

This paper gives a new algorithm for sampling tree-weighted partitions of a large class of planar graphs. Formally, the tree-weighted distribution on $k$-partitions of a graph weights $k$-partitions proportional to the product of the number of spanning trees of each partition class. Recent work on computational redistricting analysis has driven special interest in the conditional distribution where all partition classes have the same size (balanced partitions). One class of Markov chains in wide use aims to sample from balanced tree-weighted $k$-partitions using a sampler for balanced tree-weighted 2-partitions. Previous implementations of this 2-partition sampler would draw a random spanning tree and check whether it contains an edge whose removal produces a balanced 2-component forest, rejecting if not. In practice, this is a significant computational bottleneck. We show that in fact it is possible to sample from the balanced tree-weighted 2-partition distribution directly, without first sampling a spanning tree; the acceptance and rejection rates are the same as in previous samplers. We prove that on a wide class of planar graphs encompassing network structures typically arising from the geographic data used in computational redistricting, our algorithm takes expected linear time $O(n)$. Notably, this is asymptotically faster than the best known method to generate random trees, which is $O(n \log^2 n)$ for approximate sampling and $O(n^{1 + \log \log \log n / \log \log n})$ for exact sampling. Additionally, we show that a variant of our algorithm also gives a speedup to $O(n \log n)$ for exact sampling of uniformly random trees on these families of graphs, improving the bounds for both exact and approximate sampling. We implement our algorithm and benchmark it on grid graphs, finding that it outperforms the standard bipartitioning method in the widely-used GerryChain library.

THMay 31
Cheap Talk in Bilateral Trade

Jamie Tucker-Foltz, Richard Zeckhauser

A single seller offers one or more goods to a single buyer. The buyer's values and the seller's costs are private information. Each player has a commonly known prior over the other player's value or cost, supported on a finite set. What is the optimal selling mechanism? We argue that, despite this question's importance and apparent simplicity, prior work offers no satisfactory answer. If the seller simply chooses an optimal menu given her realized costs, she fails to exploit her informational advantage. At the other extreme, the optimal trade mechanism that satisfies IC/IR constraints for both parties fails in practice, as it conditions prices on the seller's unknown costs in an unenforceable way. The seller's realistic capabilities lie somewhere in between: she may leverage private information but lacks unlimited commitment power. To bridge this gap, we consider a solution concept built on the realistic assumption that the seller can commit to prices but nothing more. Similar -- albeit technically distinct -- solution concepts have been studied in the context of auctions with multiple buyers. Our concept proves surprisingly rich even with a single buyer. In our model, the buyer and seller engage in multiple rounds of cheap talk before the seller posts a menu of priced bundles. The buyer then purchases. We measure value as profit for the seller and consumer surplus for the buyer. We prove that with a single good cheap talk cannot help either party, but show that it creates value in any extension of this canonical setting: multiple goods, multiple units, interdependent values, or repeated play. We also show that multiple rounds of communication can yield strictly higher expected profit than a single round. Finally, we discuss how realistic factors beyond our stripped-down model combine with cheap talk to enhance this value even further.

DMMar 10
Models of random spanning trees

Eric Babson, Moon Duchin, Annina Iseli et al.

There are numerous randomized algorithms to generate spanning trees in a given ambient graph; several target the uniform distribution on trees (UST), while in practice the fastest and most frequently used draw random weights on the edges and then employ a greedy algorithm to choose the minimum-weight spanning tree (MST). Though MST is a workhorse in applications, the mathematical properties of random MST are far less explored than those of UST. In this paper we develop tools for the quantitative study of random MST. We consider the standard case that the weights are drawn i.i.d. from a single distribution on the real numbers, as well as successive generalizations that lead to \emph{product measures}, where the weights are independently drawn from arbitrary distributions.

GTMar 17
Adaptive Contracts for Cost-Effective AI Delegation

Eden Saig, Tamar Garbuz, Ariel D. Procaccia et al.

When organizations delegate text generation tasks to AI providers via pay-for-performance contracts, expected payments rise when evaluation is noisy. As evaluation methods become more elaborate, the economic benefits of decreased noise are often overshadowed by increased evaluation costs. In this work, we introduce adaptive contracts for AI delegation, which allow detailed evaluation to be performed selectively after observing an initial coarse signal in order to conserve resources. We make three sets of contributions: First, we provide efficient algorithms for computing optimal adaptive contracts under natural assumptions or when core problem dimensions are small, and prove hardness of approximation in the general unstructured case. We then formulate alternative models of randomized adaptive contracts and discuss their benefits and limitations. Finally, we empirically demonstrate the benefits of adaptivity over non-adaptive baselines using question-answering and code-generation datasets.

LGJan 24, 2025
Humanity's Last Exam

Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

GTFeb 16, 2024
Computing Voting Rules with Elicited Incomplete Votes

Daniel Halpern, Safwan Hossain, Jamie Tucker-Foltz · deepmind

Motivated by the difficulty of specifying complete ordinal preferences over a large set of $m$ candidates, we study voting rules that are computable by querying voters about $t < m$ candidates. Generalizing prior works that focused on specific instances of this problem, our paper fully characterizes the set of positional scoring rules that can be computed for any $1 \leq t < m$, which, notably, does not include plurality. We then extend this to show a similar impossibility result for single transferable vote (elimination voting). These negative results are information-theoretic and agnostic to the number of queries. Finally, for scoring rules that are computable with limited-sized queries, we give parameterized upper and lower bounds on the number of such queries a deterministic or randomized algorithm must make to determine the score-maximizing candidate. While there is no gap between our bounds for deterministic algorithms, identifying the exact query complexity for randomized algorithms is a challenging open problem, of which we solve one special case.