h-index 9
4 papers
7 citations
Novelty 64%
AI Score 42

4 Papers

AI · Feb 9
Debate is efficient with your time

Jonah Brown-Cohen, Geoffrey Irving, Simon C. Marshall et al.

AI safety via debate uses two competing models to help a human judge verify complex computational tasks. Previous work has established what problems debate can solve in principle, but has not analysed the practical cost of human oversight: how many queries must the judge make to the debate transcript? We introduce Debate Query Complexity (DQC), the minimum number of bits a verifier must inspect to correctly decide a debate. Surprisingly, we find that PSPACE/poly (the class of problems which debate can efficiently decide) is precisely the class of functions decidable with O(log n) queries. This characterisation shows that debate is remarkably query-efficient: even for highly complex problems, logarithmic oversight suffices. We also establish that functions depending on all their input bits require Omega(log n) queries, and that any function computable by a circuit of size s satisfies DQC(f) <= log(s) + 3. Interestingly, this last result implies that proving DQC lower bounds of log(n) + 6 for languages in P would yield new circuit lower bounds, connecting debate query complexity to central questions in circuit complexity.
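
The step from a DQC lower bound to a circuit lower bound can be made explicit by rearranging the stated upper bound. The derivation below is a short sketch that assumes logarithms are base 2 and that s is the size of the smallest circuit computing f on n input bits; the abstract does not state either convention.

```latex
\begin{align*}
\log n + 6 \;\le\; \mathrm{DQC}(f) \;&\le\; \log s + 3 \\
\Longrightarrow\quad \log s \;&\ge\; \log n + 3 \\
\Longrightarrow\quad s \;&\ge\; 2^{3}\, n \;=\; 8n .
\end{align*}
```

Under those assumptions, a DQC lower bound of log(n) + 6 for a function f would force every circuit computing f to have size at least 8n.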

QUANT-PH · Feb 7, 2025
Quantum speedup of non-linear Monte Carlo problems

Jose Blanchet, Yassine Hamoudi, Mario Szegedy et al.

The mean of a random variable can be understood as a linear functional on the space of probability distributions. Quantum computing is known to provide a quadratic speedup over classical Monte Carlo methods for mean estimation. In this paper, we investigate whether a similar quadratic speedup is achievable for estimating non-linear functionals of probability distributions. We propose a quantum-inside-quantum Monte Carlo algorithm that achieves such a speedup for a broad class of non-linear estimation problems, including nested conditional expectations and stochastic optimization. Our algorithm improves upon the direct application of the quantum multilevel Monte Carlo algorithm introduced by An et al. (2021). The existing lower bound indicates that our algorithm is optimal up to polylogarithmic factors. A key innovation of our approach is a new sequence of multilevel Monte Carlo approximations specifically designed for quantum computing, which is central to the algorithm's improved performance.
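
To make "nested conditional expectation" concrete, here is a minimal classical baseline, not the quantum algorithm from the paper. It estimates E[g(E[X | Y])] with a plain nested Monte Carlo loop. The toy model (Y standard normal, X given Y normal with mean Y, g(z) = max(z, 0)) and the function name `nested_mc` are assumptions chosen only for illustration.

```python
import math
import random


def nested_mc(outer: int, inner: int, rng: random.Random) -> float:
    """Classical nested Monte Carlo estimate of E[max(E[X | Y], 0)].

    Toy model (an assumption for illustration):
      Y ~ Normal(0, 1), and X | Y ~ Normal(Y, 1), so E[X | Y] = Y.
    """
    total = 0.0
    for _ in range(outer):
        y = rng.gauss(0.0, 1.0)
        # Inner loop: estimate the conditional mean E[X | Y = y].
        inner_mean = sum(rng.gauss(y, 1.0) for _ in range(inner)) / inner
        # Apply the non-linear function to the inner estimate.
        total += max(inner_mean, 0.0)
    return total / outer


if __name__ == "__main__":
    estimate = nested_mc(outer=2000, inner=50, rng=random.Random(0))
    exact = 1.0 / math.sqrt(2.0 * math.pi)  # E[max(Y, 0)] for standard normal Y
    print(f"estimate={estimate:.4f}, exact={exact:.4f}")
```

Because the non-linearity is applied to a noisy inner estimate, the result is biased for any finite inner sample size, and the total cost is the product of the two loop lengths. That cost structure is what multilevel schemes, classical or quantum, aim to reduce.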

CL · Nov 16, 2025
On the Brittleness of LLMs: A Journey around Set Membership

Lea Hergert, Gábor Berend, Mario Szegedy et al.

Large language models (LLMs) achieve superhuman performance on complex reasoning tasks, yet often fail on much simpler problems, raising concerns about their reliability and interpretability. We investigate this paradox through a focused study with two key design features: simplicity, to expose basic failure modes, and scale, to enable comprehensive controlled experiments. We focus on set membership queries -- among the most fundamental forms of reasoning -- using tasks like "Is apple an element of the set {pear, plum, apple, raspberry}?". We conduct a systematic empirical evaluation across prompt phrasing, semantic structure, element ordering, and model choice. Our large-scale analysis reveals that LLM performance on this elementary task is consistently brittle and unpredictable across all dimensions, suggesting that the models' "understanding" of the set concept is fragmented and convoluted at best. Our work demonstrates that the large-scale experiments enabled by the simplicity of the problem allow us to map and analyze the failure modes comprehensively, making this approach a valuable methodology for LLM evaluation in general.
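
A sketch of how such a controlled prompt grid might be generated is shown below. It varies two of the dimensions named in the abstract, prompt phrasing and element ordering. The templates, the function name `membership_prompts`, and the `max_orderings` cap are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical phrasings; the paper's actual templates are not given here.
TEMPLATES = [
    "Is {x} an element of the set {{{items}}}?",
    "Does the set {{{items}}} contain {x}?",
    "Is {x} in {{{items}}}?",
]


def membership_prompts(
    element: str, items: Sequence[str], max_orderings: int = 6
) -> List[Tuple[str, bool]]:
    """Return (prompt, expected_answer) pairs over phrasings and orderings."""
    expected = element in items
    prompts = []
    for i, order in enumerate(permutations(items)):
        if i >= max_orderings:
            break  # cap the factorial growth in the number of orderings
        rendered = ", ".join(order)
        for template in TEMPLATES:
            prompts.append((template.format(x=element, items=rendered), expected))
    return prompts


if __name__ == "__main__":
    fruit = ["pear", "plum", "apple", "raspberry"]
    for prompt, answer in membership_prompts("apple", fruit, max_orderings=2):
        print(answer, prompt)
```

Each prompt carries its ground-truth answer, so model responses can be scored automatically and failures grouped by phrasing or ordering.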

RO · Feb 12, 2020
Rubik Tables and Object Rearrangement

Mario Szegedy, Jingjin Yu

A great number of robotics applications demand the rearrangement of many mobile objects, e.g., organizing products on shelves, shuffling containers at shipping ports, reconfiguring fleets of mobile robots, and so on. To boost the throughput in systems designed for solving these rearrangement problems, it is essential to minimize the number of atomic operations, e.g., the pick-n-places of individual objects. However, this optimization task poses a rather difficult challenge due to the complex inter-dependency between objects, especially in high-density settings. In tackling these challenges, we develop a novel algorithmic tool, Rubik Tables, that provides a clean abstraction of object rearrangement problems as the proxy problem of shuffling items stored in a table or lattice. In its basic form, a Rubik Table is an $n\times n$ table containing $n^2$ items. We show that the reconfiguration of items in such a Rubik Table can be achieved using at most $2n$ column/row shuffles in the partially labeled setting, where each column (resp., row) shuffle may arbitrarily permute the items stored in a column (resp., row) of the table. When items are fully distinguishable, an additional $n$ shuffles are needed. Rubik Tables allow many generalizations, e.g., to higher dimensions. Using Rubik Tables, we have designed a first constant-factor optimal algorithm for stack rearrangement problems. We show that, for $nd$ items stored in $n$ stacks of depth $d$ each, using one empty stack as the swap space, $O(nd)$ stack pop-push operations are sufficient for an arbitrary reconfiguration of the stacks where $d \le n^{\frac{m}{2}}$ for arbitrary fixed $m >0$. Rubik Table results also allow the development of constant-factor optimal solutions for solving multi-robot motion planning problems under extreme robot density. These algorithms based on Rubik Table results run in low-polynomial time.
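
One way to realize the $3n$-shuffle bound for fully distinguishable items is a three-phase scheme: $n$ column shuffles, then $n$ row shuffles, then $n$ column shuffles. The sketch below is an illustration under stated assumptions rather than the authors' construction. It assumes each item is labeled with its target cell `(row, col)`, and it uses a simple augmenting-path bipartite matching to choose the first phase. The names `perfect_matching` and `rubik_sort` are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[int, int]  # (target_row, target_col)


def perfect_matching(adj: List[List[int]], n: int) -> List[int]:
    """Augmenting-path bipartite matching; returns match[u] = v for each left u."""
    match_right = [-1] * n

    def try_augment(u: int, seen: List[bool]) -> bool:
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    for u in range(n):
        if not try_augment(u, [False] * n):
            raise ValueError("no perfect matching exists")
    match_left = [-1] * n
    for v, u in enumerate(match_right):
        match_left[u] = v
    return match_left


def rubik_sort(table: List[List[Item]]):
    """Return the table after each of the three phases (column, row, column)."""
    n = len(table)
    # Phase 1 (column shuffles): arrange each column so that every row ends up
    # holding exactly one item per target column. Row k is filled by a perfect
    # matching between current columns and target columns.
    remaining = [[table[r][c] for r in range(n)] for c in range(n)]
    phase1 = [[None] * n for _ in range(n)]
    for k in range(n):
        adj = [[item[1] for item in remaining[c]] for c in range(n)]
        match = perfect_matching(adj, n)
        for c in range(n):
            idx = next(i for i, it in enumerate(remaining[c]) if it[1] == match[c])
            phase1[k][c] = remaining[c].pop(idx)
    # Phase 2 (row shuffles): within each row, move items to their target column.
    phase2 = [[None] * n for _ in range(n)]
    for r in range(n):
        for item in phase1[r]:
            phase2[r][item[1]] = item
    # Phase 3 (column shuffles): within each column, move items to their target row.
    final = [[None] * n for _ in range(n)]
    for c in range(n):
        for r in range(n):
            item = phase2[r][c]
            final[item[0]][c] = item
    return phase1, phase2, final
```

The matching in phase 1 always exists because the multigraph linking current columns to target columns is regular: every column holds $n$ items and every column is the target of $n$ items. When items only need to reach the correct column, the first two phases already suffice, which is consistent with the $2n$ figure quoted for the partially labeled setting.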