DSNov 18, 2023
Dueling Optimization with a Monotone AdversaryAvrim Blum, Meghal Gupta, Gene Li et al.
We introduce and study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer $\mathbf{x}^{*}$ for a function $f\colon X \to \mathbb{R}$, where $X \subseteq \mathbb{R}^d$. In each round, the algorithm submits a pair of guesses, i.e., $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$, and the adversary responds with any point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worse of the two guesses; i.e., ${\max} \left( f(\mathbf{x}^{(1)}), f(\mathbf{x}^{(2)}) \right) - f(\mathbf{x}^{*})$. The goal is to minimize the number of iterations required to find an $\varepsilon$-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function $f$ and set $X$ that incurs cost $O(d)$ and iteration complexity $O(d\log(1/\varepsilon)^2)$. Moreover, our dependence on $d$ is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur $Ω(d)$ cost and iteration complexity.
LGFeb 24Code
QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical ProofsSantiago Gonzalez, Alireza Amiri Bavandpour, Peter Ye et al.
As Large Language Models (LLMs) saturate elementary benchmarks, the research frontier has shifted from generation to the reliability of automated evaluation. We demonstrate that standard "LLM-as-a-Judge" protocols suffer from a systematic Alignment Gap when applied to upper-undergraduate to early graduate level mathematics. To quantify this, we introduce QEDBench, the first large-scale dual-rubric alignment benchmark to systematically measure alignment with human experts on university-level math proofs by contrasting course-specific rubrics against expert common knowledge criteria. By deploying a dual-evaluation matrix (7 judges x 5 solvers) against 1,000+ hours of human evaluation, we reveal that certain frontier evaluators like Claude Opus 4.5, DeepSeek-V3, Qwen 2.5 Max, and Llama 4 Maverick exhibit significant positive bias (up to +0.18, +0.20, +0.30, +0.36 mean score inflation, respectively). Furthermore, we uncover a critical reasoning gap in the discrete domain: while Gemini 3.0 Pro achieves state-of-the-art performance (0.91 average human evaluation score), other reasoning models like GPT-5 Pro and Claude Sonnet 4.5 see their performance significantly degrade in discrete domains. Specifically, their average human evaluation scores drop to 0.72 and 0.63 in Discrete Math, and to 0.74 and 0.50 in Graph Theory. In addition to these research results, we also release QEDBench as a public benchmark for evaluating and improving AI judges. Our benchmark is publicly published at https://github.com/qqliu/Yale-QEDBench.
QUANT-PHApr 16
Super-Constant Weight Dicke States in Constant Depth Without FanoutLucas Gretta, Meghal Gupta, Malvika Raj Joshi
An $n$-qubit Dicke state of weight $k$, is the uniform superposition over all $n$-bit strings of Hamming weight $k$. Dicke states are an entanglement resource with important practical applications in the NISQ era and, for instance, play a central role in Decoded Quantum Interferometry (DQI). Furthermore, any symmetric state can be expressed as a superposition of Dicke states. First, we give explicit constant-depth circuits that prepare $n$-qubit Dicke states for all $k \leq \text{polylog}(n)$, using only multi-qubit Toffoli gates and single-qubit unitaries. This gives the first $\text{QAC}^0$ construction of super-constant weight Dicke states. Previous constant-depth constructions for any super-constant $k$ required the FANOUT$_n$ gate, while $\text{QAC}^0$ is only known to implement FANOUT$_k$ for $k$ up to $\text{polylog}(n)$. Moreover, we show that any weight-$k$ Dicke state can be constructed with access to FANOUT$_{\min(k,n-k)}$, rather than FANOUT$_n$. Combined with recent hardness results, this yields a tight characterization: for $k \leq n/2$, weight-$k$ Dicke states can be prepared in $\text{QAC}^0$ if and only if FANOUT$_k \in \text{QAC}^0$. We further extend our techniques to show that, in fact, \emph{any} superposition of $n$-qubit Dicke states of weight at most $k$ can be prepared in $\text{QAC}^0$ with access to FANOUT$_k$. Taking $k = n$, we obtain the first $O(1)$-depth unitary construction for arbitrary symmetric states. In particular, any symmetric state can be prepared in constant depth on quantum hardware architectures that support FANOUT$_n$, such as trapped ions with native global entangling operations.
QUANT-PHApr 3
Parity $\notin$ QAC0 $\iff$ QAC0 is Fourier-ConcentratedLucas Gretta, Meghal Gupta, Malvika Raj Joshi
A major open problem in understanding shallow quantum circuits (QAC$^0$) is whether they can compute Parity. We show that this question is solely about the Fourier spectrum of QAC$^0$: any QAC$^0$ circuit with non-negligible high-level Fourier mass suffices to exactly compute PARITY in QAC$^0$. Thus, proving a quantum analog of the seminal LMN theorem for AC$^0$ is necessary to bound the quantum circuit complexity of PARITY. In the other direction, LMN does not fully capture the limitations of AC$^0$. For example, despite MAJORITY having $99\%$ of its weight on low-degree Fourier coefficients, no AC$^0$ circuit can non-trivially correlate with it. In contrast, we provide a QAC$^0$ circuit that achieves $(1-o(1))$ correlation with MAJORITY, establishing the first average-case decision separation between AC$^0$ and QAC$^0$. This suggests a uniquely quantum phenomenon: unlike in the classical setting, Fourier concentration may largely characterize the power of QAC$^0$. PARITY is also known to be equivalent in QAC$^0$ to inherently quantum tasks such as preparing GHZ states to high fidelity. We extend this equivalence to a broad class of state-synthesis tasks. We demonstrate that existing metrics such as trace distance, fidelity, and mutual information are insufficient to capture these states and introduce a new measure, felinity. We prove that preparing any state with non-negligible felinity, or derived states such as poly(n)-weight Dicke states, implies PARITY $\in$ QAC$^0$.