LGJan 24, 2025
Humanity's Last ExamLong Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
COMar 16
The strong chromatic index of $K_{t,t}$-free graphsRichard Bi, Peter Bradshaw, Abhishek Dhawan et al.
A strong edge coloring of a graph $G$ is an edge coloring $Ï\,:\,E(G) \rightarrow \mathbb N$ such that each color class forms an induced matching in $G$. The strong chromatic index of $G$, written $Ï'_s(G)$, is the minimum number of colors needed for a strong edge coloring of $G$. ErdÅs and NeÅ¡etÅil conjectured in 1985 that if $G$ has maximum degree $d$, then $Ï'_s(G) \leq \frac 54 d^2$. Mahdian showed in 2000 that if $G$ is $C_4$-free, then $Ï'_s(G) \leq (2+o(1)) \frac{d^2}{\log d}$, and he conjectured that the same upper bound holds for $K_{t,t}$-free graphs. In this paper, we prove this conjecture and improve upon it to show the following: every $K_{t,t}$-free graph $G$ of maximum degree $d$ satisfies $Ï'_s(G) \leq (1+o(1)) \frac{d^2}{\log d}$. We employ a variant of the Rödl nibble method to prove this result. The key new ingredient in our adaptation of the method is an application of the KÅvári-Sós-Turán theorem to show that $H := L(G)^2$ satisfies certain structural properties. These properties, in conjunction with a variant of Talagrand's inequality to handle exceptional outcomes, allow us to concentrate the sizes of certain vertex sets through the nibble, even when these vertex sets have order smaller than the maximum codegree of $H$. We encapsulate these structural properties into a more general statement on list coloring that we believe to be of independent interest. In light of the conjectured computational threshold for coloring random graphs arising in average-case complexity theory, we suspect that our result is best possible using this approach.
COJan 29, 2025
Single-conflict colorings of degenerate graphsPeter Bradshaw, Tomáš Masařík
We consider the single-conflict coloring problem, a graph coloring problem in which each edge of a graph receives a forbidden ordered color pair. The task is to find a vertex coloring such that no two adjacent vertices receive a pair of colors forbidden at an edge joining them. We show that for any assignment of forbidden color pairs to the edges of a $d$-degenerate graph $G$ on $n$ vertices of edge-multiplicity at most $\log \log n$, $O(\sqrt{ d } \log n)$ colors are always enough to color the vertices of $G$ in a way that avoids every forbidden color pair. This answers a question of DvoÅák, Esperet, Kang, and Ozeki for simple graphs (Journal of Graph Theory 2021).