5.5 OC Mar 17
A Threshold Phenomenon for the Shortest Lattice Vector Problem in the Infinity Norm
Stefan Kuhlmann, Robert Weismantel
One important question in the theory of lattices is to detect a shortest vector: given a norm and a lattice, what is the smallest norm attained by a non-zero vector contained in the lattice? We focus on the infinity norm and work with lattices of the form $A\mathbb{Z}^n$, where $A$ has integer entries and is of full column rank. Finding a shortest vector is NP-hard. We show that this task is fixed parameter tractable in the parameter $\Delta$, the largest absolute value of the determinant of a full rank submatrix of $A$. The algorithm is based on a structural result that can be interpreted as a threshold phenomenon: whenever the dimension $n$ exceeds a certain value determined only by $\Delta$, then a shortest lattice vector attains an infinity norm value of one. This threshold phenomenon has several applications. In particular, it reveals that integer optimal solutions lie on faces of the given polyhedron whose dimensions are bounded only in terms of $\Delta$.
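In symbols, the setup described in this abstract can be sketched as follows. The row count $m$, the notation $\lambda_\infty(A)$ for the optimal value, and the threshold function $f$ are names assumed here for illustration; they do not appear in the abstract, and no explicit form of $f$ is implied.

$$\lambda_\infty(A) = \min\{\, \|Ax\|_\infty : x \in \mathbb{Z}^n \setminus \{0\} \,\}, \qquad \Delta = \max\{\, |\det B| : B \text{ an } n \times n \text{ submatrix of } A \,\},$$

where $A \in \mathbb{Z}^{m \times n}$ has rank $n$. Under these assumed names, the threshold statement reads: if $n > f(\Delta)$, then $\lambda_\infty(A) = 1$.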
CL Sep 30, 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck et al.
As the mathematical capabilities of large language models (LLMs) improve, it becomes increasingly important to evaluate their performance on research-level tasks at the frontier of mathematical knowledge. However, existing benchmarks are limited, as they focus solely on final-answer questions or high-school competition problems. To address this gap, we introduce IMProofBench, a private benchmark consisting of 39 peer-reviewed problems developed by expert mathematicians. Each problem requires a detailed proof and is paired with subproblems that have final answers, supporting both an evaluation of mathematical reasoning capabilities by human experts and a large-scale quantitative analysis through automated grading. Furthermore, unlike prior benchmarks, the evaluation setup simulates a realistic research environment: models operate in an agentic framework with tools like web search for literature review and mathematical software such as SageMath. Our results show that current LLMs can succeed at the more accessible research-level questions, but still encounter significant difficulties on more challenging problems. Quantitatively, Grok-4 achieves the highest accuracy of 52% on final-answer subproblems, while GPT-5 obtains the best performance for proof generation, achieving a fully correct solution for 22% of problems. IMProofBench will continue to evolve as a dynamic benchmark in collaboration with the mathematical community, ensuring its relevance for evaluating the next generation of LLMs.