OC · May 1
On the Distribution of Unweighted Minimum Knapsack Instances with Large SOS Rank
Adam Kurpisz, Lucas Slot, Mikhail Zaytsev
We analyze the sum-of-squares rank of unweighted instances of the Minimum Knapsack (MK) problem, i.e., minimization of $\sum_{i=1}^n x_i$ for 0/1 variables under the constraint $\sum_{i=1}^n x_i \geq q$, with $q \in \mathbb{R}$. Such instances have long served as a testbed for understanding the limitations of lift-and-project methods in Boolean optimization. For example, both the Lovász-Schrijver and Sherali-Adams hierarchies require (maximal) rank $n$ to solve them, already when $q=1/2$ is constant. The SOS hierarchy requires only \emph{sublinear} rank $O(\sqrt{n})$ to solve unweighted MK when $q=1/2$. On the other hand, when $q$ is allowed to vary with~$n$, the SOS rank of the problem may become linear. Interestingly, this is known to happen both when $q$ is large, and when $q$ is very small ($0<q \leq 2^{-n}$). This raises the question of whether we should think of hard instances of unweighted MK as being typical for the SOS hierarchy, or as a consequence of very specific choices of the threshold parameter $q$. In this paper, we address this question by showing new upper and lower bounds on the SOS rank of unweighted MK in the whole regime of the parameter $q$. For $n-q \leq O(1)$, we show that the SOS rank is constant. In contrast, when $q \leq O(1)$, a linear rank is needed if $q$ is exponentially close to an integer. As our main positive result, we show that linear rank is very rare for $q \leq O(1)$. This can be expressed in the language of smoothed analysis: after perturbing $q$ by a Gaussian with mean $0$ and variance $\sigma^2$, the expected SOS rank of MK is $O(\sqrt{n} \log (n/\sigma))$.
CL · Sep 30, 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck et al.
As the mathematical capabilities of large language models (LLMs) improve, it becomes increasingly important to evaluate their performance on research-level tasks at the frontier of mathematical knowledge. However, existing benchmarks are limited, as they focus solely on final-answer questions or high-school competition problems. To address this gap, we introduce IMProofBench, a private benchmark consisting of 39 peer-reviewed problems developed by expert mathematicians. Each problem requires a detailed proof and is paired with subproblems that have final answers, supporting both an evaluation of mathematical reasoning capabilities by human experts and a large-scale quantitative analysis through automated grading. Furthermore, unlike prior benchmarks, the evaluation setup simulates a realistic research environment: models operate in an agentic framework with tools like web search for literature review and mathematical software such as SageMath. Our results show that current LLMs can succeed at the more accessible research-level questions, but still encounter significant difficulties on more challenging problems. Quantitatively, Grok-4 achieves the highest accuracy of 52% on final-answer subproblems, while GPT-5 obtains the best performance for proof generation, achieving a fully correct solution for 22% of problems. IMProofBench will continue to evolve as a dynamic benchmark in collaboration with the mathematical community, ensuring its relevance for evaluating the next generation of LLMs.
ML · Jun 24, 2024
A Wiener Process Perspective on Local Intrinsic Dimension Estimation Methods
Piotr Tempczyk, Łukasz Garncarek, Dominik Filipiak et al.
Local intrinsic dimension (LID) estimation methods have received a lot of attention in recent years thanks to progress in deep neural networks and generative modeling. In contrast to older non-parametric methods, the new methods use generative models to approximate the density of the diffused dataset, which lets them scale to high-dimensional datasets (e.g. images). In this paper, we investigate recent state-of-the-art parametric LID estimation methods from the perspective of the Wiener process. We explore how these methods behave when their assumptions are not met. We give an extended mathematical description of those methods and of their error as a function of the probability density of the data.