ML, Sep 26, 2022
Learning Variational Models with Unrolling and Bilevel Optimization
Christoph Brauer, Niklas Breustedt, Timo de Wolff et al.
In this paper we consider the problem of learning variational models in the context of supervised learning via risk minimization. Our goal is to provide a deeper understanding of the two approaches of learning of variational models via bilevel optimization and via algorithm unrolling. The former considers the variational model as a lower level optimization problem below the risk minimization problem, while the latter replaces the lower level optimization problem by an algorithm that solves said problem approximately. Both approaches are used in practice, but unrolling is much simpler from a computational point of view. To analyze and compare the two approaches, we consider a simple toy model, and compute all risks and the respective estimators explicitly. We show that unrolling can be better than the bilevel optimization approach, but also that the performance of unrolling can depend significantly on further parameters, sometimes in unexpected ways: While the stepsize of the unrolled algorithm matters a lot (and learning the stepsize gives a significant improvement), the number of unrolled iterations plays a minor role.
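The distinction between the two approaches can be sketched on a one-dimensional quadratic model. The abstract does not specify the paper's toy model, so the lower-level objective 0.5*(x - y)^2 + 0.5*lam*x^2 and the names `lower_level_solution`, `unrolled_gd`, `lam`, and `tau` are assumptions chosen for illustration only. The bilevel view uses the exact minimizer of the lower-level problem, while unrolling replaces it with a fixed number of gradient steps whose stepsize could itself be learned.

```python
def lower_level_solution(y, lam):
    """Exact minimizer of the assumed objective 0.5*(x - y)**2 + 0.5*lam*x**2.

    This plays the role of the lower-level problem in the bilevel approach.
    """
    return y / (1.0 + lam)


def unrolled_gd(y, lam, tau, num_iters, x0=0.0):
    """Approximate the same minimizer with num_iters gradient steps of size tau.

    This plays the role of the unrolled algorithm; tau and num_iters are the
    "further parameters" whose influence the abstract refers to.
    """
    x = x0
    for _ in range(num_iters):
        grad = (x - y) + lam * x  # derivative of the assumed objective
        x = x - tau * grad
    return x


if __name__ == "__main__":
    y, lam = 2.0, 1.0
    print("exact lower-level solution:", lower_level_solution(y, lam))
    for tau in (0.1, 0.5):
        for k in (1, 5, 50):
            print(f"tau={tau}, iterations={k}:", unrolled_gd(y, lam, tau, k))
```

In this assumed model, the stepsize tau = 1/(1 + lam) reaches the exact minimizer in a single step from x0 = 0, which illustrates how the stepsize can matter more than the number of iterations. It does not reproduce the paper's analysis.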
CL, Sep 30, 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck et al.
As the mathematical capabilities of large language models (LLMs) improve, it becomes increasingly important to evaluate their performance on research-level tasks at the frontier of mathematical knowledge. However, existing benchmarks are limited, as they focus solely on final-answer questions or high-school competition problems. To address this gap, we introduce IMProofBench, a private benchmark consisting of 39 peer-reviewed problems developed by expert mathematicians. Each problem requires a detailed proof and is paired with subproblems that have final answers, supporting both an evaluation of mathematical reasoning capabilities by human experts and a large-scale quantitative analysis through automated grading. Furthermore, unlike prior benchmarks, the evaluation setup simulates a realistic research environment: models operate in an agentic framework with tools like web search for literature review and mathematical software such as SageMath. Our results show that current LLMs can succeed at the more accessible research-level questions, but still encounter significant difficulties on more challenging problems. Quantitatively, Grok-4 achieves the highest accuracy of 52% on final-answer subproblems, while GPT-5 obtains the best performance for proof generation, achieving a fully correct solution for 22% of problems. IMProofBench will continue to evolve as a dynamic benchmark in collaboration with the mathematical community, ensuring its relevance for evaluating the next generation of LLMs.
MG, Mar 31
Voronoi-Based Vacuum Leakage Detection in Composite Manufacturing
Christoph Brauer, Arne Hindersmann, Timo de Wolff
In this article, we investigate vacuum leakage detection problems in composite manufacturing. Our approach uses Voronoi diagrams, a well-known structure in discrete geometry. The Voronoi diagram of the vacuum connection positions partitions the component surface. We use this partition to narrow down potential leak locations to a small area, making an efficient manual search feasible. To further reduce the search area, we propose refined Voronoi diagrams. We evaluate both variants using a novel dataset consisting of several hundred one- and two-leak positions along with their corresponding flow values. Our experimental results demonstrate that Voronoi-based predictive models are highly accurate and have the potential to resolve the leakage detection bottleneck in composite manufacturing.
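The partitioning idea can be sketched in a few lines: a surface point belongs to the Voronoi cell of its nearest vacuum connection. The abstract does not say how flow values are mapped to a cell, so the rule below (suspect the cell of the connection with the highest measured flow) is purely an assumption for illustration, as are the names `voronoi_cell_index` and `candidate_region` and the flat 2D coordinates.

```python
import math


def voronoi_cell_index(point, sites):
    """Return the index of the site nearest to point.

    By definition, this is the Voronoi cell that contains the point.
    """
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


def candidate_region(grid_points, sites, flows):
    """Return the grid points inside the cell of the highest-flow site.

    Assumption (not from the source): the leak lies in the Voronoi cell of
    the vacuum connection that reports the largest flow value.
    """
    suspect = max(range(len(flows)), key=lambda i: flows[i])
    return [p for p in grid_points if voronoi_cell_index(p, sites) == suspect]


if __name__ == "__main__":
    sites = [(0.0, 0.0), (10.0, 0.0)]          # hypothetical connection positions
    flows = [0.1, 0.9]                          # hypothetical flow readings
    grid = [(2.0, 1.0), (8.0, 1.0), (9.0, 0.0)]  # sampled surface points
    print(candidate_region(grid, sites, flows))
```

Restricting the manual search to the returned points is the basic narrowing step. The refined Voronoi diagrams mentioned in the abstract would shrink this region further and are not modeled here.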