HO · Jun 4
Benchmarks in Leipzig
Andrei Balakin, Miklós Bóna, Marie-Charlotte Brandenburg et al.
Between April 1 and May 15, 2026, a group of 49 mathematicians compiled a dataset of research-level mathematics questions with known answers. Most of the work was done during the 3-day workshop *Benchmarks in Leipzig*, attended by 35 participants, at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany. We present the resulting collection of 100 questions. We evaluated these questions in three stages: a single attempt by each of five state-of-the-art LLMs, then 20 runs per model with three of these models, and finally 3 runs with each of two heavy-thinking models. After Stage 1, 41 questions remained completely unsolved; after Stage 2, 16 remained; and after Stage 3, only 2 remained. These results indicate that the mathematical reasoning capabilities of LLMs are advancing rapidly.
LG · Jan 24, 2025
Humanity's Last Exam
Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking with a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
LG · Jul 8, 2021
Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation
Alexander Ivanov, Gleb Nosovskiy, Alexey Chekunov et al.
The manifold hypothesis states that data points in a high-dimensional space actually lie close to a manifold of much lower dimension. In many cases this hypothesis has been verified empirically and used to enhance unsupervised and semi-supervised learning. Here we present a new approach to checking the manifold hypothesis and estimating the dimension of the underlying manifold. To do so, we apply two very different methods in parallel, one geometric and one probabilistic, and check whether they give the same result. Our geometric method is a modification, for sparse data, of the well-known box-counting algorithm for computing the Minkowski dimension. The probabilistic method is new: although it relies on standard nearest-neighbor distances, it differs from the methods previously used in such settings. It is robust and fast, and it includes a special preliminary data transformation. Experiments on real datasets show that the proposed combination of the two methods is powerful and effective.
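For readers unfamiliar with box counting, the sketch below shows the textbook estimator the geometric method starts from: cover the data with grid cells of side ε, count the occupied cells N(ε), and take the slope of log N(ε) against log(1/ε). This is not the authors' sparse-data modification, which the abstract does not describe; the function names, scales, and toy datasets are illustrative assumptions. On sparse samples the plain version degrades at small ε, where most cells hold a single point.

```python
import math


def box_count(points, eps):
    """Number of axis-aligned grid cells of side eps that contain at least one point."""
    return len({tuple(math.floor(coord / eps) for coord in p) for p in points})


def box_counting_dimension(points, scales):
    """Plain (unmodified) box-counting estimate: least-squares slope of
    log N(eps) against log(1/eps) over the given box sizes."""
    xs = [math.log(1.0 / eps) for eps in scales]
    ys = [math.log(box_count(points, eps)) for eps in scales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: a line segment embedded in 3D (dimension 1)
# and a dense grid filling the unit square (dimension 2).
segment = [(t / 10000, 2 * t / 10000, 0.0) for t in range(10000)]
square = [(i / 100, j / 100) for i in range(100) for j in range(100)]
scales = [2.0 ** -k for k in range(2, 6)]  # box sides 1/4, 1/8, 1/16, 1/32

print(round(box_counting_dimension(segment, scales), 3))  # → 1.0
print(round(box_counting_dimension(square, scales), 3))   # → 2.0
```

The range of scales matters: it must be coarse enough that cells still contain several points, yet fine enough to resolve the manifold, which is exactly where sparse data makes the plain estimator fragile.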
RO · Mar 6, 2018
Secure Minimum Time Planning Under Environmental Uncertainty: an Extended Treatment
Alexander Ivanov, Mark Campbell
Cyber-Physical Systems (CPS) are becoming ubiquitous and affect the physical world, yet security is seldom at the forefront of their design. This is especially true of robotic control algorithms, which rarely consider the effect of a cyber attack on mission objectives and success. This work presents a secure optimal control algorithm for the case where a cyber attack targets a robot's knowledge of the environment. Although the focus is on cyber attacks, the results generalize to incomplete or outdated information about the environment. The approach fuses ideas from robust control, optimal control, and sensor-based planning to provide a generalization of stopping distance in 3D. The planner is implemented in simulation and its properties are analyzed.
NC · Jun 19, 2017
Evaluating 35 Methods to Generate Structural Connectomes Using Pairwise Classification
Dmitry Petrov, Alexander Ivanov, Joshua Faskowitz et al.
There is no consensus on how to construct structural brain networks from diffusion MRI, and it remains unclear how variations in preprocessing steps affect network reliability and the network's ability to distinguish subjects. In this work, we address this issue by comparing 35 structural connectome-building pipelines that vary diffusion reconstruction models, tractography algorithms, and parcellations. We then classify pairs of structural connectomes as either belonging to the same individual or not, using connectome weights and eight topological derivative measures as the feature set. For our experiments, we use three test-retest datasets from the Consortium for Reliability and Reproducibility (CoRR), comprising a total of 105 individuals. We also compare the pairwise classification results to a commonly used parametric test-retest measure, the Intraclass Correlation Coefficient (ICC).
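ICC comes in several variants and the abstract does not say which one is used. As an illustration only, the sketch below computes one common form, the one-way random-effects ICC(1,1), for a single feature measured on n subjects in k sessions: ICC(1,1) = (MS_between − MS_within) / (MS_between + (k − 1)·MS_within). The function name and toy table are hypothetical.

```python
def icc_1_1(data):
    """One-way random-effects ICC(1,1) for an n_subjects x k_sessions table.

    data[i][j] is the value of one feature for subject i in session j.
    (Assumed variant; the paper's exact ICC form is not stated in the abstract.)
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square, df = n - 1.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject (session-to-session) mean square, df = n * (k - 1).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Two subjects scanned twice: MS_between = 4, MS_within = 0.5, so ICC = 3.5 / 4.5.
print(round(icc_1_1([[1.0, 2.0], [3.0, 4.0]]), 3))  # → 0.778
```

ICC scores one feature at a time, whereas pairwise classification judges whole connectome pairs, which is one reason the two measures can rank pipelines differently.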
RO · Mar 3, 2017
An Extended Consideration of Joint Exploration and Tracking: JET
Alexander Ivanov, Mark Campbell
Autonomous exploration and multi-object tracking by a team of agents have traditionally been considered as two separate, yet related, problems, usually solved in two phases: an exploration phase followed by a tracking phase. The exploration problem is usually viewed through an information-theoretic framework in which a robotic agent attempts to gather as much information as possible about the environment or an Object of Interest (OI). Conversely, the tracking problem attempts to maintain precise location information about an OI over time. This work proposes a single framework in which exploration and tracking for multiple robots and multiple objects are solved simultaneously. A hierarchical architecture coordinates the robotic agents in tracking multiple OIs while keeping the task computationally efficient. The primary contributions of this work are a probabilistic constraint on the tracked OIs' covariances that guarantees tracking performance throughout the entire mission, the automatic discovery of new OIs, a seamless transition to guaranteed tracking of discovered OIs, and the automatic balancing of exploration with the requirements of tracking.
NC · Jan 26, 2017
Structural Connectome Validation Using Pairwise Classification
Dmitry Petrov, Boris Gutman, Alexander Ivanov et al.
In this work, we study the extent to which structural connectomes and topological derivative measures are unique to individual changes within human brains. To do so, we classify pairs of structural connectomes from two large longitudinal datasets as either belonging to the same individual or not. Our data comprise 227 individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and 226 from the Parkinson's Progression Markers Initiative (PPMI). We achieve an area under the ROC curve (AUC) of 0.99 for features that represent either the weights or the network structure of the connectomes (node degrees, PageRank, and local efficiency). Our approach may be useful for eliminating noisy features as a preprocessing step in brain-aging studies and early-diagnosis classification problems.
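The pairwise setup can be illustrated with a deliberately simplified sketch: form every pair of scans, label it "same individual" or not, score each pair, and summarize with ROC AUC. Here the score is just the Euclidean distance between flattened feature vectors (smaller means "more likely the same person"), a stand-in for the trained classifier the abstract refers to; the function names and toy scans are hypothetical. AUC is computed via its Mann-Whitney interpretation, as the probability that a same-subject pair scores better than a different-subject pair.

```python
import itertools
import math


def pairwise_dataset(scans):
    """scans: list of (subject_id, feature_vector).
    Returns one (distance, same_subject) tuple per unordered pair of scans."""
    pairs = []
    for (sub_a, vec_a), (sub_b, vec_b) in itertools.combinations(scans, 2):
        pairs.append((math.dist(vec_a, vec_b), sub_a == sub_b))
    return pairs


def auc_same_subject(pairs):
    """ROC AUC for 'same subject' when a smaller distance is the better score.

    Mann-Whitney form: fraction of (same, different) pair combinations in which
    the same-subject pair is closer; ties count as one half.
    """
    pos = [d for d, same in pairs if same]
    neg = [d for d, same in pairs if not same]
    wins = 0.0
    for dp in pos:
        for dn in neg:
            if dp < dn:
                wins += 1.0
            elif dp == dn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two subjects, two scans each, with small scan-to-scan noise.
scans = [("A", [1.0, 0.0]), ("A", [1.1, 0.0]), ("B", [0.0, 1.0]), ("B", [0.0, 1.1])]
print(auc_same_subject(pairwise_dataset(scans)))  # → 1.0
```

A learned classifier on pair features (for example, element-wise absolute differences) would replace the fixed distance score, but the labeling of pairs and the AUC summary stay the same.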
RO · Dec 5, 2016
An Extended Treatment of Uncertainty Constrained Robotic Exploration: An Integrated Exploration Planner
Alexander Ivanov, Mark Campbell
Efficient robotic exploration of unknown, sensor-limited, global-information-deficient environments poses unique challenges to path planning algorithms. In these difficult environments, no deterministic guarantees of path completion and mission success can be made in general. Integrated Exploration (IE), which strives to combine localization and exploration, must be solved to create an autonomous robotic system capable of long-term operation in new and challenging environments. This paper formulates a probabilistic framework for building exploration algorithms that provide probabilistic guarantees of success. A novel connection is made between the Hamiltonian Path Problem and exploration. The Guaranteed Probabilistic Information Explorer (G-PIE) is developed for the IE problem, providing a probabilistic guarantee of path completion and asymptotic optimality of exploration. A receding-horizon formulation, dubbed RH-PIE, is presented to address the exponential complexity of G-PIE. Finally, the RH-PIE planner is verified through autonomous, hardware-in-the-loop experiments.