77.2 AI Jun 2
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
Noujoud Nader, Ibrahem Aljabea, Patrick Diehl et al.
Large language models (LLMs) are increasingly used as self-study assistants in technical disciplines, yet their reliability as mathematical reasoning assistants remains poorly understood. We introduce GTBench, a curriculum-grounded benchmark for evaluating LLMs as mathematical research assistants in graph theory, comprising 63 problems organized into three groups of increasing difficulty: undergraduate definitions and basic properties (Group 1), algorithm tracing and structural reasoning (Group 2), and graduate-level proof construction (Group 3). Problems are sourced from verified academic materials including Diestel's Graph Theory. We evaluate five frontier models -- GPT-5, Claude Sonnet 4.6, Gemini 2.5 Flash-Lite, Llama 3.3 70B, and Mistral Large 3 -- under zero-shot and chain-of-thought prompting, using exact-match and LLM-as-judge evaluation for Groups 1 and 2, and a hybrid human expert and LLM-as-judge protocol for Group 3. Our results reveal a pronounced performance hierarchy: GPT-5 approaches ceiling on Group 1 (95.8% zero-shot) and maintains meaningful accuracy on graduate proofs (82%), while all other models degrade substantially with difficulty, with Llama achieving 0% under human evaluation on Group 3 zero-shot. Failure mode analysis shows that "correct algorithm, wrong execution" errors dominate Groups 1 and 2, while Group 3 additionally surfaces "incomplete reasoning" failures and reveals systematic disagreement between human evaluators and the automated judge, particularly on verbose or near-complete proofs (kappa = 0.48-0.83 across human pairs). GTBench provides the first curriculum-grounded evaluation framework for graph-theoretic reasoning in LLMs, with direct implications for the governance of AI tools in mathematical education and scientific research.
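The abstract reports agreement between evaluators as a kappa value without naming the variant. The following is a minimal sketch that assumes Cohen's kappa for two raters; the function name and the verdict lists are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical verdicts on ten proofs (1 = accepted, 0 = rejected).
human = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
judge = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(human, judge)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a plain match rate when one label dominates.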
49.9 NA Jun 1
Coordinate-wise splitting algorithms for ODE simulation via Koopman-Lie product formulas
Arun Banjara, Ibrahem AlJabea, Theodore Papamarkou et al.
We present a computational framework for simulating finite-dimensional ordinary differential equations by combining classical Koopman-Lie product formulas with coordinate-wise frozen subflows. The setting is model-known, since the vector field is assumed to be available, and no data-driven approximation of the Koopman operator is attempted. Under standard assumptions, the Koopman-Lie generator associated with the flow admits a coordinate decomposition into partial generators. This decomposition leads to elementary updates in which all but one state variable are frozen, and the resulting frozen scalar subproblems are evaluated either in closed form or by one-dimensional solves. Lie-Trotter, Strang, and higher-order exponential compositions are then converted into state-update algorithms for two- and three-dimensional systems, with the semigroup and product-formula theory used as background justification for the constructions. We also record the exponential-term counts produced by the recursive constructions used in the implementation. These counts are presented as implementation costs. Numerical experiments on the Lotka-Volterra, Van der Pol, and Lorenz systems compare the coordinate-wise splitting algorithms with high-accuracy RK45 reference solutions using root-mean-square errors and work-precision curves. The results illustrate the practical trade-off between splitting order, number of time steps, number of exponential factors, and runtime.
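The coordinate-wise idea can be illustrated on the Lotka-Volterra system, where freezing one variable leaves a scalar linear ODE with a closed-form exponential solution. The sketch below is not the authors' implementation: the parameter values, initial state, and function names are assumptions, and a fine-step classical RK4 solve stands in for the RK45 reference so that the example needs only the standard library.

```python
import math

# Lotka-Volterra: x' = x (ALPHA - BETA y),  y' = y (DELTA x - GAMMA).
# Parameter values are illustrative assumptions, not taken from the paper.
ALPHA, BETA, DELTA, GAMMA = 1.0, 0.5, 0.2, 0.6


def flow_x(x, y, h):
    """Exact flow of x' = x (ALPHA - BETA y) over time h with y frozen."""
    return x * math.exp((ALPHA - BETA * y) * h), y


def flow_y(x, y, h):
    """Exact flow of y' = y (DELTA x - GAMMA) over time h with x frozen."""
    return x, y * math.exp((DELTA * x - GAMMA) * h)


def lie_trotter_step(x, y, h):
    """First-order composition: full x-subflow, then full y-subflow."""
    x, y = flow_x(x, y, h)
    return flow_y(x, y, h)


def strang_step(x, y, h):
    """Second-order symmetric composition: half x, full y, half x."""
    x, y = flow_x(x, y, 0.5 * h)
    x, y = flow_y(x, y, h)
    return flow_x(x, y, 0.5 * h)


def rhs(x, y):
    return x * (ALPHA - BETA * y), y * (DELTA * x - GAMMA)


def rk4_step(x, y, h):
    """Classical RK4 step, used here only to build a reference solution."""
    k1 = rhs(x, y)
    k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


def integrate(step, x, y, t_end, n_steps):
    h = t_end / n_steps
    for _ in range(n_steps):
        x, y = step(x, y, h)
    return x, y


def max_error(step, n_steps, x0=2.0, y0=1.0, t_end=2.0):
    """Max-norm error at t_end against a fine-step RK4 reference."""
    ref = integrate(rk4_step, x0, y0, t_end, 20000)
    got = integrate(step, x0, y0, t_end, n_steps)
    return max(abs(got[0] - ref[0]), abs(got[1] - ref[1]))
```

Calling `max_error(strang_step, n)` for increasing `n` gives a rough work-precision picture; halving the step should cut the Strang error by about a factor of four, consistent with second-order accuracy.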
30.6 LG May 11
TopoU-Net: a U-Net architecture for topological domains
Gaurav Gaurav, Ibrahem ALJabea, Yaroslav Zakomornyy et al.
Many modern datasets mix points, edges, regions, groups, objects, events, hyperedges, and relations. Yet neural architectures often force such data into grids, graphs, or sequences, obscuring higher-order structure and making encoder-decoder designs domain-specific. We view U-Net not as a grid-specific architecture, but as a hierarchical encoder-decoder principle: representation spaces, transport maps between levels, and skip connections between matched levels. Combinatorial complexes naturally supply these ingredients through cells, incidences, and ranks. We introduce TopoU-Net, a rank-path U-Net for topological domains. Given a path from an input rank to a bottleneck rank and back, the encoder lifts cochains upward along incidence maps, the decoder transports them downward, and skip connections merge features at matched ranks. Rank replaces spatial scale: choosing paths through nodes, edges, faces, hyperedges, or global cells becomes the central architectural decision. A key quantity is the bottleneck support ratio, the number of cells at the bottleneck relative to the number of cells at the input rank. This ratio is fixed by the complex and chosen path rather than by arbitrary pooling, and it clarifies when skip connections are optional, useful, or structurally important. Across node classification, graph classification, hypergraph node classification, mesh classification, and image reconstruction, TopoU-Net provides a reusable encoder-decoder template for higher-order structured data. Among the evaluated baselines, it achieves the strongest mean accuracy on six of eight node-classification datasets and four of five hypergraph datasets, with the largest gains on heterophilic graphs. Ablations show that removing skip connections is most damaging under severe bottleneck compression.
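The rank-path idea can be sketched on a toy complex of two triangles sharing an edge, with the path nodes -> edges -> faces -> edges -> nodes. This is a simplified illustration under stated assumptions, not the TopoU-Net model: it uses unsigned incidence matrices, scalar features, and additive skip connections, and it omits learned weights and nonlinearities. All names are hypothetical.

```python
from itertools import combinations

# Toy complex: two triangles sharing the edge (1, 2).
nodes = [(0,), (1,), (2,), (3,)]
faces = [(0, 1, 2), (1, 2, 3)]
edges = sorted({e for f in faces for e in combinations(f, 2)})


def incidence(lower, upper):
    """Unsigned incidence matrix: rows are lower cells, columns upper cells."""
    return [[1.0 if set(lo) <= set(up) else 0.0 for up in upper]
            for lo in lower]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


B1 = incidence(nodes, edges)  # 4 nodes x 5 edges
B2 = incidence(edges, faces)  # 5 edges x 2 faces


def rank_path_unet(x0, use_skip=True):
    """Push a node signal up to faces and back down, with optional skips."""
    # Encoder: lift along incidence maps, rank 0 -> 1 -> 2 (bottleneck).
    h1 = matvec(transpose(B1), x0)
    h2 = matvec(transpose(B2), h1)
    # Decoder: transport back down, rank 2 -> 1 -> 0.
    d1 = matvec(B2, h2)
    if use_skip:
        d1 = [a + b for a, b in zip(d1, h1)]  # skip at rank 1
    d0 = matvec(B1, d1)
    if use_skip:
        d0 = [a + b for a, b in zip(d0, x0)]  # skip at rank 0
    return d0


# Bottleneck support ratio: cells at the bottleneck rank (faces)
# relative to cells at the input rank (nodes).
support_ratio = len(faces) / len(nodes)

with_skip = rank_path_unet([1.0, 0.0, 0.0, 0.0])
without_skip = rank_path_unet([1.0, 0.0, 0.0, 0.0], use_skip=False)
```

In this toy case the signal placed on node 0 comes back identical on nodes 0, 1, and 2 when the skips are removed, because the face-level bottleneck cannot distinguish the three nodes of one triangle; with skips, node 0 remains distinguishable.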
LG Feb 4, 2024
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains
Mustafa Hajij, Mathilde Papillon, Florian Frantzen et al.
We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under the MIT license at https://pyt-team.github.io/.