LG · May 29
Spectral Reach: Understanding Neural Scaling as Progress into the Spectral Tail
Konstantin Nikolaou, Jonas Scheunemann, Sven Krippendorf et al.
Neural scaling laws describe predictable power-law relationships between model size, dataset size, compute, and performance. While these laws guide the development of modern foundation models, the mechanisms underpinning them remain poorly understood, in part due to the absence of scalable analysis tools. To close this gap, we introduce "spectral position": a scalable measure of which eigenvalues of the empirical neural tangent kernel (eNTK) currently drive loss reduction. Applying this measure to scaling experiments, we find that spectral position decreases throughout training: learning shifts from dominant eigenmodes into the spectral tail. Larger models reach further into the tail than smaller models, revealing a size-dependent capacity we call "spectral reach". This suggests why larger models achieve lower losses: they sustain learning on weak spectral signals inaccessible to smaller models. We further identify feature learning as a key enabler of spectral reach. It adaptively amplifies gradient magnitudes as learning advances, sustaining progress where frozen representations stall. This points to concrete interventions through architecture and optimizer design.
AI · Jul 9, 2025 · Code
Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery
Licong Xu, Milind Sarkar, Anto I. Lonappan et al.
We present a multi-agent system for automation of scientific research tasks, cmbagent (https://github.com/CMBAgents/cmbagent). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD-level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.
HEP-TH · Mar 30
Physics as Code: From Scans to Theorems with ITP APIs in $SU(5)$ Model Building
Sven Krippendorf, Joseph Tooby-Smith
A recurring challenge in theoretical physics is to make reliable global statements about bounded but combinatorially large model spaces. Exhaustive scans quickly become opaque or impractical, while statistical exploration does not by itself provide theorem-backed guarantees. This motivates workflows in which the model-building problem itself is formalized inside an interactive theorem prover (ITP). In this paper we develop an API-based methodology for formalizing such bounded model-building questions inside Lean, an interactive theorem prover. The central step is to represent the relevant charge spectra, predicates, and reduction moves as reusable ITP definitions, and then to derive the classification from proved reduction theorems rather than from an ad hoc scan. We demonstrate the strategy in a concrete $SU(5)$ case study motivated by F-theory model building with additional Abelian symmetries. At the charge-spectrum layer, we classify bounded spectra that admit a top-quark Yukawa coupling, avoid a selected set of dangerous operators, and satisfy a minimal charge-spectrum completeness condition. Our main result shows that every such spectrum in the bounded search space arises from finitely many minimal top-Yukawa witnesses together with controlled completions and certified closure steps. This classification represents a formally verified description of the full viable class in the charge-spectrum setting studied here. The development is implemented inside PhysLib as reusable infrastructure rather than as a one-off verification script. It provides a proof of principle for how interactive theorem provers can turn combinatorially difficult model-building problems into correctness-first, reusable workflows, and we discuss how the resulting certified classification can serve as reliable input for downstream analyses.
LG · Jul 7, 2025
Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens
Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey et al.
Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK). This analysis establishes a link between performance scaling and the internal dynamics of neural networks. Our findings on standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior. This demonstrates that performance scaling alone is insufficient for understanding the underlying mechanisms of neural networks. We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models. To this end, we investigate how feature learning is lost as the model width increases and quantify the transition between kernel-driven and feature-driven scaling regimes. We identify the maximum model width that supports feature learning, which, in our setups, we find to be more than ten times smaller than typical large language model widths.
LG · May 1, 2023
Towards a Phenomenological Understanding of Neural Networks: Data
Samuel Tovey, Sven Krippendorf, Konstantin Nikolaou et al.
A theory of neural networks (NNs) built upon collective variables would provide scientists with the tools to better understand the learning process at every stage. In this work, we introduce two such variables, the entropy and the trace of the empirical neural tangent kernel (NTK) built on the training data passed to the model. We empirically analyze the NN performance in the context of these variables and find that there exists a correlation between the starting entropy, the trace of the NTK, and the generalization of the model computed after training is complete. This framework is then applied to the problem of optimal data selection for the training of NNs. To this end, random network distillation (RND) is used as a means of selecting training data, which is then compared with random selection of data. It is shown that not only does RND select data-sets capable of outperforming random selection, but that the collective variables associated with the RND data-sets are larger than those of the randomly selected sets. The results of this investigation provide a stable ground from which the selection of data for NN training can be driven by this phenomenological framework.
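As a rough illustration of the two collective variables named in this abstract, the following minimal sketch computes the trace and an entropy of an empirical NTK. It is not taken from the paper: it assumes the entropy is the Shannon entropy of the trace-normalized NTK eigenvalues, and the toy model, the helper names `empirical_ntk` and `ntk_trace_and_entropy`, and all array shapes are hypothetical.

```python
import numpy as np


def empirical_ntk(jacobian):
    """Gram matrix K = J J^T for a per-sample Jacobian of shape (n_samples, n_params)."""
    return jacobian @ jacobian.T


def ntk_trace_and_entropy(kernel):
    """Return the trace and the Shannon entropy of the trace-normalized eigenvalues.

    Assumption: this entropy definition is chosen for illustration only.
    """
    eigvals = np.linalg.eigvalsh(kernel)   # kernel is symmetric, so eigenvalues are real
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative values from round-off
    trace = float(eigvals.sum())
    probs = eigvals / trace
    probs = probs[probs > 0]               # treat 0 * log(0) as 0
    entropy = float(-(probs * np.log(probs)).sum())
    return trace, entropy


# Hypothetical toy model f(x; w) = w . tanh(A x) with fixed random features A,
# so the gradient of f with respect to w is simply tanh(A x).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))    # 8 training points in 3 dimensions
A = rng.normal(size=(16, 3))   # fixed feature map (not trained)
jac = np.tanh(x @ A.T)         # per-sample Jacobian, shape (8, 16)

kernel = empirical_ntk(jac)
trace, entropy = ntk_trace_and_entropy(kernel)
print(f"trace = {trace:.4f}, entropy = {entropy:.4f}")
```

Under this assumed definition, the entropy lies between 0 (a single dominant eigenmode) and log(n) for n training points (a flat spectrum).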
GR-QC · Feb 22, 2022
A duality connecting neural network and cosmological dynamics
Sven Krippendorf, Michael Spannowsky
We demonstrate that the dynamics of neural networks trained with gradient descent and the dynamics of scalar fields in a flat, vacuum energy dominated Universe are structurally profoundly related. This duality provides the framework for synergies between these systems: ways to understand and explain neural network dynamics, and new ways of simulating and describing early Universe models. Working in the continuous-time limit of neural networks, we analytically match the dynamics of the mean background and the dynamics of small perturbations around the mean field, highlighting potential differences in separate limits. We perform empirical tests of this analytic description and quantitatively show the dependence of the effective field theory parameters on hyperparameters of the neural network. As a result of this duality, the cosmological constant is matched inversely to the learning rate in the gradient descent update.
LG · Apr 29, 2021
Improving Simulations with Symmetry Control Neural Networks
Marc Syvaeri, Sven Krippendorf
The dynamics of physical systems is often constrained to lower dimensional sub-spaces due to the presence of conserved quantities. Here we propose a method to learn and exploit such symmetry constraints, building upon Hamiltonian Neural Networks. By enforcing cyclic coordinates with appropriate loss functions, we find that we can achieve improved accuracy on simple classical dynamics tasks. By fitting analytic formulae to the latent variables in our network, we find that our networks utilize conserved quantities such as (angular) momentum.
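For orientation, the link between cyclic coordinates and conserved quantities follows from Hamilton's equations. The relations below are the standard textbook result, not a formula quoted from the paper, and the central-potential Hamiltonian is an assumed example:

$$
\dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i}, \qquad \frac{\partial H}{\partial q_k} = 0 \;\Rightarrow\; \dot p_k = 0 .
$$

$$
H = \frac{p_r^2}{2m} + \frac{p_\varphi^2}{2 m r^2} + V(r) \;\Rightarrow\; \dot p_\varphi = -\frac{\partial H}{\partial \varphi} = 0 .
$$

In this example $\varphi$ is cyclic because $H$ does not depend on it, so the angular momentum $p_\varphi$ is conserved and the motion is confined to a lower dimensional subspace.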
COMP-PH · Mar 30, 2020
Detecting Symmetries with Neural Networks
Sven Krippendorf, Marc Syvaeri
Identifying symmetries in data sets is generally difficult, but knowledge about them is crucial for efficient data handling. Here we present a method by which neural networks can be used to identify symmetries. We make extensive use of the structure in the embedding layer of the neural network, which allows us to identify whether a symmetry is present and to identify orbits of the symmetry in the input. To determine which continuous or discrete symmetry group is present, we analyse the invariant orbits in the input. We present examples based on rotation groups $SO(n)$ and the unitary group $SU(2)$. Further, we find that this method is useful for the classification of complete intersection Calabi-Yau manifolds, where it is crucial to identify discrete symmetries on the input space. For this example we present a novel data representation in terms of graphs.
COMP-PH · Feb 12, 2020
Connecting Dualities and Machine Learning
Philip Betzler, Sven Krippendorf
Dualities are widely used in quantum field theories and string theory to obtain correlation functions at high accuracy. Here we present examples where dual data representations are useful in supervised classification, linking machine learning and typical tasks in theoretical physics. We then discuss how such beneficial representations can be enforced in the latent dimension of neural networks. We find that additional contributions to the loss based on feature separation, feature matching with respect to desired representations, and a good performance on a 'simple' correlation function can lead to known and unknown dual representations. This is the first proof of concept that computers can find dualities. We discuss how our examples, based on discrete Fourier transformation and Ising models, connect to other dualities in theoretical physics, for instance Seiberg duality.
LG · Sep 6, 2018
GANs for generating EFT models
Harold Erbin, Sven Krippendorf
We initiate a way of generating models by the computer, satisfying both experimental and theoretical constraints. In particular, we present a framework which allows the generation of effective field theories. We use Generative Adversarial Networks to generate these models and we generate examples which go beyond the examples known to the machine. As a starting point, we apply this idea to the generation of supersymmetric field theories. In this case, the machine knows consistent examples of supersymmetric field theories with a single field and generates new examples of such theories. In the generated potentials we find distinct properties, here the number of minima in the scalar potential, with values not found in the training data. We comment on potential further applications of this framework.