Emergence of Computational Structure in a Neural Network Physics Simulator
This paper offers insights into the convergence time of computational structure in neural networks and into methods for detecting its emergence, though the contribution is incremental, building on existing work on interpretability and physics simulation.
The paper addresses how identifiable computational structures emerge in neural networks, using a transformer-like model that simulates particle physics. It finds that structures which detect collisions emerge in the attention heads, that their emergence is associated with degenerate loss geometry, and that the dynamics of this emergence follow a power law.
Neural networks often have identifiable computational structures (components of the network which perform an interpretable algorithm or task), but the mechanisms by which these emerge and the best methods for detecting these structures are not well understood. In this paper we investigate the emergence of computational structure in a transformer-like model trained to simulate the physics of a particle system, where the transformer's attention mechanism is used to transfer information between particles. We show that (a) structures emerge in the attention heads of the transformer which learn to detect particle collisions, (b) the emergence of these structures is associated with degenerate geometry in the loss landscape, and (c) the dynamics of this emergence follow a power law. This suggests that these components are governed by a degenerate "effective potential". These results have implications for the convergence time of computational structure within neural networks and suggest that the emergence of computational structure can be detected by studying the dynamics of network components.
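The link between power-law dynamics and a degenerate potential can be illustrated with a toy model. The sketch below is not the paper's model or method: it assumes a single scalar component w following gradient flow on a hypothetical potential V(w) = w^(2k) / (2k). For k = 1 (non-degenerate minimum) the flow dw/dt = -w decays exponentially, whereas for k > 1 (degenerate minimum) the flow dw/dt = -w^(2k-1) decays at late times as a power law in t with exponent -1/(2k-2). The function names, step size, and fitting window are illustrative choices.

```python
import math


def simulate_gradient_flow(k, w0=1.0, dt=1e-3, steps=200_000, record_every=1000):
    """Euler-integrate dw/dt = -w**(2k-1), the gradient flow of the
    hypothetical toy potential V(w) = w**(2k) / (2k)."""
    w = w0
    times, values = [], []
    for step in range(1, steps + 1):
        w -= dt * w ** (2 * k - 1)
        if step % record_every == 0:
            times.append(step * dt)
            values.append(w)
    return times, values


def tail_slope(times, values, t_min=50.0):
    """Least-squares slope of log(w) against log(t), using only t >= t_min
    so that the early transient does not bias the fit."""
    pairs = [(math.log(t), math.log(v)) for t, v in zip(times, values) if t >= t_min]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var


if __name__ == "__main__":
    for k in (2, 3):
        times, values = simulate_gradient_flow(k)
        fitted = tail_slope(times, values)
        predicted = -1.0 / (2 * k - 2)  # analytic late-time exponent
        print(f"k={k}: fitted slope {fitted:.3f}, predicted {predicted:.3f}")
```

In this toy setting, a straight line on a log-log plot of a component's trajectory signals a degenerate minimum, and its slope gives the degree of degeneracy. Whether the same diagnostic transfers directly to the attention heads studied in the paper depends on details not given in this summary.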