I Dropped a Neural Net
This work addresses a theoretical puzzle in neural network interpretability and robustness, with potential implications for understanding training dynamics. It is incremental, since it builds on known stability concepts.
The paper reconstructs the original layer order of a shuffled Residual Network. It exploits training stability conditions, specifically dynamic isometry, to pair layers, and uses a hill-climbing search to order the resulting blocks. This achieves exact recovery from a search space of approximately $10^{122}$ possibilities.
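The $10^{122}$ figure comes from two independent choices. There are $48!$ ways to match the 48 output projections to the 48 input projections, and $48!$ ways to order the rebuilt blocks. A quick arithmetic check, assuming the 96 shuffled layers form exactly 48 (input, output) pairs:

```python
import math

NUM_BLOCKS = 48  # 96 shuffled layers, assumed to form 48 (input, output) projection pairs

pairings = math.factorial(NUM_BLOCKS)   # ways to match output projections to input projections
orderings = math.factorial(NUM_BLOCKS)  # ways to order the reassembled blocks
total = pairings * orderings            # (48!)^2

print(f"log10((48!)^2) = {math.log10(total):.2f}")
# Compare against the commonly cited ~10^80 atoms in the observable universe.
print(total > 10**80)
```

Python integers are arbitrary precision, so `total` is computed exactly. `math.log10` accepts it directly, so no floating-point overflow occurs.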
A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street. They trained a neural net, shuffled all 96 of its layers, and posed the task of putting them back in order. Given the unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into two parts:

- pairing each block's input and output projections ($48!$ possibilities);
- ordering the reassembled blocks ($48!$ possibilities).

The combined search space is $(48!)^2 \approx 10^{122}$. That is more than the number of atoms in the observable universe (roughly $10^{80}$).

We show that stability conditions during training, such as dynamic isometry, give the product $W_{\text{out}} W_{\text{in}}$ of correctly paired layers a negative diagonal structure. This lets us use the diagonal dominance ratio as a pairing signal. For ordering, we seed the search with a rough proxy ordering, such as delta-norm or $\|W_{\text{out}}\|_F$. We then hill-climb until the mean squared error reaches zero.
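The two stages can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code. Every name below is hypothetical:

- **Block form (assumed):** `h + relu(h @ W_in.T) @ W_out.T`, using PyTorch-style weight shapes.
- **Pairing score (hypothetical):** the mean of $-\operatorname{diag}(W_{\text{out}} W_{\text{in}})$ divided by the mean absolute off-diagonal entry.
- **Matching:** greedy.
- **Seed ordering:** ascending $\|W_{\text{out}}\|_F$. The direction of the sort is an assumption.
- **Search move:** pairwise swaps.

```python
import numpy as np

# ASSUMED block form (hypothetical, not confirmed by the source):
#   h <- h + relu(h @ W_in.T) @ W_out.T
# with PyTorch-style nn.Linear weights W_in: (hidden, d) and W_out: (d, hidden).


def diag_dominance(w_out: np.ndarray, w_in: np.ndarray) -> float:
    """Hypothetical pairing score: mean of -diag(W_out W_in) over mean |off-diagonal|.

    Large positive values indicate the negative-diagonal structure of a true pair.
    """
    m = w_out @ w_in  # (d, d)
    off = m[~np.eye(m.shape[0], dtype=bool)]
    return float(-np.diag(m).mean() / (np.abs(off).mean() + 1e-12))


def pair_layers(ins, outs):
    """Greedily match each input projection to an output projection by score."""
    scores = np.array([[diag_dominance(o, i) for o in outs] for i in ins])
    pairs = {}
    used_out = set()
    for flat in np.argsort(scores, axis=None)[::-1]:  # highest scores first
        i, o = (int(v) for v in np.unravel_index(flat, scores.shape))
        if i not in pairs and o not in used_out:
            pairs[i] = o
            used_out.add(o)
    return pairs  # {input index: output index}


def forward(blocks, order, x):
    """Run inputs x (n_samples, d) through the residual blocks in the given order."""
    h = x
    for k in order:
        w_in, w_out = blocks[k]
        h = h + np.maximum(h @ w_in.T, 0.0) @ w_out.T
    return h


def mse(blocks, order, x, y):
    """Mean squared error of the reassembled network against targets y."""
    return float(np.mean((forward(blocks, order, x) - y) ** 2))


def seed_order(blocks):
    """Rough proxy ordering: ascending Frobenius norm of W_out (direction assumed)."""
    return sorted(range(len(blocks)), key=lambda k: np.linalg.norm(blocks[k][1]))


def hill_climb(blocks, order, x, y, tol=0.0, max_rounds=100):
    """Keep any pairwise swap that lowers the MSE; stop at tol or a local minimum."""
    order = list(order)
    best = mse(blocks, order, x, y)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]
                err = mse(blocks, order, x, y)
                if err < best:
                    best, improved = err, True
                else:
                    order[i], order[j] = order[j], order[i]  # undo the swap
        if not improved or best <= tol:
            break
    return order, best
```

Greedy matching keeps the sketch NumPy-only. A Hungarian solver such as `scipy.optimize.linear_sum_assignment(scores, maximize=True)` would instead return an optimal one-to-one matching. Pairwise-swap hill-climbing can stall in a local minimum, so this sketch does not by itself guarantee reaching zero MSE.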