ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers
This work addresses interpretability and generalization in neural networks, with potential applications beyond cryptograms, though its architectural innovations are incremental.
The paper tackles neural network reasoning and generalization by developing ALICE, an encoder-only Transformer for solving substitution ciphers. ALICE achieves state-of-the-art accuracy and speed and generalizes to unseen ciphers after training on only about 1500 unique ciphers.
We present cryptogram solving as an ideal testbed for studying neural network reasoning and generalization; models must decrypt text encoded with substitution ciphers, choosing from 26! possible mappings without explicit access to the cipher. We develop ALICE (an Architecture for Learning Interpretable Cryptogram dEcipherment), a simple encoder-only Transformer that sets a new state-of-the-art for both accuracy and speed on this decryption problem. Surprisingly, ALICE generalizes to unseen ciphers after training on only ${\sim}1500$ unique ciphers, a minute fraction ($3.7 \times 10^{-24}$) of the possible cipher space. To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Through early exit and probing experiments, we reveal how ALICE progressively refines its predictions in a way that appears to mirror common human strategies -- early layers place greater emphasis on letter frequencies, while later layers form word-level structures. Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability.
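To make the quantities in the abstract concrete, the following is a minimal pure-Python sketch and not the authors' implementation. It samples a substitution cipher as a permutation of the alphabet, checks the quoted fraction of the cipher space, and relaxes a score matrix toward a permutation with Gumbel noise followed by Sinkhorn row and column normalization. All function names (`random_cipher`, `encrypt`, `invert`, `gumbel_sinkhorn`), the hand-built score matrix, the temperature and iteration count, and the row-wise argmax used to read out the mapping are assumptions made for illustration; the paper's decoding head operates on learned Transformer outputs and may extract the mapping differently.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase


def random_cipher(rng):
    """Sample a substitution cipher: a random permutation of the alphabet."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def encrypt(text, cipher):
    """Map each lowercase letter through the cipher; keep other characters."""
    return "".join(cipher.get(ch, ch) for ch in text)


def invert(cipher):
    """The mapping is bijective, so swapping keys and values inverts it."""
    return {v: k for k, v in cipher.items()}


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_sinkhorn(scores, rng, tau=1.0, n_iters=50):
    """Relax a square score matrix into a nearly doubly stochastic matrix.

    Gumbel noise is added to the scores, the result is divided by the
    temperature tau, and rows and columns are normalized alternately in
    log space (Sinkhorn iterations).
    """
    n = len(scores)

    def gumbel():
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        return -math.log(-math.log(u))

    log_p = [[(scores[i][j] + gumbel()) / tau for j in range(n)]
             for i in range(n)]
    for _ in range(n_iters):
        for i in range(n):  # row normalization
            z = logsumexp(log_p[i])
            log_p[i] = [x - z for x in log_p[i]]
        for j in range(n):  # column normalization
            z = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(x) for x in row] for row in log_p]


rng = random.Random(0)
cipher = random_cipher(rng)
ciphertext = encrypt("hello, world", cipher)
plaintext = encrypt(ciphertext, invert(cipher))  # round trip

# 1500 training ciphers out of 26! possible mappings (about 3.7e-24).
fraction = 1500 / math.factorial(26)

# Hypothetical scores that strongly favor the true mapping, standing in
# for what a trained model might output.
true_perm = [ALPHABET.index(cipher[ch]) for ch in ALPHABET]
scores = [[20.0 if j == true_perm[i] else 0.0 for j in range(26)]
          for i in range(26)]
soft_perm = gumbel_sinkhorn(scores, rng, tau=0.5)

# Read out a hard mapping by taking the largest entry in each row.
recovered = [max(range(26), key=row.__getitem__) for row in soft_perm]
```

In this sketch the rows and columns of `soft_perm` each sum to approximately one, which is what lets a mapping be read off directly. Row-wise argmax is only safe here because the scores are strongly peaked; with flatter scores, an assignment solver would be needed to guarantee a valid permutation.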