LG · May 8

Learning Large-Scale Modular Addition with an Auxiliary Modulus

arXiv:2605.07648
AI Analysis

For machine learning practitioners dealing with parity or modular addition tasks, this method enables scalable learning with reduced data requirements.

The paper addresses the challenge of learning modular addition with many summands and large moduli. By introducing an auxiliary modulus during training, the proposed method achieves high accuracy with small datasets, outperforming the previous sparse method (e.g., 97.0% vs 9.5% τ-accuracy at N=64, q=974269 with 100K samples).

Learning parity functions and, more generally, modular addition is a challenging machine learning task because of its input sensitivity. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase the number of zeros in training sequences, which reduces the effective number of summands and thus controls training difficulty; however, this induces covariate shift between the training and test input distributions. This study analyzes this side effect theoretically and empirically and proposes a covariate-shift-free method for modular addition. Specifically, we introduce an auxiliary modulus $Kq$ during training, which reduces wrap-around frequency and problem difficulty while preserving the same input distribution across training and testing. Experiments show strong scalability and sample efficiency: even for large input length $N$, large modulus $q$, and small datasets (where the sparse method fails to learn), our method achieves equal or better match accuracy and relaxed $\tau$-accuracy. For example, at $N=64$ and $q=974269$, our method trained on 100K samples achieves $97.0\%$ $\tau$-accuracy at $\tau=0.05$, while the sparse method achieves only $9.5\%$ with the same data size and $93.9\%$ even when extended to 1M samples.
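The arithmetic behind the auxiliary modulus can be illustrated with a minimal sketch. This is not the paper's training code: the function names, the uniform input sampling, and the value K = 16 are illustrative assumptions, and the abstract does not say how the $Kq$ target enters the loss. The sketch only checks two facts that follow from $q$ dividing $Kq$: the label modulo $q$ is recoverable from the label modulo $Kq$, and the sum wraps around roughly $K$ times less often under the larger modulus, all without changing the inputs.

```python
import random


def sample_inputs(n, q, rng):
    """Draw n summands uniformly from {0, ..., q-1} (assumed input distribution)."""
    return [rng.randrange(q) for _ in range(n)]


def targets(xs, q, k):
    """Return (sum mod q, sum mod k*q, wrap count mod q, wrap count mod k*q)."""
    s = sum(xs)
    aux = k * q  # auxiliary modulus Kq
    return s % q, s % aux, s // q, s // aux


rng = random.Random(0)
N, q = 64, 974269  # values quoted in the abstract
K = 16             # hypothetical choice; the abstract does not state K

for _ in range(1000):
    xs = sample_inputs(N, q, rng)
    y_q, y_aux, wraps_q, wraps_aux = targets(xs, q, K)
    # Reducing the auxiliary label mod q recovers the original label.
    assert y_aux % q == y_q
    # The larger modulus wraps around K times less often (integer division).
    assert wraps_aux == wraps_q // K
```

Because the inputs `xs` are drawn the same way regardless of `K`, only the target changes, which is consistent with the abstract's claim that the input distribution is preserved between training and testing.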

