Improving Generalization by Permutation Routing Across Model Copies

arXiv:2605.09256

AI Analysis

For machine learning practitioners, the paper offers a way to improve generalization through structured message sharing across model copies, without averaging their parameters.

The paper introduces a method that replicates a model M times and routes learning messages across copies via permutations from a mixing kernel Q, improving generalization without parameter averaging. The approach is demonstrated on perceptrons, committee machines, and MLPs.

We introduce a use of the \(M\)-cover (or \(M\)-layer) transform for machine learning. The method replicates a model \(M\) times, but instead of coupling the copies through parameter averaging or an explicit attractive force, as in replicated SGD or Elastic SGD, it rewires the contexts in which local learning messages are computed. Each local loss is evaluated on a routed model whose parameters are drawn from different copies according to permutations sampled from a structured mixing kernel \(Q\). Training then uses the original local update rule, while the resulting learning messages are redistributed across the copies through these routed computational paths. Thus \(Q\) defines a topology for message transport and controls the long-loop structure of the lifted factor graph. We formulate this construction for perceptrons, committee machines, and multilayer perceptrons, showing that the same principle applies from discrete models to differentiable neural networks. The resulting framework provides a mechanism for improving generalization through structured message sharing rather than replica collapse or parameter-space coupling.
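The routing step described in the abstract can be illustrated with a minimal NumPy sketch for a small MLP. Everything named here is an assumption made for illustration: the functions `sample_routing` and `routed_forward` are hypothetical, there is one permutation per layer, and uniform random permutations stand in for the structured mixing kernel \(Q\), whose actual form is not given in the abstract. The sketch covers only the forward pass of the routed models; in training, each gradient (learning message) would be credited to whichever copy supplied the parameter.

```python
import numpy as np

def sample_routing(num_copies, num_layers, rng):
    """Sample one permutation of the copies per layer.

    Assumption: uniform permutations are a stand-in for the paper's
    structured mixing kernel Q, which the abstract does not specify.
    Returns an int array of shape (num_layers, num_copies).
    """
    return np.stack([rng.permutation(num_copies) for _ in range(num_layers)])

def routed_forward(weights, routing, x):
    """Forward pass of M routed models.

    weights[l] has shape (M, d_in, d_out): layer l of every copy.
    routing[l, a] is the copy that supplies layer l to routed model a.
    Returns an array of shape (M, batch, d_out_last).
    """
    num_layers = len(weights)
    num_copies = weights[0].shape[0]
    outputs = []
    for a in range(num_copies):
        h = x
        for l in range(num_layers):
            h = h @ weights[l][routing[l, a]]  # parameters drawn from another copy
            if l < num_layers - 1:
                h = np.tanh(h)  # hidden nonlinearity (illustrative choice)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
M, L = 4, 2
weights = [rng.normal(size=(M, 3, 5)), rng.normal(size=(M, 5, 2))]
x = rng.normal(size=(7, 3))
routing = sample_routing(M, L, rng)
outputs = routed_forward(weights, routing, x)
```

Because each row of `routing` is a permutation, every copy's layer is used by exactly one routed model, so no parameters are averaged or pulled together; the copies interact only through the routed computational paths. With identity routing the sketch reduces to \(M\) independent models.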
