LG AI CV Nov 24, 2025

Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers

Rowan Bradbury, Aniket Srinivasan Ashok, Sai Ram Kasanagottu, Gunmay Jhingran, Shuai Meng

arXiv:2511.18670v1

Originality: Incremental advance

AI Analysis

This work addresses a stability challenge faced by researchers and practitioners working on efficient transformer modifications, though it appears incremental because it builds on existing replacement methods.

The paper tackles the destabilization of pretrained transformers that occurs when modules such as self-attention are replaced with efficient alternatives. It introduces Deterministic Continuous Replacement (DCR), which blends teacher and student outputs with a deterministic, annealed weight. In a single-seed study, DCR converges faster and aligns more strongly than stochastic gating and distillation baselines.

Replacing modules in pretrained models, especially swapping quadratic self-attention for efficient attention alternatives, poses a hard optimization problem: cold-start reinitialization destabilizes frozen backbones. We isolate this core stability challenge in a controlled study. Deterministic Continuous Replacement (DCR) blends teacher and student outputs with a deterministic, annealed weight. Theoretically, DCR eliminates gate-induced gradient variance inherent to stochastic replacement. In a single-seed study, DCR attains faster convergence and stronger alignment than stochastic gating and distillation baselines on controlled attention replacement, establishing a foundation for heterogeneous operator swaps.
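
The blending idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the annealing schedule or which side the weight applies to, so the linear schedule in `linear_alpha`, the convention that `alpha` weights the student, and all function names here are assumptions made for illustration. The stochastic gate is included only to show where gate-induced randomness enters in the alternative.

```python
import random


def linear_alpha(step, total_steps):
    """Hypothetical linear anneal: 0.0 (teacher only) to 1.0 (student only).

    The schedule shape is an assumption; the abstract only says "annealed".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def dcr_blend(teacher_out, student_out, alpha):
    """Deterministic blend: every forward pass mixes both outputs.

    Assumes alpha is the weight on the student output.
    """
    return [(1.0 - alpha) * t + alpha * s
            for t, s in zip(teacher_out, student_out)]


def stochastic_gate(teacher_out, student_out, alpha, rng):
    """Stochastic replacement for contrast: pick one output at random.

    The Bernoulli draw is the source of gate-induced variance.
    """
    use_student = rng.random() < alpha
    return list(student_out) if use_student else list(teacher_out)


if __name__ == "__main__":
    teacher = [1.0, 2.0]
    student = [3.0, 4.0]
    rng = random.Random(0)
    for step in (0, 5, 10):
        a = linear_alpha(step, total_steps=10)
        print(step, dcr_blend(teacher, student, a),
              stochastic_gate(teacher, student, a, rng))
```

With a fixed `alpha`, `dcr_blend` returns the same output on every call, whereas `stochastic_gate` switches between the two outputs from call to call; that switching is the variance the abstract says DCR removes.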
