LG · Jan 29

Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps

arXiv:2601.21624v1 · 2 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized measurement of training-memory effects in deep learning. Its contribution is incremental: it builds on existing mechanisms while introducing new measurement tools and frameworks.

The paper examines how to understand and measure the impact of training memory in deep neural networks. It proposes a protocol for causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.

Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs with-replacement, staged augmentations and replay), the nonconvex path, and auxiliary state (teacher EMA/SWA, contrastive queues, BatchNorm statistics). This survey organizes mechanisms by source, lifetime, and visibility. It introduces seed-paired, function-space causal estimands; portable perturbation primitives (carry/reset of momentum/Adam/EMA/BN, order-window swaps, queue/teacher tweaks); and a reporting checklist with audit artifacts (order hashes, buffer/BN checksums, RNG contracts). The conclusion is a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
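As a rough illustration of the ideas named in the abstract, the following is a minimal sketch of a seed-paired carry/reset experiment on a toy one-weight regression. It is not the paper's protocol: the model, the data, and every name (`make_data`, `order_hash`, `train`, `function_gap`, `paired_effect`) are assumptions made for illustration. Each seed is trained twice with an identical data order, once carrying the momentum buffer and once resetting it at a chosen step. The effect is measured in function space (the squared prediction gap on fixed probe points) and reported with a standard error across seeds. An order hash serves as a simple audit artifact confirming that the two arms saw the same sample sequence.

```python
import hashlib
import random
import statistics


def make_data(n=64, seed=0):
    """Toy 1-D regression data, y = 3x + noise (illustrative assumption)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def order_hash(order):
    """Audit artifact: SHA-256 digest of the exact sample order used."""
    return hashlib.sha256(",".join(map(str, order)).encode()).hexdigest()


def train(xs, ys, seed, reset_step=None, steps=200, lr=0.05, beta=0.9):
    """SGD with heavy-ball momentum on a single weight w.

    If reset_step is given, the momentum buffer is zeroed at that step
    (the 'reset' arm); otherwise it is carried throughout ('carry' arm).
    """
    rng = random.Random(seed)  # RNG contract: the seed fully fixes the order
    order = []
    while len(order) < steps:  # random reshuffling, one permutation per epoch
        epoch = list(range(len(xs)))
        rng.shuffle(epoch)
        order.extend(epoch)
    order = order[:steps]

    w, v = 0.0, 0.0
    for t, i in enumerate(order):
        if reset_step is not None and t == reset_step:
            v = 0.0  # intervention: discard optimizer memory
        grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]
        v = beta * v + grad
        w -= lr * v
    return w, order_hash(order)


def function_gap(w_a, w_b, probes):
    """Mean squared difference between the two models' predictions."""
    return statistics.fmean([(w_a * x - w_b * x) ** 2 for x in probes])


def paired_effect(seeds, reset_step):
    """Seed-paired effect of a momentum reset, with a standard error."""
    xs, ys = make_data()
    probes = [i / 10.0 for i in range(-10, 11)]
    gaps = []
    for s in seeds:
        w_carry, h_carry = train(xs, ys, s)
        w_reset, h_reset = train(xs, ys, s, reset_step=reset_step)
        # The pair must differ only in the intervention, not in data order.
        assert h_carry == h_reset
        gaps.append(function_gap(w_carry, w_reset, probes))
    mean = statistics.fmean(gaps)
    sem = statistics.stdev(gaps) / len(gaps) ** 0.5
    return mean, sem, gaps


if __name__ == "__main__":
    mean, sem, _ = paired_effect(range(10), reset_step=150)
    print(f"function-space gap: {mean:.3e} +/- {sem:.3e} (SEM, 10 seeds)")
```

Pairing by seed removes order and initialization variance from the comparison, so any remaining gap is attributable to the reset itself. A real study would swap in the actual model, optimizer state, and probe set, and would likely use a bootstrap or similar interval rather than a plain standard error.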

