CV · Jul 22, 2025

MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks

arXiv:2507.16279v1 · 4 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work targets the high GPU memory cost of end-to-end backpropagation by improving supervised local learning, a more memory-efficient training scheme. Its contribution is an incremental advance within supervised local learning.

The paper tackles the performance degradation in supervised local learning by proposing MAN++, which introduces a dynamic interaction mechanism using EMA and a learnable scaling bias to enhance inter-block communication. The method achieves performance comparable to end-to-end training while significantly reducing GPU memory usage, as validated on image classification, object detection, and image segmentation tasks.

Deep learning typically relies on end-to-end backpropagation for training, a method that inherently suffers from issues such as update locking during parameter optimization, high GPU memory consumption, and a lack of biological plausibility. In contrast, supervised local learning seeks to mitigate these challenges by partitioning the network into multiple local blocks and designing independent auxiliary networks to update each block separately. However, because gradients are propagated solely within individual local blocks, performance degradation occurs, preventing supervised local learning from supplanting end-to-end backpropagation. To address these limitations and facilitate inter-block information flow, we propose the Momentum Auxiliary Network++ (MAN++). MAN++ introduces a dynamic interaction mechanism by employing the Exponential Moving Average (EMA) of parameters from adjacent blocks to enhance communication across the network. The auxiliary network, updated via EMA, effectively bridges the information gap between blocks. Notably, we observed that directly applying EMA parameters can be suboptimal due to feature discrepancies between local blocks. To resolve this issue, we introduce a learnable scaling bias that balances feature differences, thereby further improving performance. We validate MAN++ through extensive experiments on tasks that include image classification, object detection, and image segmentation, utilizing multiple network architectures. The experimental results demonstrate that MAN++ achieves performance comparable to end-to-end training while significantly reducing GPU memory usage. Consequently, MAN++ offers a novel perspective for supervised local learning and presents a viable alternative to conventional training methods.
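The two mechanisms named in the abstract, an EMA update of auxiliary parameters from an adjacent block and a learnable scaling bias applied to them, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names `ema_update` and `apply_scaling_bias`, the default momentum value, and in particular the multiplicative form `(1 + s) * p` of the scaling bias. Parameters are plain lists of floats rather than network tensors.

```python
from typing import List


def ema_update(aux: List[float], neighbor: List[float],
               momentum: float = 0.99) -> List[float]:
    """Standard exponential moving average step.

    Moves the auxiliary parameters toward the adjacent block's
    parameters: aux <- momentum * aux + (1 - momentum) * neighbor.
    The momentum default is an assumed value, not taken from the paper.
    """
    if len(aux) != len(neighbor):
        raise ValueError("parameter lists must have the same length")
    return [momentum * a + (1.0 - momentum) * n
            for a, n in zip(aux, neighbor)]


def apply_scaling_bias(ema_params: List[float],
                       scale_bias: List[float]) -> List[float]:
    """Rescale EMA parameters element-wise by (1 + s).

    ASSUMPTION: this multiplicative form is hypothetical. In training,
    `scale_bias` would be learned by gradient descent on the local loss;
    here it is just a fixed list. A zero bias leaves the EMA parameters
    unchanged.
    """
    if len(ema_params) != len(scale_bias):
        raise ValueError("parameter lists must have the same length")
    return [p * (1.0 + s) for p, s in zip(ema_params, scale_bias)]


if __name__ == "__main__":
    aux = [1.0, 2.0]        # auxiliary-network parameters for block k
    neighbor = [3.0, 4.0]   # parameters of the adjacent block
    aux = ema_update(aux, neighbor, momentum=0.5)
    effective = apply_scaling_bias(aux, [0.0, 0.5])
    print(aux, effective)
```

In this sketch the EMA step carries information from the adjacent block into the auxiliary network without any gradient crossing the block boundary, and the scaling bias is the only part of the auxiliary parameters that the local loss would train directly.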
