CV · Mar 29, 2020

Disturbance-immune Weight Sharing for Neural Architecture Search

arXiv:2003.13089v1 · 31 citations
AI Analysis

This paper addresses a key bottleneck in NAS for researchers and practitioners by improving training stability and the accuracy of performance estimation. The contribution is incremental, since it builds on existing weight-sharing methods.

The paper tackles the performance disturbance issue in weight-sharing neural architecture search, where training subsequent architectures disturbs previously trained ones. It proposes a disturbance-immune update strategy based on orthogonal gradient descent and reports superior results on CIFAR-10 and ImageNet.

Neural architecture search (NAS) has gained increasing attention in the community of architecture design. One of the key factors behind the success lies in the training efficiency created by the weight sharing (WS) technique. However, WS-based NAS methods often suffer from a performance disturbance (PD) issue. That is, the training of subsequent architectures inevitably disturbs the performance of previously trained architectures due to the partially shared weights. This leads to inaccurate performance estimation for the previous architectures, which makes it hard to learn a good search strategy. To alleviate the performance disturbance issue, we propose a new disturbance-immune update strategy for model updating. Specifically, to preserve the knowledge learned by previous architectures, we constrain the training of subsequent architectures in an orthogonal space via orthogonal gradient descent. Equipped with this strategy, we propose a novel disturbance-immune training scheme for NAS. We theoretically analyze the effectiveness of our strategy in alleviating the PD risk. Extensive experiments on CIFAR-10 and ImageNet verify the superiority of our method.
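The idea of constraining updates to an orthogonal space can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a single shared linear layer, a hypothetical matrix `prev_inputs` whose columns are inputs seen while training earlier architectures, and a regularized projector with a hypothetical damping term `alpha`. The point is only that a gradient step projected onto the orthogonal complement of those inputs leaves the layer's outputs on them nearly unchanged, while a plain gradient step does not.

```python
import numpy as np


def orthogonal_projector(prev_inputs, alpha=1e-8):
    """Projector onto the orthogonal complement of span(prev_inputs).

    prev_inputs has shape (d, k): k earlier input vectors of dimension d.
    alpha is a small damping term (an assumption of this sketch) that keeps
    the Gram matrix invertible.
    """
    d, k = prev_inputs.shape
    gram = prev_inputs.T @ prev_inputs + alpha * np.eye(k)
    return np.eye(d) - prev_inputs @ np.linalg.solve(gram, prev_inputs.T)


def projected_step(weights, grad, projector, lr=0.1):
    """Gradient step whose update is restricted to the orthogonal space."""
    return weights - lr * grad @ projector


rng = np.random.default_rng(0)
d, k, n_out = 8, 3, 4
prev_inputs = rng.normal(size=(d, k))   # inputs from earlier architectures
weights = rng.normal(size=(n_out, d))   # shared layer, output = weights @ x
grad = rng.normal(size=(n_out, d))      # gradient from a later architecture

P = orthogonal_projector(prev_inputs)
w_projected = projected_step(weights, grad, P)
w_plain = weights - 0.1 * grad

# How much the layer's outputs on the earlier inputs moved after each update.
drift_projected = np.abs(w_projected @ prev_inputs - weights @ prev_inputs).max()
drift_plain = np.abs(w_plain @ prev_inputs - weights @ prev_inputs).max()
print(f"projected update drift: {drift_projected:.2e}")
print(f"plain update drift:     {drift_plain:.2e}")
```

In this toy setting the projected update changes the outputs on earlier inputs only by an amount on the order of `alpha`, which is the sense in which earlier knowledge is preserved. A real weight-sharing supernet would need one projector per shared layer and a way to update it as new architectures are trained.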

