Adaptive Channel Allocation for Robust Differentiable Architecture Search
This work addresses stability and robustness limitations in DARTS for practical applications in neural architecture search, representing an incremental improvement over prior methods.
The paper tackles the problem of skip connection accumulation causing instability in Differentiable Architecture Search (DARTS) by proposing an Adaptive Channel Allocation (ACA) strategy that implicitly searches and refills skip connections, achieving improved robustness and effectively addressing collapse issues in extensive experiments.
Differentiable ARchiTecture Search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency. However, the excessive accumulation of the skip connection, when training epochs become large, makes it suffer from weak stability and low robustness, thus limiting its practical applications. Many works have attempted to restrict the accumulation of skip connections by indicators or manual design. These methods, however, are susceptible to human priors and hyper-parameters. In this work, we suggest a more subtle and direct approach that no longer explicitly searches for skip connections in the search stage, based on the paradox that skip connections were proposed to guarantee the performance of very deep networks, but the networks in the search stage of differentiable architecture search are actually very shallow. Instead, by introducing channel importance ranking and channel allocation strategy, the skip connections are implicitly searched and automatically refilled unimportant channels in the evaluation stage. Our method, dubbed Adaptive Channel Allocation (ACA) strategy, is a general-purpose approach for differentiable architecture search, which universally works in DARTS variants without introducing human priors, indicators, or hyper-parameters. Extensive experiments on various datasets and DARTS variants verify that the ACA strategy is the most effective one among existing methods in improving robustness and dealing with the collapse issue when training epochs become large.