LG · Jan 25, 2023

When Layers Play the Lottery, all Tickets Win at Initialization

Artur Jordao, George Correa de Araujo, Helena de Almeida Maia, Helio Pedrini

arXiv:2301.10835v2 · 6.64 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work targets computational efficiency and environmental impact in deep learning, offering a pruning approach that advances incrementally over existing methods.

The paper tackles the computational cost of deep networks by showing that winning tickets (sparse subnetworks) can be found through layer pruning at initialization, which removes the need to train a dense network first. Reported results include a reduction of up to 51% in carbon emissions and improved robustness against adversarial and out-of-distribution examples.

Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network there exist sparse subnetworks (tickets) able to achieve similar accuracy (i.e., to win the lottery, hence winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization through the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Building on this observation, we propose to discover these winning tickets at initialization, eliminating the need for heavy computational resources to train the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization, while tickets from filter removal (the standard structured LTH) hardly become winning tickets.
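To make the idea of removing whole layers before any training more concrete, here is a minimal sketch on a toy residual network in plain Python. It is an illustration under stated assumptions, not the authors' method: the helper names (`init_block`, `score_block`, `prune_layers_at_init`) are hypothetical, and the L1-norm score is only a placeholder for whatever layer-importance criterion the paper actually uses. The point it shows is structural: because each block adds its output back to its input, dropping a block leaves tensor shapes intact, so the shallower network can be trained directly.

```python
import random

def init_block(width, rng):
    """One residual block: a width x width weight matrix drawn at random."""
    return [[rng.gauss(0.0, 0.1) for _ in range(width)] for _ in range(width)]

def forward(blocks, x):
    """Apply each block as x <- x + relu(W x); the shape of x never changes."""
    for w in blocks:
        h = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        x = [xi + hi for xi, hi in zip(x, h)]
    return x

def score_block(w):
    """Placeholder importance score (L1 norm); NOT the paper's criterion."""
    return sum(abs(v) for row in w for v in row)

def prune_layers_at_init(blocks, keep):
    """Keep the `keep` highest-scoring blocks, preserving their original order."""
    ranked = sorted(range(len(blocks)),
                    key=lambda i: score_block(blocks[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return [blocks[i] for i in kept], kept

# Build an untrained 8-block network, then drop 3 blocks before any training.
rng = random.Random(0)
dense = [init_block(4, rng) for _ in range(8)]
ticket, kept_idx = prune_layers_at_init(dense, keep=5)

# The pruned network still accepts the same input and yields the same shape.
out = forward(ticket, [1.0, 0.5, -0.5, 2.0])
```

In this sketch the candidate ticket is simply the list `ticket`; training it and comparing its accuracy against the dense network would be the step that decides whether it "wins the lottery".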
