Are All Linear Regions Created Equal?
This work addresses a foundational issue in understanding neural network complexity, showing that linear region density is not a reliable proxy for nonlinearity and offering a more principled alternative.
The authors ask whether linear region density accurately captures the nonlinearity of ReLU networks. They find that it fails in overparameterized settings, whereas their proposed variation-based measure tracks model-wise deep double descent, linking reduced test error to reduced nonlinearity.
The number of linear regions has been studied as a proxy for the complexity of ReLU networks. However, the empirical success of network compression techniques such as pruning and knowledge distillation suggests that, in the overparameterized setting, linear region density might fail to capture the effective nonlinearity. In this work, we propose an efficient algorithm for discovering linear regions and use it to investigate how well density captures the nonlinearity of trained VGGs and ResNets on CIFAR-10 and CIFAR-100. We contrast the results with a more principled nonlinearity measure based on function variation, highlighting the shortcomings of linear region density. Furthermore, our measure of nonlinearity clearly correlates with model-wise deep double descent, connecting reduced test error with reduced nonlinearity and increased local similarity of linear regions.
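
To make the two quantities contrasted above concrete, the following is a minimal illustrative sketch and not the algorithm or the measure proposed in this work. It assumes a small fully connected ReLU network and a straight line segment between two inputs. It counts the linear regions crossed by detecting changes in the ReLU activation pattern at sampled points, and it computes a simple variation-style quantity: the total change in the finite-difference slope along the segment. The helper names `forward` and `line_stats`, the sampling resolution `n`, and the choice of measure are all hypothetical.

```python
import numpy as np

def forward(params, x):
    """Run a ReLU MLP on rows of x.

    params is a list of (W, b) pairs with W of shape (in_dim, out_dim).
    Returns the outputs and the concatenated boolean activation pattern
    of all hidden units for each input row.
    """
    patterns = []
    h = x
    for W, b in params[:-1]:
        pre = h @ W + b
        patterns.append(pre > 0)      # which hidden units are active
        h = np.maximum(pre, 0.0)      # ReLU
    W, b = params[-1]
    return h @ W + b, np.concatenate(patterns, axis=1)

def line_stats(params, x0, x1, n=2001):
    """Sample n points on the segment x0 -> x1 (hypothetical helper).

    regions:   1 + number of activation-pattern changes between
               consecutive samples, a lower bound on the number of
               linear regions the segment crosses.
    variation: sum of absolute changes in the finite-difference slope
               with respect to the line parameter t in [0, 1]. This is
               an illustrative stand-in for a variation-based measure.
    """
    t = np.linspace(0.0, 1.0, n)[:, None]
    xs = (1.0 - t) * x0 + t * x1
    y, pat = forward(params, xs)
    changes = np.any(pat[1:] != pat[:-1], axis=1)
    regions = int(changes.sum()) + 1
    slope = np.diff(y, axis=0) * (n - 1)   # dy/dt by finite differences
    variation = float(np.abs(np.diff(slope, axis=0)).sum())
    return regions, variation

# Toy usage: f(x) = relu(x) on the segment from -1 to 1 has one kink.
relu_net = [(np.array([[1.0]]), np.array([0.0])),
            (np.array([[1.0]]), np.array([0.0]))]
regions, variation = line_stats(relu_net, np.array([-1.0]), np.array([1.0]))
print(regions, round(variation, 6))
```

In this sketch the two numbers can diverge: many closely spaced regions with nearly identical slopes raise the region count while contributing little variation, which is the kind of mismatch between density and effective nonlinearity that the abstract describes.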