Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN
This work, aimed at machine learning researchers, addresses how to interpret generalization and representation differences between DNNs, and offers an incremental extension to the lottery ticket hypothesis.
The paper studies generalization in deep neural networks. It shows that the complexity of the interactions encoded by a DNN explains its generalization power, and that a DNN's confusing samples are determined by its low-layer parameters: two DNNs with different low-layer parameters typically have entirely different sets of confusing samples despite similar performance.
In this paper, we find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power. We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters. In comparison, other factors, such as high-layer parameters and network architecture, have much less impact on the composition of confusing samples. Two DNNs with different low-layer parameters usually have entirely different sets of confusing samples, even though they have similar performance. This finding extends the understanding of the lottery ticket hypothesis and helps explain the distinctive representation power of different DNNs.
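The claim that two DNNs have largely different sets of confusing samples can be made concrete with a simple set-overlap measure. The sketch below is illustrative only: it assumes the confusing samples of each DNN have already been identified by some procedure and are given as sets of sample indices. The helper name `jaccard_overlap`, the use of the Jaccard index, and the example indices are all assumptions made here, not the paper's method or results.

```python
def jaccard_overlap(set_a, set_b):
    """Jaccard index of two collections of sample indices.

    Returns 1.0 for identical sets and 0.0 for disjoint sets.
    """
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical confusing-sample indices for two DNNs that differ in their
# low-layer parameters (illustrative values, not data from the paper).
confusing_dnn_a = {3, 17, 42, 58, 91}
confusing_dnn_b = {5, 17, 23, 64, 77}

overlap = jaccard_overlap(confusing_dnn_a, confusing_dnn_b)
print(f"Jaccard overlap of confusing-sample sets: {overlap:.3f}")
```

Under this measure, a value near 0 would correspond to the "entirely different sets" case described above, while a value near 1 would indicate that the two DNNs are confused by largely the same samples.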