A theoretical framework for deep locally connected ReLU network
This work addresses a fundamental problem in machine learning theory, but its contribution is incremental: it builds on existing teacher-student settings and demonstrates no empirical gains.
The authors tackle the challenge of understanding the theoretical properties of deep locally connected networks, such as DCNNs, by proposing a novel theoretical framework for networks with ReLU nonlinearity. The framework explicitly formulates the data distribution and supports disentangled representations without imposing unrealistic assumptions.
Understanding the theoretical properties of deep, locally connected nonlinear networks, such as deep convolutional neural networks (DCNNs), is still a hard problem despite their empirical success. In this paper, we propose a novel theoretical framework for such networks with ReLU nonlinearity. The framework explicitly formulates the data distribution, favors disentangled representations, and is compatible with common regularization techniques such as Batch Norm. The framework is built upon a teacher-student setting, by expanding the student's forward/backward propagation onto the teacher's computational graph. The resulting model does not impose unrealistic assumptions (e.g., Gaussian inputs, independence of activations, etc.). Our framework could help facilitate theoretical analysis of many practical issues, e.g., overfitting, generalization, and disentangled representations in deep networks.
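For readers unfamiliar with the teacher-student setting mentioned above, the following is a minimal sketch of the generic setup only: a fixed "teacher" ReLU network labels the inputs, and a "student" of the same shape is trained by gradient descent to match it. It does not reproduce the paper's framework (in particular, it does not expand the student's propagation onto the teacher's computational graph), and it uses a small fully connected two-layer network rather than a locally connected one for brevity. The function names, layer sizes, learning rate, and uniform input distribution are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(W1, W2, X):
    """Two-layer ReLU network: returns hidden activations and outputs."""
    H = relu(X @ W1)
    return H, H @ W2


def train_student(seed=0, n=256, d=8, hidden=16, steps=500, lr=0.05):
    """Train a student to imitate a fixed teacher; returns the loss per step.

    All sizes and hyperparameters here are hypothetical choices for the sketch.
    """
    rng = np.random.default_rng(seed)

    # Teacher: fixed random weights, used only to generate targets.
    Wt1 = rng.normal(size=(d, hidden)) / np.sqrt(d)
    Wt2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)

    # Inputs are drawn uniformly (assumption), i.e. deliberately not Gaussian.
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    _, y = forward(Wt1, Wt2, X)

    # Student: same architecture, independent random initialization.
    Ws1 = rng.normal(size=(d, hidden)) / np.sqrt(d)
    Ws2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)

    losses = []
    for _ in range(steps):
        # Forward pass of the student.
        H, pred = forward(Ws1, Ws2, X)
        err = pred - y
        losses.append(0.5 * float(np.mean(err ** 2)))

        # Backward pass: gradients of 0.5 * mean squared error.
        grad_W2 = H.T @ err / n
        grad_H = err @ Ws2.T
        grad_Z = grad_H * (H > 0)  # ReLU gate
        grad_W1 = X.T @ grad_Z / n

        # Plain full-batch gradient descent step.
        Ws1 -= lr * grad_W1
        Ws2 -= lr * grad_W2
    return losses


if __name__ == "__main__":
    losses = train_student()
    print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.4f}")
```

Because the targets come from a known teacher, the student's error can in principle be studied in terms of the teacher's structure, which is the kind of analysis the abstract says the framework is meant to support.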