LGMLNov 6, 2023

Weight-Sharing Regularization

MILA
arXiv:2311.03096v21 citationsh-index: 47Has Code
Originality Incremental advance
AI Analysis

This work addresses a specific challenge in neural network training for image processing, offering a novel regularization technique with potential applications in robust feature learning.

The authors tackled the problem of enabling fully connected networks to learn convolution-like filters in shuffled pixel settings by proposing a weight-sharing regularization penalty, achieving success where convolutional neural networks fail.

Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a "weight-sharing regularization" penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\operatorname{prox}_\mathcal{R}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled while convolutional neural networks fail in this setting. Our code is available on github.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes