Hypersolid: Emergent Vision Representations via Short-Range Repulsion
This paper addresses representation collapse in self-supervised learning for vision. The approach is novel but appears incremental, as it builds on existing regularization methods.
The paper tackles representation collapse in self-supervised learning by reinterpreting representation learning as a discrete packing problem and using short-range hard-ball repulsion to prevent local collisions, which yields improved performance on fine-grained and low-resolution classification tasks.
A recurring challenge in self-supervised learning is preventing representation collapse. Existing solutions typically rely on global regularization, such as maximizing distances, decorrelating dimensions or enforcing certain distributions. We instead reinterpret representation learning as a discrete packing problem, where preserving information simplifies to maintaining injectivity. We operationalize this in Hypersolid, a method using short-range hard-ball repulsion to prevent local collisions. This constraint results in a high-separation geometric regime that preserves augmentation diversity, excelling on fine-grained and low-resolution classification tasks.
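To make the idea of short-range repulsion concrete, here is a minimal sketch of one way such a penalty could look. It is not the paper's loss, which the abstract does not specify. The function name `hard_ball_repulsion`, the `radius` parameter, the squared-hinge form, the L2 normalization, and the choice of Euclidean distance are all assumptions made for illustration. The one property it shares with the description above is that only pairs closer than a fixed radius contribute, so well-separated embeddings are left alone.

```python
import numpy as np


def hard_ball_repulsion(z, radius=0.5, eps=1e-12):
    """Illustrative short-range repulsion penalty (hypothetical, not the paper's loss).

    z: array of shape (n, d), one embedding per row.
    radius: assumed ball diameter; pairs farther apart than this add nothing.
    """
    # Assumption: embeddings are compared on the unit sphere.
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

    # Pairwise Euclidean distances, shape (n, n).
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + eps)

    # Hinge: positive only where two points are closer than `radius`.
    overlap = np.maximum(0.0, radius - dist)

    # Ignore each point's zero distance to itself.
    mask = ~np.eye(len(z), dtype=bool)

    # Mean squared overlap over all ordered pairs.
    return float((overlap[mask] ** 2).sum() / max(mask.sum(), 1))


# Well-separated points incur no penalty; collapsed points do.
spread = np.eye(4)            # four orthogonal unit vectors
collapsed = np.ones((4, 3))   # four identical embeddings
print(hard_ball_repulsion(spread), hard_ball_repulsion(collapsed))
```

Unlike a global regularizer, this penalty is exactly zero once every pair is farther apart than `radius`, so it constrains only local collisions and otherwise leaves the arrangement of embeddings free.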