Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
This work addresses a computational bottleneck faced by researchers and practitioners who use self-supervised learning. It offers a more efficient training method that improves incrementally on existing models.
The paper tackles the computational inefficiency of decorrelating regularizers in self-supervised learning models such as Barlow Twins and VICReg, which scale poorly as the embedding dimension grows. It proposes a relaxed regularizer that uses the Fast Fourier Transform to reduce the cost of computing the loss from O(n d^2) to O(n d log d), while maintaining comparable accuracy in downstream tasks.
Barlow Twins and VICReg are self-supervised representation learning models that use regularizers to decorrelate features. Although these models are as effective as conventional representation learning models, their training can be computationally demanding when the dimension d of the projected embeddings is large. Because the regularizers are defined in terms of individual elements of a cross-correlation or covariance matrix, computing the loss for n samples takes O(n d^2) time. In this paper, we propose a relaxed decorrelating regularizer that can be computed in O(n d log d) time using the Fast Fourier Transform. We also propose an inexpensive technique to mitigate undesirable local minima that arise from the relaxation. The proposed regularizer achieves downstream accuracy comparable to that of existing regularizers, while requiring less memory and training faster when d is large. The source code is available.
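The contrast between O(n d^2) and O(n d log d) can be illustrated with a minimal NumPy sketch. It assumes the relaxation groups the entries of the d x d cross-correlation matrix C along wrapped diagonals. Under that assumption, each sum r[k] = sum_i C[i, (i + k) mod d] equals a batch-averaged circular cross-correlation of the two embeddings, which the FFT computes without ever forming C. The function names, the weight `lam`, and the exact form of `relaxed_loss` are hypothetical illustrations, not the authors' implementation, and the local-minima mitigation is not shown.

```python
import numpy as np


def standardize(z: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Center and scale each feature (column) of an (n, d) batch."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)


def cross_correlation(z1: np.ndarray, z2: np.ndarray) -> np.ndarray:
    """Full d x d cross-correlation matrix: O(n d^2) time, O(d^2) memory."""
    return z1.T @ z2 / z1.shape[0]


def wrapped_diagonal_sums(c: np.ndarray) -> np.ndarray:
    """Reference version: r[k] = sum_i C[i, (i + k) mod d], read off an explicit C."""
    d = c.shape[0]
    idx = np.arange(d)
    return np.array([c[idx, (idx + k) % d].sum() for k in range(d)])


def fft_correlation_sums(z1: np.ndarray, z2: np.ndarray) -> np.ndarray:
    """Same r[k] without forming C.

    For each sample s, ifft(conj(fft(z1[s])) * fft(z2[s]))[k]
    = sum_i z1[s, i] * z2[s, (i + k) mod d]; averaging over s gives r[k].
    Cost: O(n d log d) time, O(n d) memory.
    """
    d = z1.shape[1]
    f1 = np.fft.rfft(z1, axis=1)  # real input, so the half spectrum suffices
    f2 = np.fft.rfft(z2, axis=1)
    per_sample = np.fft.irfft(np.conj(f1) * f2, n=d, axis=1)
    return per_sample.mean(axis=0)


def relaxed_loss(z1: np.ndarray, z2: np.ndarray, lam: float = 5e-3) -> float:
    """Hypothetical Barlow-Twins-style relaxed loss.

    The invariance term uses only the diagonal C[i, i] (O(n d)). The
    decorrelation term penalizes the wrapped off-diagonal sums r[1:]
    instead of every individual off-diagonal entry C[i, j].
    """
    z1, z2 = standardize(z1), standardize(z2)
    diag = (z1 * z2).mean(axis=0)  # C[i, i] computed without forming C
    r = fft_correlation_sums(z1, z2)
    return float(((1.0 - diag) ** 2).sum() + lam * (r[1:] ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = standardize(rng.standard_normal((256, 64)))
    z2 = standardize(z1 + 0.1 * rng.standard_normal((256, 64)))
    # The FFT route reproduces the wrapped-diagonal sums of the explicit matrix.
    assert np.allclose(
        wrapped_diagonal_sums(cross_correlation(z1, z2)),
        fft_correlation_sums(z1, z2),
    )
    print("relaxed loss:", relaxed_loss(z1, z2))
```

The design choice worth noting is that only d sums r[0..d-1] are ever materialized. For this reason, memory grows with n d rather than d^2, which matches the abstract's claim that the relaxed regularizer needs less memory for large d.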