CV · GR · LG · Aug 19, 2025

Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer

arXiv:2508.14187v1 · 2 citations · h-index: 7 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses scale variation in computer vision, a problem that matters for tasks such as object recognition. The advance is incremental because it builds on existing network architectures.

The paper tackles local scale variation in computer vision by introducing a deep equilibrium canonicalizer (DEC). DEC improves both model performance and local scale consistency on the ImageNet benchmark across four pre-trained networks: ViT, DeiT, Swin, and BEiT.

Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different object sizes may change differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, e.g., ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.

Code Implementations: 1 repo
