LG | AI | CV | May 29, 2025

REOrdering Patches Improves Vision Models

arXiv:2505.23751v3 | 2 citations | h-index: 10
Originality: Highly original
AI Analysis

This paper addresses the performance variability that patch ordering causes in vision transformers and offers a novel method for optimizing that ordering.

The paper tackles the problem of patch ordering sensitivity in vision transformers, showing that alternative orderings like column-major or Hilbert curves affect performance, and proposes REOrder, a framework that improves top-1 accuracy by up to 3.01% on ImageNet-1K and 13.35% on Functional Map of the World.
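To make the orderings named above concrete, here is a minimal sketch of how a grid of patches can be flattened in row-major, column-major, or Hilbert-curve order. The function names (`row_major_order`, `column_major_order`, `hilbert_order`, `reorder`) are illustrative and not taken from the paper, and the Hilbert variant assumes a square grid whose side is a power of two.

```python
def row_major_order(rows, cols):
    """Patch indices in raster-scan order: left to right, then top to bottom."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major_order(rows, cols):
    """Patch indices read top to bottom within a column, then left to right."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def hilbert_order(n):
    """Patch indices along a Hilbert curve over an n x n grid.

    Assumes n is a power of two. Uses the standard distance-to-coordinate
    conversion for the Hilbert curve.
    """
    order = []
    for d in range(n * n):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:
                if rx == 1:
                    # Flip the quadrant before rotating it.
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        order.append(y * n + x)  # y is the row, x is the column
    return order


def reorder(patches, order):
    """Apply an ordering (a permutation of indices) to a row-major patch list."""
    return [patches[i] for i in order]


# A 2 x 3 grid of placeholder "patches", stored in row-major order.
patches = ["a", "b", "c", "d", "e", "f"]
print(reorder(patches, column_major_order(2, 3)))  # ['a', 'd', 'b', 'e', 'c', 'f']
```

Every ordering here is just a permutation of the same patch indices, so the model sees identical content in each case and only the sequence position of each patch changes.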

Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this equivariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a policy over permutations by optimizing a Plackett-Luce policy using REINFORCE. This approach enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering by up to 3.01% on ImageNet-1K and 13.35% on Functional Map of the World.
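The second stage of the abstract can be illustrated with a small sketch of a Plackett-Luce distribution over permutations and a single REINFORCE update. This is a generic textbook formulation under stated assumptions, not the paper's implementation: one learnable logit per patch position, a scalar reward supplied by a hypothetical `reward_fn`, a fixed `baseline`, and plain gradient ascent. All function names are illustrative.

```python
import math
import random


def sample_plackett_luce(logits, rng):
    """Sample a permutation by repeatedly picking an item with prob. softmax(logits)
    over the items that have not been picked yet."""
    remaining = list(range(len(logits)))
    order = []
    while remaining:
        weights = [math.exp(logits[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order


def log_prob(logits, order):
    """Log-probability of a permutation under the Plackett-Luce model."""
    total = 0.0
    for k in range(len(order)):
        rest = [logits[i] for i in order[k:]]
        m = max(rest)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in rest))
        total += logits[order[k]] - log_z
    return total


def grad_log_prob(logits, order):
    """Gradient of log_prob with respect to each logit."""
    grad = [0.0] * len(logits)
    for k in range(len(order)):
        rest = order[k:]
        m = max(logits[i] for i in rest)
        exps = {i: math.exp(logits[i] - m) for i in rest}
        z = sum(exps.values())
        grad[order[k]] += 1.0          # the item actually chosen at step k
        for i in rest:
            grad[i] -= exps[i] / z     # minus its softmax share among the rest
    return grad


def reinforce_step(logits, reward_fn, rng, baseline=0.0, lr=0.1):
    """One REINFORCE update: sample an ordering, score it, follow the
    score-function gradient (reward - baseline) * grad log p(order)."""
    order = sample_plackett_luce(logits, rng)
    reward = reward_fn(order)          # hypothetical: e.g. negative task loss
    grad = grad_log_prob(logits, order)
    advantage = reward - baseline
    new_logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return new_logits, order, reward


# Toy usage: reward orderings that place patch 0 first.
rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]
for _ in range(200):
    logits, _, _ = reinforce_step(
        logits, lambda o: 1.0 if o[0] == 0 else 0.0, rng, baseline=0.25
    )
```

The model needs only one parameter per patch rather than one per permutation, which is what makes a search over the factorially large permutation space tractable in this kind of setup.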

