CV | Jun 11, 2025

EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks

Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota, Aiden Durrant

arXiv:2506.09895v13.6 | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and generalisable equivariant representations in computer vision, though it appears incremental because it builds on existing capsule network concepts.

The paper tackles learning pose-aware self-supervised representations without a specialised predictor by introducing EquiCaps, a capsule-based approach that relies on the intrinsic pose-awareness of capsules. On the 3DIEBench rotation prediction benchmark it achieves a supervised-level R² of 0.78, improving on SIE and CapsIE by 0.05 and 0.04 R², respectively, and it maintains robust equivariant performance under combined geometric transformations.

Learning self-supervised representations that are invariant and equivariant to transformations is crucial for advancing beyond traditional visual classification tasks. However, many methods rely on predictor architectures to encode equivariance, despite evidence that architectural choices, such as capsule networks, inherently excel at learning interpretable pose-aware representations. To explore this, we introduce EquiCaps (Equivariant Capsule Network), a capsule-based approach to pose-aware self-supervision that eliminates the need for a specialised predictor for enforcing equivariance. Instead, we leverage the intrinsic pose-awareness capabilities of capsules to improve performance in pose estimation tasks. To further challenge our assumptions, we increase task complexity via multi-geometric transformations to enable a more thorough evaluation of invariance and equivariance by introducing 3DIEBench-T, an extension of a 3D object-rendering benchmark dataset. Empirical results demonstrate that EquiCaps outperforms prior state-of-the-art equivariant methods on rotation prediction, achieving a supervised-level $R^2$ of 0.78 on the 3DIEBench rotation prediction benchmark and improving upon SIE and CapsIE by 0.05 and 0.04 $R^2$, respectively. Moreover, in contrast to non-capsule-based equivariant approaches, EquiCaps maintains robust equivariant performance under combined geometric transformations, underscoring its generalisation capabilities and the promise of predictor-free capsule architectures.
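
For readers unfamiliar with the metric, the $R^2$ values quoted above are coefficients of determination between predicted and ground-truth pose targets. The following is a minimal sketch of how such a score can be computed, not the authors' evaluation code. The function name `r2_score`, the uniform averaging over output dimensions, and the toy two-dimensional targets are all assumptions made for illustration; the actual rotation parameterisation and evaluation protocol are defined by the benchmark.

```python
from typing import Sequence


def r2_score(targets: Sequence[Sequence[float]],
             predictions: Sequence[Sequence[float]]) -> float:
    """Coefficient of determination, averaged uniformly over output dimensions.

    Hypothetical helper for illustration only. Each row is one sample and
    each column is one regression target (for example, one rotation component).
    Assumes every target column has non-zero variance.
    """
    n = len(targets)
    dims = len(targets[0])
    scores = []
    for d in range(dims):
        t = [row[d] for row in targets]
        p = [row[d] for row in predictions]
        mean_t = sum(t) / n
        # Residual sum of squares: error of the model's predictions.
        ss_res = sum((ti - pi) ** 2 for ti, pi in zip(t, p))
        # Total sum of squares: error of always predicting the mean target.
        ss_tot = sum((ti - mean_t) ** 2 for ti in t)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / dims


# Toy data (made up): three samples, two target dimensions.
toy_targets = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
mean_baseline = [[1.0, 3.0]] * 3  # always predicts the per-column mean

perfect_score = r2_score(toy_targets, toy_targets)
baseline_score = r2_score(toy_targets, mean_baseline)
```

Under this definition a perfect predictor scores 1 and a predictor that always outputs the mean target scores 0, which gives a sense of scale for the reported 0.78 and for gains of 0.04 to 0.05.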
