CV · LG · May 12

WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views

arXiv:2605.11743 · 15.1 · Has Code
Predicted impact: top 93% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient spatio-semantic reasoning in resource-constrained settings, offering a general framework for object localization tasks.

WorldComp2D introduces a lightweight representation learning framework that explicitly structures latent space by object identity and spatial proximity, achieving up to 4.0x fewer parameters and 2.2x fewer FLOPs than SoTA lightweight models on facial landmark localization while maintaining real-time CPU performance.

Learning latent representations that capture both semantic and spatial information is central to efficient spatio-semantic reasoning. However, many existing approaches rely on implicit latent structures combined with dense feature maps or task-specific heads, limiting computational efficiency and flexibility. We propose WorldComp2D, a novel lightweight representation learning framework that explicitly structures latent space geometry according to object identity and spatial proximity using multiscale local receptive fields. This framework consists of (i) a proximity-dependent encoder that maps a given observation into a spatio-semantic latent space and (ii) a localizer that infers the coordinates of objects in the input from the resulting spatio-semantic representation. Using facial landmark localization as a proof-of-concept, we show that, compared to SoTA lightweight models, WorldComp2D reduces the number of parameters and FLOPs by up to 4.0x and 2.2x, respectively, while maintaining real-time performance on CPU. These results demonstrate that explicitly structured latent spaces provide an efficient and general foundation for spatio-semantic reasoning. This framework is open-sourced at https://github.com/JinSeongmin/WorldComp2D.
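To make the two-stage idea in the abstract concrete, here is a toy sketch of an encoder whose latent coordinates are indexed by object identity and scaled by spatial proximity, followed by a localizer that recovers coordinates from those latents. It is not the authors' implementation: the Gaussian proximity function, the single fixed scale, the weighted-mean localizer, and every name (`proximity`, `encode`, `localize`, the landmark labels) are assumptions chosen for illustration. The real encoder is learned from image content and uses multiscale local receptive fields. See the linked repository for the actual model.

```python
import math

def proximity(view_center, obj_xy, scale):
    """Assumed Gaussian falloff in (0, 1]; equals 1 when the view is centred on the object."""
    dx = view_center[0] - obj_xy[0]
    dy = view_center[1] - obj_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * scale * scale))

def encode(view_center, objects, scale):
    """Toy stand-in for the encoder: one latent coordinate per object identity,
    larger when the local view lies closer to that object.
    (A learned encoder would infer this from pixels, not from known positions.)"""
    return {name: proximity(view_center, xy, scale) for name, xy in objects.items()}

def localize(latents_by_view, name):
    """Toy stand-in for the localizer: proximity-weighted mean of the view centres."""
    total = sum(z[name] for z in latents_by_view.values())
    x = sum(c[0] * z[name] for c, z in latents_by_view.items()) / total
    y = sum(c[1] * z[name] for c, z in latents_by_view.items()) / total
    return (x, y)

# Hypothetical scene: two landmarks observed through a 9x9 grid of local views.
objects = {"left_eye": (3.0, 4.0), "nose_tip": (5.0, 6.0)}
views = [(float(i), float(j)) for i in range(9) for j in range(9)]
latents = {c: encode(c, objects, scale=1.0) for c in views}

left_eye_estimate = localize(latents, "left_eye")
nose_tip_estimate = localize(latents, "nose_tip")
```

In this toy setting the estimates land close to the true landmark positions, with a small bias toward the grid interior for landmarks near the border, because the latent magnitude encodes how near each local view is to each identity. This is only the intuition behind a latent space structured by identity and proximity, not a claim about how WorldComp2D is trained or evaluated.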

