CV GR Oct 14, 2024

4-LEGS: 4D Language Embedded Gaussian Splatting

Gal Fiebelman, Tamir Cohen, Ayellet Morgenstern, Peter Hedman, Hadar Averbuch-Elor

arXiv:2410.10719v3
13.511 citations
h-index: 18
Computer Graphics Forum (Print)

Originality: Incremental advance

AI Analysis

This work addresses the challenge of semantic understanding in dynamic 3D scenes for applications like video analysis and interactive systems, representing an incremental advancement by extending existing 3D methods to 4D with language integration.

The paper tackles the problem of connecting language with dynamic 3D scene modeling by lifting spatio-temporal features to a 4D representation based on 3D Gaussian Splatting, enabling interactive spatiotemporal localization of events from text prompts in videos of people and animals.

The emergence of neural representations has revolutionized our means for digitally viewing a wide range of 3D scenes, enabling the synthesis of photorealistic images rendered from novel views. Recently, several techniques have been proposed for connecting these low-level representations with the high-level semantic understanding embodied within the scene. These methods elevate the rich semantic understanding from 2D imagery to 3D representations, distilling high-dimensional spatial features onto 3D space. In our work, we are interested in connecting language with dynamic modeling of the world. We show how to lift spatio-temporal features to a 4D representation based on 3D Gaussian Splatting. This enables an interactive interface where the user can spatiotemporally localize events in the video from text prompts. We demonstrate our system on public 3D video datasets of people and animals performing various actions.
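To make the idea of text-driven spatiotemporal localization concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes each Gaussian carries one feature vector per time step and that the text prompt has already been embedded into the same feature space; the data layout, the function names `cosine` and `localize`, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize(features, text_embedding, threshold=0.8):
    """Return {time_step: [gaussian_index, ...]} for Gaussians matching the prompt.

    features[t][g] is the feature vector of Gaussian g at time step t.
    This layout is an assumption for the sketch, not the paper's data structure.
    """
    hits = {}
    for t, frame in enumerate(features):
        matched = [
            g for g, feat in enumerate(frame)
            if cosine(feat, text_embedding) >= threshold
        ]
        if matched:  # keep only time steps where something matches
            hits[t] = matched
    return hits


# Toy data: 3 time steps, 2 Gaussians, 2-dimensional features (hypothetical values).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.1]],
]
query = [1.0, 0.0]  # stand-in for an embedded text prompt
result = localize(features, query)
print(result)
```

The keys of the returned dictionary give the temporal extent of the match (which time steps respond to the prompt) and the values give the spatial extent (which Gaussians respond). A real system would work with learned high-dimensional features and a rendered relevance map rather than a hard threshold over raw vectors.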
