CV · AI · CL · LG · Apr 5, 2024

Physical Property Understanding from Language-Embedded Feature Fields

arXiv:2404.04242v1 · 44 citations · h-index: 4 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling computers to perceive physical properties from vision alone. The advance is incremental: it combines language models with 3D point clouds to make the approach applicable to open-world objects.

The paper tackles dense prediction of the physical properties of objects from images. It uses large language models to propose candidate materials for each object, then applies zero-shot kernel regression on a language-embedded point cloud to estimate properties at each 3D point. The method requires no annotations and is evaluated on tasks such as estimating mass, friction, and hardness.

Can computers perceive the physical properties of objects solely through vision? Research in cognitive science and vision science has shown that humans excel at identifying materials and estimating their physical properties based purely on visual appearance. In this paper, we present a novel approach for dense prediction of the physical properties of objects using a collection of images. Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object. We then construct a language-embedded point cloud and estimate the physical properties of each 3D point using a zero-shot kernel regression approach. Our method is accurate, annotation-free, and applicable to any object in the open world. Experiments demonstrate the effectiveness of the proposed approach in various physical property reasoning tasks, such as estimating the mass of common objects, as well as other properties like friction and hardness.
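The per-point estimation step described in the abstract can be illustrated with a small sketch. The abstract only states that a "zero-shot kernel regression" is applied over a language-embedded point cloud. The specific kernel used here (a softmax over cosine similarities with a temperature), the function name `kernel_regress`, the toy 3-dimensional embeddings, and the density values are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def kernel_regress(point_feats, material_feats, material_values, temperature=0.1):
    """Nadaraya-Watson style estimate of one property for every 3D point.

    point_feats:     (N, D) language-aligned features, one row per point
    material_feats:  (K, D) text embeddings of the candidate materials
    material_values: (K,)   property value attached to each candidate
    The softmax-over-cosine-similarity kernel is an assumed choice.
    """
    # Normalize rows so the dot product equals cosine similarity.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    m = material_feats / np.linalg.norm(material_feats, axis=1, keepdims=True)
    logits = (p @ m.T) / temperature              # (N, K)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each point's value is a similarity-weighted average over candidates.
    return weights @ material_values

# Toy data (hypothetical): two candidate materials and three points.
material_feats = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
material_density = np.array([7800.0, 600.0])  # illustrative values in kg/m^3
point_feats = np.array([[0.9, 0.1, 0.0],   # resembles the first material
                        [0.1, 0.9, 0.0],   # resembles the second material
                        [0.5, 0.5, 0.0]])  # equally similar to both
density = kernel_regress(point_feats, material_feats, material_density)
print(density)
```

In this toy setup, a point whose feature is equally similar to both candidates receives the mean of their values, while points close to one candidate receive nearly that candidate's value. A lower `temperature` makes the assignment sharper; a higher one blends candidates more evenly.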

