CV · Sep 10, 2025

SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video

arXiv:2509.08828v1 · 3.6 · h-index: 2

Originality: Highly original

AI Analysis

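The paper addresses realistic 3D reconstruction and rendering of fabrics, a hard case in computer vision because monocular video leaves depth ambiguous. It improves on existing methods by adding two regularization terms to a reconstruction pipeline built on physical cloth simulation and differentiable rendering.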

The paper tackles the problem of reconstructing 3D dynamic scenes for fabrics from monocular video, reducing 3D reconstruction error by a factor of 2.64 compared to recent methods and enabling appearance estimation with sharp details.

The reconstruction of three-dimensional dynamic scenes is a well-established yet challenging task in computer vision. In this paper, we propose a novel approach that combines 3D geometry reconstruction and appearance estimation for physically based rendering, and present a system that performs both tasks for fabrics using only a single monocular RGB video sequence as input. To obtain realistic, high-quality deformations and renderings, we employ a physical simulation of the cloth geometry together with differentiable rendering. We introduce two novel regularization terms for the 3D reconstruction task that improve the plausibility of the reconstruction by addressing the depth ambiguity problem in monocular video. Compared with the most recent methods in the field, we reduce the 3D reconstruction error by a factor of 2.64 while requiring a medium runtime of 30 min per scene. Furthermore, the optimized motion is of sufficient quality to support appearance estimation of the deforming object, recovering sharp details from this single monocular RGB video.

