Online Adaptation for Consistent Mesh Reconstruction in the Wild
Aimed at computer vision researchers, this work addresses the challenging problem of reconstructing temporally consistent 3D meshes of deformable objects from unconstrained videos, offering an incremental improvement in handling 'in the wild' scenarios.
This paper proposes a self-supervised online adaptation algorithm to reconstruct temporally consistent 3D meshes of deformable objects from videos in the wild. It learns a category-specific 3D reconstruction model from single-view images and then adapts it to test videos using temporal consistency regularization, enabling the recovery of reliable 3D structures from challenging non-rigid object videos.
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild. Without requiring 3D mesh, 2D keypoint, or camera pose annotations for each video frame, we pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video. We first learn a category-specific 3D reconstruction model from a collection of single-view images of the same category; the model jointly predicts shape, texture, and camera pose from an image. Then, at inference time, we adapt the model to a test video over time using self-supervised regularization terms that exploit the temporal consistency of an object instance, enforcing that all reconstructed meshes share a common texture map, base shape, and parts. We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects, including animals captured in the wild, an extremely challenging task rarely addressed before.
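To make the idea of temporal consistency regularization concrete, the following is a minimal sketch and not the paper's actual loss. It assumes the model emits one texture map and one base shape per frame, and it penalizes how far each frame's prediction deviates from the clip-wide mean. The function names, array shapes, and weights (`invariance_loss`, `temporal_consistency_loss`, `w_tex`, `w_shape`) are hypothetical and chosen only for illustration; the part-consistency term is omitted.

```python
import numpy as np


def invariance_loss(per_frame: np.ndarray) -> float:
    """Mean squared deviation of each frame's prediction from the clip mean.

    per_frame has shape (T, ...): one prediction per video frame.
    The loss is zero when every frame predicts the same quantity.
    """
    clip_mean = per_frame.mean(axis=0, keepdims=True)
    return float(np.mean((per_frame - clip_mean) ** 2))


def temporal_consistency_loss(textures, base_shapes, w_tex=1.0, w_shape=1.0):
    """Weighted sum of texture and base-shape invariance terms (assumed form)."""
    return w_tex * invariance_loss(textures) + w_shape * invariance_loss(base_shapes)


# Toy clip of T frames with hypothetical sizes: 8x8 RGB texture, 10-vertex mesh.
rng = np.random.default_rng(0)
T = 5
shared_texture = rng.random((8, 8, 3))
shared_shape = rng.random((10, 3))

# Case 1: every frame predicts the same texture and base shape.
same_tex = np.repeat(shared_texture[None], T, axis=0)
same_shape = np.repeat(shared_shape[None], T, axis=0)
consistent = temporal_consistency_loss(same_tex, same_shape)

# Case 2: per-frame predictions drift independently around the shared values.
noisy_tex = same_tex + 0.1 * rng.standard_normal(same_tex.shape)
noisy_shape = same_shape + 0.1 * rng.standard_normal(same_shape.shape)
inconsistent = temporal_consistency_loss(noisy_tex, noisy_shape)

print(f"consistent clip loss:   {consistent:.6f}")
print(f"inconsistent clip loss: {inconsistent:.6f}")
```

In an online adaptation setting, a term of this kind would be minimized with respect to the model parameters on the incoming test video, alongside whatever self-supervised reconstruction objectives the model was trained with; the sketch only evaluates the penalty on fixed arrays.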