Vision-based system identification and 3D keypoint discovery using dynamics constraints
This work addresses the challenge of automating system modeling and keypoint tracking without manual labels, though it appears incremental, since it builds on existing dynamics constraints and supervised learning methods.
The paper tackles the problem of simultaneously discovering 3D keypoints, identifying system dynamics, and calibrating the extrinsics of a static camera from unlabeled video. It uses the equations of motion as weak supervision and demonstrates the approach in robotics, physics, and physiology settings.
This paper introduces V-SysId, a novel method that enables simultaneous keypoint discovery, 3D system identification, and extrinsic camera calibration from an unlabeled video taken from a static camera, using only the family of equations of motion of the object of interest as weak supervision. V-SysId takes keypoint trajectory proposals and alternates between maximum likelihood parameter estimation and extrinsic camera calibration, before applying a suitable selection criterion to identify the track of interest. The selected track is then used to train a keypoint tracking model via supervised learning. Results on a range of settings (robotics, physics, physiology) highlight the utility of this approach.
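The alternating scheme described above can be illustrated on a deliberately small toy problem. The sketch below is not V-SysId itself, and it rests on several assumptions that do not come from the paper. It uses a planar (2D) world instead of 3D. It replaces the camera with an unknown rigid 2D transform (rotation `theta`, translation `trans`) instead of a perspective projection. It uses a hypothetical projectile model `ballistic` with unknown gravity `g` as the family of equations of motion, and it selects the proposal with the lowest final residual as the selection criterion. Under isotropic Gaussian noise, maximum likelihood reduces to least squares. Each half-step is therefore an exact least-squares update: polynomial fits for the dynamics and a 2D Procrustes solve for the extrinsics. All function names are illustrative.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poly(ts, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(k)]
    return solve_linear(ata, aty)

def rotate(theta, p):
    """Rotate the 2D point p by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def ballistic(params, t):
    """Hypothetical equation-of-motion family: planar projectile with unknown gravity g."""
    x0, vx, y0, vy, g = params
    return (x0 + vx * t, y0 + vy * t - 0.5 * g * t * t)

def loss(track, ts, params, theta, trans):
    """Sum of squared residuals; proportional to the Gaussian NLL up to a constant."""
    total = 0.0
    for u, t in zip(track, ts):
        q = rotate(theta, ballistic(params, t))
        total += (u[0] - q[0] - trans[0]) ** 2 + (u[1] - q[1] - trans[1]) ** 2
    return total

def fit_dynamics(track, ts, theta, trans):
    """Exact least-squares dynamics update with the extrinsics held fixed."""
    # Undo the rigid transform: w = R(theta)^T (u - trans); R is orthogonal, so the loss is unchanged.
    world = [rotate(-theta, (u[0] - trans[0], u[1] - trans[1])) for u in track]
    cx = fit_poly(ts, [w[0] for w in world], 1)  # x(t) = x0 + vx t
    cy = fit_poly(ts, [w[1] for w in world], 2)  # y(t) = y0 + vy t - g t^2 / 2
    return (cx[0], cx[1], cy[0], cy[1], -2.0 * cy[2])

def fit_extrinsics(track, ts, params):
    """Exact 2D Procrustes update of (theta, trans) with the dynamics held fixed."""
    pts = [ballistic(params, t) for t in ts]
    n = len(pts)
    pm = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
    um = (sum(u[0] for u in track) / n, sum(u[1] for u in track) / n)
    a = b = 0.0
    for p, u in zip(pts, track):
        px, py = p[0] - pm[0], p[1] - pm[1]
        ux, uy = u[0] - um[0], u[1] - um[1]
        a += ux * px + uy * py  # coefficient of cos(theta) in sum of u . R p
        b += uy * px - ux * py  # coefficient of sin(theta)
    theta = math.atan2(b, a)
    rp = rotate(theta, pm)
    return theta, (um[0] - rp[0], um[1] - rp[1])

def identify(track, ts, iters=20):
    """Alternate dynamics estimation and extrinsic calibration for one track proposal."""
    theta, trans = 0.0, (0.0, 0.0)  # assumed initial guess: identity extrinsics
    history = []
    for _ in range(iters):
        params = fit_dynamics(track, ts, theta, trans)
        theta, trans = fit_extrinsics(track, ts, params)
        history.append(loss(track, ts, params, theta, trans))
    return params, theta, trans, history

def select_track(proposals, ts):
    """Assumed selection criterion: keep the proposal with the lowest final residual."""
    fits = [identify(track, ts) for track in proposals]
    best = min(range(len(proposals)), key=lambda i: fits[i][3][-1])
    return best, fits[best]

# Toy proposals: a circular distractor and a noiseless projectile seen at identity extrinsics.
ts = [i / 49 for i in range(50)]
ballistic_track = [ballistic((0.0, 1.0, 0.0, 3.0, 9.81), t) for t in ts]
circle_track = [(math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)) for t in ts]
best, (params, theta, trans, history) = select_track([circle_track, ballistic_track], ts)

# The same projectile behind a rotated, shifted "camera": its loss history should not increase.
rotated_track = [(rotate(0.4, p)[0] + 2.0, rotate(0.4, p)[1] - 1.0) for p in ballistic_track]
_, _, _, rotated_history = identify(rotated_track, ts)
```

Each half-step exactly minimizes the same squared-error objective over its own block of variables, so the recorded loss is non-increasing up to floating-point error. This guarantees descent, not convergence to the physically correct solution. In this toy, the initial position and the translation are only jointly identifiable. The supervised training of the keypoint tracker is not modeled.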