A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
This work addresses the challenge of combining low-level and high-level visual cues in computer vision. Its joint solution could benefit applications such as video analysis and autonomous systems, although the contribution appears incremental because it builds on existing multicut formulations.
The paper tackles the joint problem of motion segmentation and multi-target tracking by proposing a unified graphical model that integrates point trajectories and object detections. The model is evaluated on the FBMS59 motion segmentation benchmark and on pedestrian tracking sequences from the 2D MOT 2015 benchmark.
Recently, minimum cost multicut formulations have been proposed and proven to be successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated on different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper, we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking and also argue that high-level cues from object detection and tracking are helpful to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose multicuts are solutions to motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach.
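To make the decomposition described above concrete, the following is a minimal sketch of a minimum cost multicut on a toy joint graph. It is not the authors' solver: it enumerates all labelings by brute force, which is only feasible for a handful of nodes. The node names (`t1` to `t3` for point trajectories, `d1` and `d2` for detections), the edge costs, and the sign convention (positive cost is attractive, negative cost is repulsive, and the objective sums the costs of cut edges) are all assumptions made for illustration.

```python
from itertools import product


def multicut_cost(labels, edges):
    """Sum the costs of all cut edges, i.e. edges joining differently labeled nodes."""
    return sum(cost for (u, v), cost in edges.items() if labels[u] != labels[v])


def brute_force_multicut(nodes, edges):
    """Try every node labeling of a tiny graph and keep the cheapest one.

    Exponential in the number of nodes; only meant to illustrate the objective.
    """
    best_cost, best_labels = float("inf"), None
    for assignment in product(range(len(nodes)), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        cost = multicut_cost(labels, edges)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


def components(labels):
    """Group node names by label (assumes each label class is connected)."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy graph: trajectories t1..t3 and detections d1, d2.
nodes = ["t1", "t2", "t3", "d1", "d2"]
edges = {
    ("t1", "t2"): 2.0,   # similar motion: attractive
    ("t2", "t3"): -1.5,  # different motion: repulsive
    ("t1", "d1"): 1.0,   # trajectory lies inside detection d1
    ("t2", "d1"): 1.0,
    ("t3", "d2"): 1.5,   # trajectory lies inside detection d2
    ("d1", "d2"): -2.0,  # detections belong to different objects
}

best_cost, best_labels = brute_force_multicut(nodes, edges)
print(best_cost, components(best_labels))
```

In this toy instance the cheapest decomposition cuts exactly the two repulsive edges, so the trajectories `t1` and `t2` end up with detection `d1`, and `t3` ends up with `d2`. The number of components is not fixed in advance; it follows from the edge costs, which is the property the abstract refers to as an optimal number of connected components.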