MatAnyone: Stable Video Matting with Consistent Memory Propagation
This work targets video matting for applications such as video editing, improving stability and accuracy. The contribution appears incremental, since it builds on an existing memory-based paradigm.
The paper addresses the difficulty that auxiliary-free human video matting methods have with complex backgrounds. It proposes MatAnyone, a framework that combines a consistent memory propagation module with a new dataset and a new training strategy, and it reports robust, accurate matting that outperforms existing methods.
Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
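To make the idea of region-adaptive memory fusion concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the fusion rule, so the per-pixel convex blend below is an assumption, and the names `region_adaptive_fusion`, `prev_value`, `queried_value`, and `change_prob` are hypothetical. The sketch assumes a per-pixel "change probability" map is available: where it is low (stable core regions) the previous frame's features are kept, and where it is high (boundaries with fine detail) the features queried from memory for the current frame take over.

```python
import numpy as np


def region_adaptive_fusion(prev_value, queried_value, change_prob):
    """Blend two feature maps per pixel (illustrative assumption only).

    prev_value:    (H, W, C) features propagated from the previous frame.
    queried_value: (H, W, C) features queried from memory for the current frame.
    change_prob:   (H, W) values in [0, 1]; higher means the pixel is more
                   likely to have changed since the previous frame.
    """
    prev_value = np.asarray(prev_value, dtype=np.float64)
    queried_value = np.asarray(queried_value, dtype=np.float64)
    change_prob = np.clip(np.asarray(change_prob, dtype=np.float64), 0.0, 1.0)

    if prev_value.shape != queried_value.shape:
        raise ValueError("prev_value and queried_value must have the same shape")
    if change_prob.shape != prev_value.shape[:2]:
        raise ValueError("change_prob must have shape (H, W)")

    # Add a trailing channel axis so the (H, W) map broadcasts over (H, W, C).
    weight = change_prob[..., None]

    # Convex combination: stable pixels keep the previous frame's features,
    # changing pixels take the freshly queried features.
    return weight * queried_value + (1.0 - weight) * prev_value


# Toy usage: a 2x2 frame with 3 feature channels.
prev = np.zeros((2, 2, 3))
queried = np.ones((2, 2, 3))
change = np.array([[0.0, 1.0],
                   [0.25, 0.5]])
fused = region_adaptive_fusion(prev, queried, change)
```

In this toy example the pixel with change probability 0 keeps the previous features unchanged, while the pixel with change probability 1 is fully replaced by the queried features. How the change map is predicted in practice is outside the scope of this sketch.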