CV · Jul 26, 2025

TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection

arXiv:2507.19789v1 · 1 citation · h-index: 16 · 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality: Highly original
AI Analysis

The paper addresses data scarcity in video salient object detection, enabling more effective training of motion-guided models.

The paper tackles the problem of limited training data for video salient object detection by transferring motion knowledge from pre-trained video diffusion models to generate realistic optical flows from static images, achieving improved performance across multiple benchmarks.

Video salient object detection (SOD) relies on motion cues to distinguish salient objects from backgrounds, but training such models is limited by scarce video datasets compared to abundant image datasets. Existing approaches that use spatial transformations to create video sequences from static images fail for motion-guided tasks, as these transformations produce unrealistic optical flows that lack semantic understanding of motion. We present TransFlow, which transfers motion knowledge from pre-trained video diffusion models to generate realistic training data for video SOD. Video diffusion models have learned rich semantic motion priors from large-scale video data, understanding how different objects naturally move in real scenes. TransFlow leverages this knowledge to generate semantically aware optical flows from static images, where objects exhibit natural motion patterns while preserving spatial boundaries and temporal coherence. Our method achieves improved performance across multiple benchmarks, demonstrating effective motion knowledge transfer.
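
To make the abstract's criticism of spatial transformations concrete, the following is a minimal sketch of the flow field that an affine warp of a static image induces. It is not TransFlow's method or code; the helper names `affine_flow` and `flow_contrast` are hypothetical and chosen only for illustration. The point is that such a flow depends only on pixel coordinates, so a foreground object moves exactly like the background around it and the flow carries no object-level motion cue.

```python
import numpy as np

def affine_flow(height, width, matrix, translation):
    """Dense flow (dx, dy) induced by warping each pixel with x' = A x + t.

    Hypothetical helper for illustration; not part of TransFlow.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    coords = np.stack([xs, ys], axis=-1)  # shape (H, W, 2), (x, y) order
    warped = coords @ np.asarray(matrix, dtype=np.float64).T
    warped = warped + np.asarray(translation, dtype=np.float64)
    return warped - coords  # displacement of every pixel

def flow_contrast(flow, mask):
    """Mean flow magnitude inside the mask minus mean magnitude outside it."""
    magnitude = np.linalg.norm(flow, axis=-1)
    return float(magnitude[mask].mean() - magnitude[~mask].mean())

# A pure translation, one common way to fake a second frame from a still image.
flow = affine_flow(4, 6, np.eye(2), [2.0, -1.0])

# An assumed salient-object mask: a small box in the middle of the image.
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:4] = True

# Foreground and background move identically, so the contrast is zero.
contrast = flow_contrast(flow, mask)
```

With a translation, every pixel receives the same displacement, so `contrast` is zero whatever mask is used. Rotations or scalings give spatially varying flow, but it is still a smooth function of position that ignores object boundaries, which is the gap the abstract says semantically aware flows are meant to close.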
