CV · Mar 18

Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search

arXiv:2603.16711 · 18.3 · h-index: 8
Predicted impact: top 40% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This paper addresses precise object motion control in video generation, aimed at users who need editing tools without fine-tuning. It is an incremental advance that builds on prior motion editing methods.

The paper tackles object-level motion editing in image-to-video generation by introducing Search2Motion, a training-free framework that uses target-frame-based control and attention-consensus search to achieve object relocation while preserving scene stability, outperforming baselines on metrics like FLF2V-obj and VBench.

We present Search2Motion, a training-free framework for object-level motion editing in image-to-video generation. Unlike prior methods requiring trajectories, bounding boxes, masks, or motion fields, Search2Motion adopts target-frame-based control, leveraging first-last-frame motion priors to realize object relocation while preserving scene stability without fine-tuning. Reliable target-frame construction is achieved through semantic-guided object insertion and robust background inpainting. We further show that early-step self-attention maps predict object and camera dynamics, offering interpretable user feedback and motivating ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that improves motion fidelity without look-ahead sampling or external evaluators. Noting that existing benchmarks conflate object and camera motion, we introduce S2M-DAVIS and S2M-OMB for stable-camera, object-only evaluation, alongside FLF2V-obj metrics that isolate object artifacts without requiring ground-truth trajectories. Search2Motion consistently outperforms baselines on FLF2V-obj and VBench.
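The abstract does not spell out how ACE-Seed scores candidate seeds, so the following is only a rough sketch of one plausible reading of "attention consensus": each candidate seed yields a flattened early-step self-attention map, and the seed whose map agrees most with the average over all candidates is kept. The function names (`cosine`, `consensus_scores`, `select_seed`), the cosine-to-mean scoring rule, and the toy maps are all assumptions made for illustration, not the authors' method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def consensus_scores(maps):
    """Score each seed's attention map by its agreement with the mean map.

    ASSUMPTION: 'consensus' is modeled here as cosine similarity to the
    element-wise mean over all candidate seeds; the paper may define it differently.
    """
    n = len(maps)
    mean_map = [sum(column) / n for column in zip(*maps)]
    return [cosine(m, mean_map) for m in maps]

def select_seed(maps):
    """Return the index of the seed with the highest consensus score."""
    scores = consensus_scores(maps)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical flattened early-step attention maps, one per candidate seed.
candidate_maps = [
    [0.9, 0.1, 0.0, 0.0],  # seed 0: attention concentrated on the first region
    [0.8, 0.2, 0.0, 0.0],  # seed 1: similar to seed 0
    [0.0, 0.0, 0.1, 0.9],  # seed 2: outlier, attention on a different region
]
scores = consensus_scores(candidate_maps)
best = select_seed(candidate_maps)
```

In this toy setup the outlier seed receives the lowest score, so the selection falls on one of the two seeds that agree with each other. Because the score uses only maps from the first denoising steps, a search of this kind would need neither look-ahead sampling nor an external evaluator, which matches the property the abstract claims for ACE-Seed.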

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
