MM, AI, CV, LG · Jul 5, 2024

Reinforcement Learning for Unsupervised Video Summarization with Reward Generator Training

arXiv:2407.04258v2 · 1 citation · h-index: 26
Originality: Highly original
AI Analysis

The paper addresses video summarization for applications such as content analysis, offering a more stable alternative to adversarial methods.

The paper tackles unsupervised video summarization using reinforcement learning with a reward generator trained for reconstruction fidelity, and reports strong alignment with human judgments and promising F-scores.

This paper presents a novel approach to unsupervised video summarization using reinforcement learning (RL), addressing limitations of prior work such as unstable adversarial training and reliance on heuristic-based reward functions. The method rests on the principle that reconstruction fidelity serves as a proxy for informativeness: the better the full video can be reconstructed from a summary, the better the summary. The summarizer model assigns an importance score to each frame, and these scores determine the final summary. For training, RL is coupled with a reward generation pipeline that incentivizes improved reconstructions. This pipeline uses a generator model to reconstruct the full video from the selected summary frames; the similarity between the original and reconstructed video provides the reward signal. The generator itself is pre-trained in a self-supervised manner to reconstruct randomly masked frames. This two-stage training process is more stable than adversarial architectures. Experimental results show strong alignment with human judgments and promising F-scores, supporting the reconstruction objective.
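The reward loop described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation: frame selection is modelled as independent Bernoulli draws from the importance scores, the learned generator is replaced by a simple linear-interpolation stand-in, similarity is taken as mean per-frame cosine similarity, and the policy gradient is plain REINFORCE. All function names (`sample_summary`, `reconstruct`, `reconstruction_reward`, `reinforce_grad`) are hypothetical.

```python
import numpy as np

def sample_summary(scores, rng):
    """Sample a binary frame-selection mask from importance scores in (0, 1)."""
    return (rng.random(scores.shape[0]) < scores).astype(np.float64)

def reconstruct(features, mask):
    """Stand-in generator (assumption): rebuild every frame by linearly
    interpolating between the selected frames. The paper instead uses a
    learned generator pre-trained on randomly masked frames."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return np.zeros_like(features)
    t = np.arange(features.shape[0])
    # Interpolate each feature dimension over time from the kept frames only.
    cols = [np.interp(t, idx, features[idx, d]) for d in range(features.shape[1])]
    return np.stack(cols, axis=1)

def reconstruction_reward(features, recon):
    """Mean per-frame cosine similarity between original and reconstruction."""
    num = np.sum(features * recon, axis=1)
    den = np.linalg.norm(features, axis=1) * np.linalg.norm(recon, axis=1) + 1e-8
    return float(np.mean(num / den))

def reinforce_grad(scores, mask, reward, baseline=0.0):
    """REINFORCE estimate of the expected-reward gradient w.r.t. the scores."""
    # d/dp log Bernoulli(a; p) = a / p - (1 - a) / (1 - p)
    dlogp = mask / scores - (1.0 - mask) / (1.0 - scores)
    return (reward - baseline) * dlogp

# One training-style step on synthetic frame features (20 frames, 8 dims).
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))
scores = np.full(20, 0.5)            # placeholder summarizer output
mask = sample_summary(scores, rng)
reward = reconstruction_reward(features, reconstruct(features, mask))
grad = reinforce_grad(scores, mask, reward)
```

In this sketch, a summary that lets the stand-in generator recover the video well earns a reward near 1, and the gradient pushes up the scores of the frames that were selected in that sample. In the described method the generator is trained first and then used to score summaries, which is what separates the two training stages.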
