CV | LG | MM | Jan 12, 2020

Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning

arXiv:2001.05864v2 | 58 citations
AI Analysis

This work addresses video summarization for efficient content analysis, offering a weakly supervised approach that reduces labeling effort while improving results, though it is incremental in its method.

The paper tackles the sparse reward and high labeling cost problems in video summarization by proposing a weakly supervised hierarchical reinforcement learning framework, achieving state-of-the-art performance on two benchmark datasets.

Conventional reinforcement learning approaches to video summarization receive a reward only after the whole summary has been generated. Such a reward is sparse and makes reinforcement learning hard to converge. Another problem is that labelling each frame is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to enhance summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video frames in the subtask by policy gradient, using both a global reward and newly defined sub-rewards to overcome the sparsity problem. Experiments on two benchmark datasets show that our proposal achieves the best performance, even outperforming supervised approaches.
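To make the reward structure concrete, the following is a minimal sketch of how a worker could combine per-subtask sub-rewards with a global reward in a REINFORCE-style policy gradient update. It is not the paper's implementation: the abstract does not define the sub-rewards or the networks, so `sub_reward` (agreement between the fraction of selected frames and a binary subgoal), the per-frame Bernoulli policy, and all function names here are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sub_reward(actions, subgoal):
    """Hypothetical sub-reward for one subtask (an assumption, not the paper's
    definition): 1 minus the gap between the fraction of selected frames and
    the manager's binary subgoal (1 = important segment, 0 = unimportant)."""
    return 1.0 - abs(sum(actions) / len(actions) - subgoal)


def combined_returns(sub_rewards, global_reward, segment_lengths):
    """Give every frame the sub-reward of its own subtask plus the global
    reward, so each subtask receives a learning signal of its own."""
    returns = []
    for r_sub, n in zip(sub_rewards, segment_lengths):
        returns.extend([r_sub + global_reward] * n)
    return returns


def reinforce_step(logits, actions, returns, lr=0.1, baseline=0.0):
    """One REINFORCE update for independent Bernoulli frame selections.
    For a Bernoulli policy with p = sigmoid(logit), the gradient of
    log pi(action) with respect to the logit is (action - p)."""
    return [
        z + lr * (g - baseline) * (a - sigmoid(z))
        for z, a, g in zip(logits, actions, returns)
    ]


# Toy episode: two subtasks of two frames each, subgoals assumed to be [1, 0].
actions = [1, 1, 1, 0]            # sampled keep/skip decisions per frame
subgoals = [1, 0]                 # hypothetical manager outputs
segments = [actions[0:2], actions[2:4]]
subs = [sub_reward(seg, goal) for seg, goal in zip(segments, subgoals)]
returns = combined_returns(subs, global_reward=0.5, segment_lengths=[2, 2])
new_logits = reinforce_step([0.0] * 4, actions, returns, lr=1.0)
```

In this toy episode the first subtask matches its subgoal exactly, so its frames receive a larger return than those of the second subtask, even though both share the same global reward. That per-subtask difference is the kind of denser signal the sub-rewards are meant to provide.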

