Cluster-based Video Summarization with Temporal Context Awareness
This work addresses the problem of generating concise video summaries for users, but its contribution is incremental, since it builds on existing cluster-based models.
The paper tackles video summarization by proposing TAC-SUM, a training-free method that incorporates temporal context into clustering, outperforming unsupervised methods and achieving performance comparable to supervised techniques on the SumMe dataset.
In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context. Our method uses clustering information to partition the input video into temporally consecutive segments, which injects temporal awareness into the clustering process and sets the method apart from prior cluster-based summarization methods. The resulting temporal-aware clusters are then used to compute the final summary with simple rules for keyframe selection and frame importance scoring. Experimental results on the SumMe dataset demonstrate the effectiveness of the proposed approach, which outperforms existing unsupervised methods and achieves performance comparable to state-of-the-art supervised summarization techniques. Our source code is available for reference at \url{https://github.com/hcmus-thesis-gulu/TAC-SUM}.
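
As a rough illustration of the idea described above (not the authors' implementation), the following minimal sketch assumes each frame already carries a cluster label from some prior clustering step. It splits the label sequence into temporally consecutive segments, then applies two illustrative rules that are assumptions made here, not rules taken from the paper: the keyframe of a segment is its middle frame, and every frame's importance is its segment's share of the video length. The names `temporal_segments` and `summarize` are hypothetical.

```python
from itertools import groupby


def temporal_segments(labels):
    """Split per-frame cluster labels into runs of consecutive equal labels.

    Returns a list of (start, end, label) tuples with `end` exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


def summarize(labels):
    """Pick one keyframe per segment and assign a score to every frame.

    Assumed rules (for illustration only):
      - keyframe = middle frame of the segment
      - importance = segment length / total number of frames
    """
    segments = temporal_segments(labels)
    n = len(labels)
    keyframes = [(start + end - 1) // 2 for start, end, _ in segments]
    scores = [0.0] * n
    for start, end, _ in segments:
        for i in range(start, end):
            scores[i] = (end - start) / n
    return segments, keyframes, scores


if __name__ == "__main__":
    # Cluster 0 appears twice, but its two runs are kept as separate segments.
    frame_labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 2]
    segs, keys, imp = summarize(frame_labels)
    print(segs)
    print(keys)
    print(imp)
```

Note the design point this sketch is meant to show: a cluster that recurs at different times yields separate segments, so frames that look alike but are far apart in time are not merged into a single summary unit.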