Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment
This addresses the need for efficient video summarization for social media platforms like YouTube and TikTok, though it is incremental as it builds on existing graph-based methods with a novel optimization approach.
The paper tackles the problem of summarizing user-generated videos into keyframes efficiently by proposing a fast graph sampling algorithm based on Gershgorin disc alignment, achieving comparable or better performance than state-of-the-art methods with substantially reduced complexity.
User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\mathcal{G}^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" $\mathcal{G}^o$ to a $1$-hop path graph $\mathcal{G}$, specified by a generalized graph Laplacian matrix $\mathcal{L}$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $λ_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + μ\mathcal{L}$, where $\mathbf{h}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $λ^-_{\min}(\mathbf{B})$ by choosing $\mathbf{h}$ via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity.