CV · Oct 27, 2019

MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization

arXiv:1910.12295v1 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This work addresses large-scale video understanding. Its contribution is incremental: it builds on existing models such as NeXtVLAD and adds a new online distillation structure.

The paper tackles video temporal concept localization with a deep mixture model with online knowledge distillation (MOD), which placed 3rd in the YouTube-8M challenge. It shows that fine-tuning a mixture model with online distillation on a smaller dataset improves performance, and its experiments indicate reduced overfitting and better generalization.

In this paper, we present and discuss a deep mixture model with online knowledge distillation (MOD) for large-scale video temporal concept localization, which ranked 3rd in the 3rd YouTube-8M Video Understanding Challenge. Specifically, we find that by enabling knowledge sharing with online distillation, fine-tuning a mixture model on a smaller dataset can achieve better evaluation performance. Based on this observation, in our final solution, we trained and fine-tuned 12 NeXtVLAD models in parallel with a 2-layer online distillation structure. The experimental results show that the proposed distillation structure can effectively avoid overfitting and show superior generalization performance. The code is publicly available at: https://github.com/linrongc/solution_youtube8m_v3
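The abstract names online knowledge distillation inside a mixture model without showing how the loss is formed. The sketch below is a generic, hypothetical version of such a loss and is not taken from the authors' repository. It makes several assumptions: each sub-model emits one sigmoid per class (YouTube-8M labels are multi-label), a softmax gate weights the sub-models, the gate-weighted mixture acts as the teacher, and softened student outputs are pulled toward it with a per-class binary KL. The names `online_distillation_loss`, `temperature` and `alpha` are illustrative only, and the sketch models a single distillation layer, not the paper's 2-layer structure.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def clip(p: float, eps: float = 1e-7) -> float:
    # Keep probabilities away from 0 and 1 so log() stays finite.
    return min(max(p, eps), 1.0 - eps)


def bce(target: Sequence[float], pred: Sequence[float]) -> float:
    """Mean binary cross-entropy over classes (one sigmoid per class)."""
    return -sum(t * math.log(clip(p)) + (1 - t) * math.log(1 - clip(p))
                for t, p in zip(target, pred)) / len(target)


def binary_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Mean per-class KL(teacher || student) between Bernoulli distributions."""
    total = 0.0
    for t, s in zip(teacher, student):
        t, s = clip(t), clip(s)
        total += t * math.log(t / s) + (1 - t) * math.log((1 - t) / (1 - s))
    return total / len(teacher)


def mix(probs: List[List[float]], weights: Sequence[float]) -> List[float]:
    """Gate-weighted average of per-model class probabilities."""
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(len(probs[0]))]


def online_distillation_loss(logits: List[List[float]], gate_logits: Sequence[float],
                             labels: Sequence[float], temperature: float = 2.0,
                             alpha: float = 1.0) -> float:
    # Hypothetical gate: softmax over one scalar logit per sub-model.
    weights = softmax(gate_logits)
    probs = [[sigmoid(z) for z in row] for row in logits]
    # Supervised term: the mixture and every sub-model are fit to the labels.
    supervised = bce(labels, mix(probs, weights)) + sum(bce(labels, p) for p in probs)
    # Distillation term: softened sub-model outputs are pulled toward the
    # softened mixture (the online teacher). In a training framework the
    # teacher would be wrapped in a stop-gradient; here only the value is computed.
    soft = [[sigmoid(z / temperature) for z in row] for row in logits]
    teacher = mix(soft, weights)
    distill = sum(binary_kl(teacher, s) for s in soft)
    return supervised + alpha * distill
```

For example, `online_distillation_loss([[2.0, -2.0], [-2.0, 2.0]], [0.0, 0.0], [1.0, 0.0])` scores two sub-models that disagree. In that case the distillation term is positive and penalizes the disagreement. When all sub-models agree, the distillation term is zero and only the supervised term remains. Because the teacher is built from the students during the same forward pass, no separately pre-trained teacher is needed. This is the "online" aspect the abstract refers to.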

Code Implementations: 1 repo
