CL, CV · Sep 24, 2018

Speaker Naming in Movies

arXiv:1809.08761v11090 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of identifying speakers in movies for applications in video analysis and question-answering, representing an incremental advance with strong domain-specific gains.

The authors tackled speaker naming in movies by proposing a multimodal model that integrates visual, textual, and acoustic data, achieving significant performance improvements over baselines on a new dataset and state-of-the-art results on the MovieQA 2017 Challenge.

We propose a new model for speaker naming in movies that leverages visual, textual, and acoustic modalities in a unified optimization framework. To evaluate the performance of our model, we introduce a new dataset consisting of six episodes of The Big Bang Theory TV show and eighteen full movies covering different genres. Our experiments show that our multimodal model significantly outperforms several competitive baselines on the average weighted F-score metric. To demonstrate the effectiveness of our framework, we design an end-to-end memory network model that leverages our speaker naming model and achieves state-of-the-art results on the subtitles task of the MovieQA 2017 Challenge.
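
The abstract reports results as an average weighted F-score but does not define it here. As a minimal sketch, assuming the common convention in which each speaker's F1 score is weighted by that speaker's share of the ground-truth labels, the metric could be computed as follows. The function name `weighted_f_score` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_f_score(true_names, predicted_names):
    """Support-weighted mean of per-speaker F1 scores.

    Assumption: "weighted" means each speaker's F1 is weighted by the
    fraction of ground-truth segments that belong to that speaker.
    """
    if len(true_names) != len(predicted_names):
        raise ValueError("label lists must have the same length")
    total = len(true_names)
    if total == 0:
        return 0.0

    support = Counter(true_names)
    pairs = list(zip(true_names, predicted_names))
    score = 0.0
    for name, count in support.items():
        tp = sum(1 for t, p in pairs if t == name and p == name)
        fp = sum(1 for t, p in pairs if t != name and p == name)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical labels for four subtitle segments.
truth = ["Sheldon", "Sheldon", "Leonard", "Penny"]
guess = ["Sheldon", "Leonard", "Leonard", "Penny"]
print(weighted_f_score(truth, guess))
```

Weighting by support means that frequent speakers dominate the score, which matters in movies where a few lead characters account for most of the dialogue.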

