SD · CV · AS · Jun 25, 2019

Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

arXiv:1906.10555v1 · 56 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of identifying which people in a video are speaking. Its contribution is incremental, building on existing methods for the AVA dataset.

The paper tackles active speaker detection in videos using a 3D CNN front-end and an ensemble of temporal classifiers, and reports significant improvements over the baseline on the AVA-ActiveSpeaker dataset.

This report describes our submission to the ActivityNet Challenge at CVPR 2019. We use a 3D convolutional neural network (CNN) based front-end and an ensemble of temporal convolution and LSTM classifiers to predict whether a visible person is speaking or not. Our results show significant improvements over the baseline on the AVA-ActiveSpeaker dataset.
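A minimal sketch of the back-end idea in the abstract: temporal classifiers score each frame of a face track, and an ensemble fuses their per-frame speaking probabilities. It assumes the 3D CNN front-end has already reduced each frame to a single logit. The kernel weights, the placeholder LSTM outputs, the unweighted averaging, and the 0.5 threshold are all hypothetical choices for illustration, not the submission's actual configuration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def temporal_conv(logits: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Same-length 1-D temporal convolution (cross-correlation, as in CNN layers)
    with zero padding, so each frame's output mixes in its neighbouring frames."""
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for t in range(len(logits)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(logits):  # frames outside the track count as zero
                acc += w * logits[idx]
        out.append(acc)
    return out


def ensemble(prob_streams: Sequence[Sequence[float]]) -> List[float]:
    """Score-level fusion: unweighted mean of per-frame probabilities
    (an assumption; the report may weight its classifiers differently)."""
    length = len(prob_streams[0])
    if any(len(p) != length for p in prob_streams):
        raise ValueError("all classifiers must score the same frames")
    return [sum(p[t] for p in prob_streams) / len(prob_streams) for t in range(length)]


def predict_speaking(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a frame as 'speaking' when the fused probability reaches the threshold."""
    return [p >= threshold for p in probs]


# Hypothetical per-frame logits for one face track (stand-in for front-end output).
frame_logits = [-2.0, -1.0, 0.5, 2.0, 1.5, -0.5]
# Hypothetical temporal-conv classifier: smooth logits over a 3-frame window.
tcn_probs = [sigmoid(z) for z in temporal_conv(frame_logits, [0.25, 0.5, 0.25])]
# Placeholder outputs standing in for an LSTM classifier on the same frames.
lstm_probs = [0.1, 0.2, 0.6, 0.9, 0.8, 0.3]

fused = ensemble([tcn_probs, lstm_probs])
labels = predict_speaking(fused)
print(labels)
```

Averaging probabilities rather than hard votes keeps the fused score calibrated to the same [0, 1] range as each member, so a single threshold can be tuned on validation data.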
