SD CV AS Jun 25, 2019

Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

arXiv:1906.10555v1 56 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of identifying which visible people are speaking in video data. The contribution is incremental, since it builds on existing methods for the AVA dataset.

The paper tackles active speaker detection in videos with a 3D CNN front-end and an ensemble of temporal classifiers, and reports significant improvements over the baseline on the AVA-ActiveSpeaker dataset.

This report describes our submission to the ActivityNet Challenge at CVPR 2019. We use a 3D convolutional neural network (CNN) based front-end and an ensemble of temporal convolution and LSTM classifiers to predict whether a visible person is speaking or not. Our results show significant improvements over the baseline on the AVA-ActiveSpeaker dataset.
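
The abstract does not say how the ensemble's outputs are combined. As a rough illustration only, the sketch below assumes each classifier emits one speaking probability per frame, and that the ensemble takes an unweighted mean followed by a fixed threshold. The function names, the example scores, and the 0.5 threshold are all hypothetical and are not taken from the report.

```python
from statistics import fmean


def ensemble_scores(per_model_scores):
    """Average per-frame speaking probabilities across classifiers.

    per_model_scores: list of lists, one inner list per classifier,
    each holding one probability per video frame.
    Unweighted averaging is an assumption, not the report's stated method.
    """
    n_frames = len(per_model_scores[0])
    if any(len(scores) != n_frames for scores in per_model_scores):
        raise ValueError("all classifiers must score the same number of frames")
    # zip(*...) groups the scores frame by frame across classifiers.
    return [fmean(frame) for frame in zip(*per_model_scores)]


def label_frames(scores, threshold=0.5):
    """Mark a frame as 'speaking' when its averaged score reaches the threshold."""
    return [score >= threshold for score in scores]


# Hypothetical per-frame outputs from a temporal-convolution classifier
# and an LSTM classifier for the same face track.
tcn_scores = [0.9, 0.8, 0.3, 0.1]
lstm_scores = [0.7, 0.6, 0.5, 0.3]

averaged = ensemble_scores([tcn_scores, lstm_scores])
labels = label_frames(averaged)
```

Averaging is one of the simplest ways to combine classifiers that share an output scale. A weighted mean or a learned combiner would fit the same interface.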
