UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
This work improves active speaker detection in video, which is useful for applications such as video analysis; the contribution is incremental, since it extends prior models.
The paper tackles the AVA Active Speaker Detection task by augmenting a previous model with a GRU-based module that handles recurring identities across scenes, achieving a best result of 94.47% mAP on the test set and ranking first in the challenge.
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022. Our underlying model, UniCon+, builds on our previous work, the Unified Context Network (UniCon) and Extended UniCon, which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information about recurring identities to flow across scenes through read and update operations. We report a best result of 94.47% mAP on the AVA-ActiveSpeaker test set, which continues to rank first on this year's challenge leaderboard and significantly advances the state of the art.
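The read and update operations can be illustrated with a minimal sketch. It is not the UniCon+ implementation: the abstract does not specify the module's design, so the code below assumes a standard GRU cell without bias terms, one memory slot per identity, and randomly initialized weights. The names `IdentityMemory`, `read`, `update`, and the identity label `"speaker_a"` are hypothetical and chosen only for illustration.

```python
import math
import random


def _matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class IdentityMemory:
    """Hypothetical per-identity memory whose slots are updated by a GRU cell.

    Assumption: a plain GRU without biases. The actual UniCon+ module
    may differ in every detail.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            # Random weights stand in for parameters that would be learned.
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                    for _ in range(dim)]

        self.dim = dim
        # Input (W_*) and recurrent (U_*) weights for the update gate z,
        # the reset gate r, and the candidate state h.
        self.W_z, self.U_z = init(), init()
        self.W_r, self.U_r = init(), init()
        self.W_h, self.U_h = init(), init()
        self.slots = {}  # identity label -> stored state vector

    def read(self, identity):
        """Return the stored state for an identity, or zeros if unseen."""
        return list(self.slots.get(identity, [0.0] * self.dim))

    def update(self, identity, feature):
        """Fold a new scene-level feature into the identity's stored state."""
        h = self.read(identity)
        z = [_sigmoid(a + b) for a, b in
             zip(_matvec(self.W_z, feature), _matvec(self.U_z, h))]
        r = [_sigmoid(a + b) for a, b in
             zip(_matvec(self.W_r, feature), _matvec(self.U_r, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(a + b) for a, b in
                     zip(_matvec(self.W_h, feature), _matvec(self.U_h, reset_h))]
        # Standard GRU interpolation between the old state and the candidate.
        new_h = [(1.0 - zi) * hi + zi * ci
                 for zi, hi, ci in zip(z, h, candidate)]
        self.slots[identity] = new_h
        return new_h


# Scene 1: store information about an identity.
memory = IdentityMemory(dim=4)
memory.update("speaker_a", [0.2, -0.1, 0.4, 0.0])
# Scene 2: the same identity reappears, so its stored state can be read back.
context = memory.read("speaker_a")
```

In this sketch an unseen identity reads back a zero vector, so a first appearance carries no cross-scene context. Keying slots by identity is what lets information persist between scenes in which the same person recurs.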