CV · Apr 26, 2019

Recurrent Embedding Aggregation Network for Video Face Recognition

arXiv:1904.12019v2 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses video face recognition for security and surveillance applications, offering an incremental improvement over existing methods.

The paper tackles video face recognition by proposing a Recurrent Embedding Aggregation Network (REAN) that aggregates pre-trained embeddings. Because REAN learns only the aggregation step, it avoids overfitting; because it uses temporal context, it suppresses noise from redundant frames. The authors report significant performance improvements on IJB-S, YTF, and PaSC.
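To make "aggregating pre-trained embeddings" concrete, here is a minimal sketch of the simplest baseline for set-to-set matching: each video's frozen per-frame embeddings are normalized, averaged into one template, and two videos are compared by cosine similarity. The function names and the random stand-in embeddings are hypothetical illustrations, not the paper's code; REAN replaces the plain average with learned, context-aware weights.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def average_aggregate(frame_embeddings: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) set of frozen embeddings into one unit-length template."""
    frames = l2_normalize(frame_embeddings)  # normalize each frame first
    return l2_normalize(frames.mean(axis=0))

def set_similarity(set_a: np.ndarray, set_b: np.ndarray) -> float:
    """Cosine similarity between the aggregated templates of two frame sets."""
    return float(average_aggregate(set_a) @ average_aggregate(set_b))

# Usage with random stand-in embeddings (hypothetical data, not from the paper)
rng = np.random.default_rng(0)
identity = rng.normal(size=128)
video_a = identity + 0.3 * rng.normal(size=(20, 128))  # same identity, noisy frames
video_b = identity + 0.3 * rng.normal(size=(15, 128))  # same identity, different video
video_c = rng.normal(size=(10, 128))                   # unrelated identity
print(set_similarity(video_a, video_b), set_similarity(video_a, video_c))
```

Plain averaging treats every frame equally, so a burst of near-duplicate or low-quality frames can dominate the template; this is the weakness that quality-aware and recurrent aggregation target.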

Recurrent networks have been successful in analyzing temporal data and have been widely used for video analysis. However, for video face recognition, where the base CNNs trained on large-scale data already provide discriminative features, using Long Short-Term Memory (LSTM), a popular recurrent network, for feature learning could lead to overfitting and degrade performance instead. We propose a Recurrent Embedding Aggregation Network (REAN) for set-to-set face recognition. Compared with LSTM, REAN is robust against overfitting because it only learns how to aggregate the pre-trained embeddings rather than learning representations from scratch. Compared with quality-aware aggregation methods, REAN can take advantage of context information to circumvent the noise introduced by redundant video frames. Empirical results on three public-domain video face recognition datasets (IJB-S, YTF, and PaSC) show that the proposed REAN significantly outperforms a naive CNN-LSTM structure and quality-aware aggregation methods.
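The abstract's key distinction is that the recurrent part produces aggregation weights rather than new features. The sketch below illustrates that idea under stated assumptions: a tiny bidirectional tanh RNN (a stand-in, not the paper's exact architecture, which this section does not specify) reads the frozen embeddings, scores each frame in the context of its neighbours, and returns a softmax-weighted average of the original embeddings. The class name, hidden size, and random untrained weights are hypothetical; training is omitted.

```python
import numpy as np

class RecurrentAggregator:
    """Hypothetical sketch: context-aware frame weighting over frozen embeddings.
    The output is a convex combination of the input embeddings, so the network
    learns only how to aggregate, never a new representation."""

    def __init__(self, dim: int, hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Separate input/recurrent weights per direction (random, untrained).
        self.w_in = rng.normal(scale=1.0 / np.sqrt(dim), size=(2, dim, hidden))
        self.w_rec = rng.normal(scale=1.0 / np.sqrt(hidden), size=(2, hidden, hidden))
        self.w_score = rng.normal(scale=1.0 / np.sqrt(2 * hidden), size=2 * hidden)

    def _run(self, x: np.ndarray, direction: int) -> np.ndarray:
        """Return hidden states of shape (num_frames, hidden) for one direction."""
        h = np.zeros(self.w_rec.shape[-1])
        states = []
        for frame in x:
            h = np.tanh(frame @ self.w_in[direction] + h @ self.w_rec[direction])
            states.append(h)
        return np.stack(states)

    def weights(self, frame_embeddings: np.ndarray) -> np.ndarray:
        """Softmax weights over frames; each score depends on past and future frames."""
        x = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
        fwd = self._run(x, 0)
        bwd = self._run(x[::-1], 1)[::-1]  # re-align backward states with frame order
        scores = np.concatenate([fwd, bwd], axis=1) @ self.w_score
        scores = scores - scores.max()  # numerical stability for softmax
        w = np.exp(scores)
        return w / w.sum()

    def aggregate(self, frame_embeddings: np.ndarray) -> np.ndarray:
        """Unit-length template built only from the original (normalized) embeddings."""
        x = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
        template = self.weights(frame_embeddings) @ x  # convex combination of frames
        return template / np.linalg.norm(template)

# Usage with random stand-in embeddings (hypothetical data)
rng = np.random.default_rng(1)
video = rng.normal(size=(12, 128))
agg = RecurrentAggregator(dim=128)
w = agg.weights(video)
t = agg.aggregate(video)
print(w.shape, float(w.sum()), float(np.linalg.norm(t)))
```

A quality-aware method would compute each frame's score from that frame alone; here the bidirectional hidden states let a frame's weight depend on its neighbours, which is one way a model could down-weight redundant runs of similar frames.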
