CV · LG · Aug 4, 2019

Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning

arXiv:1908.01281v2 · 36 citations
AI Analysis

This paper addresses efficiency and optimization challenges in embedding learning for applications such as face recognition. It offers a way to tune the intra- and inter-class objectives separately and to accelerate training, though the contribution is incremental because it builds on existing softmax variants.

The paper tackles the entanglement of intra- and inter-class objectives in the softmax loss for embedding learning. It proposes D-Softmax, which dissects the loss into two independent parts. In face verification, D-Softmax performs comparably to SphereFace and ArcFace, and on massive-scale data its sampling-based variants accelerate training by up to 64x with minimal performance loss.

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition. However, the intra- and inter-class objectives in the softmax loss are entangled, so a well-optimized inter-class objective leads to relaxation of the intra-class objective, and vice versa. In this paper, we propose to dissect the softmax loss into independent intra- and inter-class objectives (D-Softmax). With D-Softmax as the objective, we can have a clear understanding of both the intra- and inter-class objectives, so it is straightforward to tune each part to its best state. Furthermore, we find that the computation of the inter-class objective is redundant and propose two sampling-based variants of D-Softmax to reduce the computation cost. When training with regular-scale data, experiments in face verification show that D-Softmax compares favorably with existing losses such as SphereFace and ArcFace. When training with massive-scale data, experiments show that the fast variants of D-Softmax significantly accelerate the training process (for example, by 64x) with only a minor sacrifice in performance, outperforming existing acceleration methods for softmax in terms of both performance and efficiency.
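To make the entanglement concrete, here is a minimal sketch in plain Python. The first function is the standard softmax cross-entropy, rewritten so that the target logit and the non-target logits visibly share a single ratio. The second function, `dissected_loss`, is an illustrative assumption rather than the paper's exact formulation: it replaces the shared ratio with two separate terms, using a hypothetical scale `s` and a hypothetical intra-class target `d`, and it optionally sums over only a random subset of negative classes (`num_neg`) to mimic the idea of a sampling-based variant.

```python
import math
import random


def softmax_loss(logits, y):
    """Standard softmax cross-entropy for target class y.

    Written as log(1 + sum_{k != y} e^{z_k} / e^{z_y}): the target logit and
    the non-target logits appear in one ratio, so the two objectives are coupled.
    """
    others = sum(math.exp(z) for k, z in enumerate(logits) if k != y)
    return math.log1p(others / math.exp(logits[y]))


def dissected_loss(cosines, y, s=32.0, d=0.9, num_neg=None, rng=random):
    """Illustrative dissected loss (an assumption, not the paper's exact form).

    cosines: similarities between one embedding and each class weight.
    s, d, num_neg: hypothetical scale, intra-class target, and negative-sample count.
    Returns (intra, inter) so that each term can be inspected or weighted separately.
    """
    # Intra-class term: depends only on the target similarity.
    intra = math.log1p(math.exp(s * (d - cosines[y])))

    # Inter-class term: depends only on the non-target similarities.
    negatives = [k for k in range(len(cosines)) if k != y]
    if num_neg is not None and num_neg < len(negatives):
        # Sum over a random subset of negative classes to cut computation.
        negatives = rng.sample(negatives, num_neg)
    inter = math.log1p(sum(math.exp(s * cosines[k]) for k in negatives))
    return intra, inter


cosines = [0.8, 0.1, -0.2, 0.3]
intra, inter = dissected_loss(cosines, y=0)
sampled_intra, sampled_inter = dissected_loss(cosines, y=0, num_neg=2)
```

In this sketch, changing the negative similarities leaves `intra` untouched and changing the target similarity leaves `inter` untouched, which is the independence the abstract describes. With sampling enabled, the inter-class sum covers fewer classes, so its cost shrinks with the sample size.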
