Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval
This work addresses the scalability issue in kernel-based distance learning for image retrieval, which is important for researchers and practitioners in computer vision, though it is incremental as it builds on existing metric learning methods.
The authors tackled the problem of supervised discriminative distance learning for semantic category-based image retrieval by proposing a scalable nonlinear embedding algorithm that reduces complexity from scaling with training examples to O(dD), and achieved competitive results on seven challenging datasets using up to 500,000 training pairs of 4096-dimensional CNN features.
We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low dimensional Euclidean space. We work in the challenging setting where supervision is with constraints on similar and dissimilar pairs while training. The proposed method is derived by an approximate kernelization of a linear Mahalanobis-like distance metric learning algorithm and can also be seen as a kernel neural network. The number of model parameters and test time evaluation complexity of the proposed method are O(dD) where D is the dimensionality of the input features and d is the dimension of the projection space - this is in contrast to the usual kernelization methods as, unlike them, the complexity does not scale linearly with the number of training examples. We propose a stochastic gradient based learning algorithm which makes the method scalable (w.r.t. the number of training examples), while being nonlinear. We train the method with up to half a million training pairs of 4096 dimensional CNN features. We give empirical comparisons with relevant baselines on seven challenging datasets for the task of low dimensional semantic category based image retrieval.