On the Intrinsic Dimensionality of Image Representations
This work addresses efficient image representation for computer vision researchers and practitioners. It offers a method that reduces dimensionality while preserving most of the matching performance, though the contribution is incremental because it builds on existing deep learning techniques.
The paper tackles the problem of estimating and reducing the intrinsic dimensionality of image representations. It shows that deep neural network features have a much lower intrinsic dimensionality than their ambient dimensionality (e.g., 16 for SphereFace vs. 512 ambient), and that the proposed DeepMDS mapping retains much of the discriminative ability at these reduced dimensions: 59.75% TAR @ 0.1% FAR in 16-dim vs. 71.26% in 512-dim on IJB-C, and 77.0% Top-1 accuracy in 19-dim vs. 83.4% in 512-dim on ImageNet-100.
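For readers unfamiliar with the verification metric quoted here, TAR @ FAR is the true accept rate (the fraction of genuine pairs accepted) at the score threshold where the false accept rate (the fraction of impostor pairs accepted) does not exceed a target value. The following is a minimal sketch of that computation, not the paper's evaluation code. The function name `tar_at_far`, its arguments, and the convention that higher scores mean greater similarity are assumptions made for illustration.

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-3):
    """Return (TAR, threshold) at the strictest threshold with FAR <= `far`.

    Assumes higher scores mean "more similar" (an illustrative convention).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Sort impostor scores from highest to lowest.
    imp_sorted = np.sort(impostor)[::-1]

    # At most k impostor pairs may be accepted for the target FAR.
    k = int(np.floor(far * imp_sorted.size))

    # Accepting only scores strictly above imp_sorted[k] lets through
    # at most k impostors, so the realised FAR never exceeds `far`.
    threshold = imp_sorted[k] if k < imp_sorted.size else -np.inf

    tar = float(np.mean(genuine > threshold))
    return tar, threshold
```

With a target of 0.1% FAR, a meaningful estimate needs well over a thousand impostor pairs. With fewer, `k` is zero and the threshold falls back to the highest impostor score.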
This paper addresses the following tasks pertaining to the intrinsic dimensionality of any given image representation: (i) estimating its intrinsic dimensionality, (ii) developing a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validating the mapping through image matching in the intrinsic space. Experiments on benchmark image datasets (LFW, IJB-C and ImageNet-100) reveal that the intrinsic dimensionality of deep neural network representations is significantly lower than the dimensionality of the ambient features. For instance, SphereFace's 512-dim face representation and ResNet's 512-dim image representation have an intrinsic dimensionality of 16 and 19, respectively. Further, the DeepMDS mapping obtains a representation of significantly lower dimensionality while maintaining discriminative ability to a large extent: 59.75% TAR @ 0.1% FAR in 16-dim vs. 71.26% TAR in 512-dim on IJB-C, and a Top-1 accuracy of 77.0% at 19-dim vs. 83.4% at 512-dim on ImageNet-100.
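The first task, estimating intrinsic dimensionality, can be illustrated with a k-nearest-neighbour maximum-likelihood estimator in the style of Levina and Bickel. This is a swapped-in, widely used estimator chosen for brevity; the text above does not say which estimator the paper uses, so this should not be read as the paper's method. The function name `mle_intrinsic_dim` and the neighbourhood size `k` are illustrative assumptions, and the sketch assumes the data contain no duplicate points.

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Estimate intrinsic dimensionality from k-nearest-neighbour distances.

    Illustrative sketch; requires k >= 3 and no duplicate rows in X.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distances via the Gram-matrix identity.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))

    # Exclude each point from its own neighbour list.
    np.fill_diagonal(dist, np.inf)

    # Distances to the k nearest neighbours, in ascending order.
    knn = np.sort(dist, axis=1)[:, :k]

    # log(T_k / T_j) for j = 1 .. k-1, where T_j is the j-th neighbour distance.
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])

    # Per-point estimate; the (k - 2) numerator makes it unbiased under
    # the estimator's locally uniform density model.
    per_point = (k - 2) / np.sum(log_ratio, axis=1)

    # Average the local estimates into one global estimate.
    return float(np.mean(per_point))
```

As a sanity check, points drawn from a 2-dimensional linear subspace embedded in a 10-dimensional ambient space should give an estimate close to 2, mirroring on a toy scale the gap between ambient and intrinsic dimensionality reported above.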