Deep Tensor Encoding
This work reports an incremental improvement in retrieval fidelity for computer vision applications, obtained by incorporating structural constraints into deep-learning classifiers.
The paper tackles the problem of feature encoding for content-based information retrieval by proposing a structured tensor factorization scheme that preserves the multi-linear structure of feature tensors, achieving retrieval performance comparable to Fisher vector encodings in terms of average precision.
Learning an encoding of feature vectors in terms of an over-complete dictionary or an information-geometric construct (Fisher vectors) is widespread in statistical signal processing and computer vision. In content-based information retrieval using deep-learning classifiers, such encodings are learnt on the flattened last layer, without adherence to the multi-linear structure of the underlying feature tensor. We illustrate a variety of feature encodings, including sparse dictionary coding and Fisher vectors, and propose that a structured tensor factorization scheme enables retrieval that can be on par, in terms of average precision, with Fisher vector encoded image signatures. In short, we illustrate how structural constraints increase retrieval fidelity.
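To make the contrast between a flattened encoding and a structure-preserving one concrete, the following minimal sketch may help. It is not the factorization scheme proposed in this work, which the abstract does not specify. As a stand-in it uses a truncated higher-order SVD (a Tucker-style decomposition), and the tensor shape `(7, 7, 64)`, the ranks `(3, 3, 8)`, and the helper names `unfold`, `hosvd_encode`, and `reconstruct` are all assumptions chosen for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_encode(tensor, ranks):
    """Truncated higher-order SVD (illustrative stand-in, not the paper's scheme).

    Returns a small core tensor and one orthonormal factor matrix per mode.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project this mode onto its factor basis (mode-n product with u.T).
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original space via mode-n products."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out

# Hypothetical last-layer feature tensor (height x width x channels).
rng = np.random.default_rng(0)
features = rng.standard_normal((7, 7, 64))

# Flattened encoding: the multi-linear structure is discarded.
flat_signature = features.reshape(-1)

# Structured encoding: each mode is compressed separately.
core, factors = hosvd_encode(features, ranks=(3, 3, 8))
structured_signature = core.reshape(-1)

print(flat_signature.shape, structured_signature.shape)
```

The flattened signature treats all entries as one long vector, whereas the structured signature keeps separate bases for the spatial and channel modes. Which factorization and which ranks suit a given retrieval task is an empirical question that this sketch does not answer.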