Generalisation and Sharing in Triplet Convnets for Sketch based Visual Search
This work addresses sketch-based image retrieval (SBIR), examining how weight sharing, pre-processing, data augmentation and dimensionality reduction affect triplet CNNs trained to measure the similarity between sketches and photographs.
The paper tackles sketch-based image retrieval by proposing triplet CNN architectures that generalise across diverse object categories from limited training data, exceeding pre-existing techniques by 18% on the Flickr15k benchmark and by approximately $10 \mathcal{T}_b$ on the TU-Berlin SBIR benchmark.
We propose and evaluate several triplet CNN architectures for measuring the similarity between sketches and photographs, within the context of the sketch-based image retrieval (SBIR) task. In contrast to recent fine-grained SBIR work, we study the ability of our networks to generalise across diverse object categories from limited training data, and explore in detail strategies for weight sharing, pre-processing, data augmentation and dimensionality reduction. We exceed the performance of pre-existing techniques on both the Flickr15k category-level SBIR benchmark, by $18\%$, and the TU-Berlin SBIR benchmark, by $\sim10 \mathcal{T}_b$, when trained on the 250-category TU-Berlin classification dataset augmented with 25k corresponding photographs harvested from the Internet.
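To make the triplet objective concrete, the following is a minimal sketch of a hinge-style triplet loss over embedding vectors. It is an illustration under stated assumptions, not the formulation used in this work: the squared Euclidean distance, the margin value, the function names and the toy three-dimensional embeddings are all hypothetical. Here the anchor plays the role of a sketch embedding, the positive a photograph of the same category, and the negative a photograph of a different category.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, margin + d(a, p) - d(a, n)).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed hyperparameter).
    """
    return max(
        0.0,
        margin
        + squared_distance(anchor, positive)
        - squared_distance(anchor, negative),
    )


# Hypothetical embeddings, chosen only for illustration.
sketch = [0.0, 1.0, 0.0]          # anchor: embedded sketch
matching_photo = [0.0, 0.5, 0.0]  # positive: same category
other_photo = [2.0, 0.0, 0.0]     # negative: different category

# Correctly ordered triplet: the margin is already satisfied.
loss_satisfied = triplet_loss(sketch, matching_photo, other_photo)

# Swapping positive and negative violates the margin and is penalised.
loss_violated = triplet_loss(sketch, other_photo, matching_photo)
```

In a trained network the three vectors would be the outputs of the sketch and photograph branches, and the choice of which branch weights are shared determines how those embeddings are produced.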