CV
Feb 24, 2020

Sketchformer: Transformer-based Representation for Sketched Structure

arXiv:2002.10381v1
151 citations
AI Analysis

This work addresses the need for better sketch representation in computer vision, offering incremental improvements over existing LSTM-based methods like SketchRNN.

The authors tackled the problem of encoding free-hand sketches as vector sequences by introducing Sketchformer, a transformer-based representation that achieved state-of-the-art performance in sketch classification and image retrieval tasks, with significant improvements in reconstruction and interpolation for complex sketches.

Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch-based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks when compared against baseline representations driven by LSTM sequence-to-sequence architectures: SketchRNN and derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.
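To make the idea of a tokenized stroke sequence concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the dictionary learning scheme, so this example simply snaps each pen offset to the nearest entry of a small hand-written codebook. The names `CODEBOOK`, `PEN_UP_TOKEN`, `nearest_code`, and `tokenize`, as well as the `(dx, dy, pen_lifted)` input layout, are hypothetical.

```python
import math

# Hypothetical codebook of (dx, dy) pen offsets. In a real system it would be
# learned from training sketches; it is hand-written here for illustration.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
PEN_UP_TOKEN = len(CODEBOOK)  # extra token id marking the end of a stroke


def nearest_code(dx, dy, codebook=CODEBOOK):
    """Return the index of the codebook entry closest to the offset (dx, dy)."""
    return min(
        range(len(codebook)),
        key=lambda i: math.hypot(dx - codebook[i][0], dy - codebook[i][1]),
    )


def tokenize(sketch, codebook=CODEBOOK):
    """Map a list of (dx, dy, pen_lifted) triples to a list of integer tokens."""
    tokens = []
    for dx, dy, pen_lifted in sketch:
        tokens.append(nearest_code(dx, dy, codebook))
        if pen_lifted:
            # Assumption: stroke boundaries get their own token id.
            tokens.append(len(codebook))
    return tokens


# Two short strokes: right then up (pen lifted), left then down (pen lifted).
sketch = [(0.9, 0.1, 0), (0.1, 1.2, 1), (-1.1, 0.0, 0), (0.0, -0.8, 1)]
tokens = tokenize(sketch)
print(tokens)  # [1, 3, 5, 2, 4, 5]
```

A discrete sequence like this could be fed to a transformer through an embedding lookup, whereas a continuous variant would pass the raw offsets directly. Which codebook size and clustering method work best is not something this sketch addresses.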

Code Implementations: 1 repo
