Encoding CNN Activations for Writer Recognition
This work addresses writer identification and writer retrieval, a domain-specific task, and contributes incremental improvements to how local features are encoded.
The paper addresses the encoding of CNN activations for writer recognition. It compares VLAD encoding with triangulation embedding, and investigates generalized max pooling as an alternative to sum pooling as well as the impact of decorrelation and Exemplar SVMs. These techniques yield new state-of-the-art results on the ICDAR13 and KHATT datasets.
The encoding of local features is an essential part of writer identification and writer retrieval. While CNN activations have already been used as local features in related work, the encoding of these features has attracted little attention so far. In this work, we compare the established VLAD encoding with triangulation embedding. We further investigate generalized max pooling as an alternative to sum pooling, as well as the impact of decorrelation and Exemplar SVMs. With these techniques, we set new standards on two publicly available datasets (ICDAR13, KHATT).
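To make the difference between sum pooling and generalized max pooling concrete, the following is a minimal NumPy sketch of VLAD encoding with either pooling scheme. It is an illustration under stated assumptions, not the pipeline used in this work: the function names `vlad_encode` and `generalized_max_pool`, the regularization parameter `lam`, the per-cluster application of generalized max pooling, and the power normalization step are all choices made for this example. Generalized max pooling is written in its ridge-regression form, which seeks a pooled vector whose dot product with every local embedding is approximately one.

```python
import numpy as np


def generalized_max_pool(phi, lam=1.0):
    """Pool the rows of phi (N x D) into one D-dimensional vector.

    Solves the ridge problem  min_xi ||phi @ xi - 1||^2 + lam * ||xi||^2,
    so every local embedding contributes about equally to the result.
    `lam` is an assumed regularization weight for this sketch.
    """
    dim = phi.shape[1]
    gram = phi.T @ phi + lam * np.eye(dim)
    return np.linalg.solve(gram, phi.T @ np.ones(phi.shape[0]))


def vlad_encode(descriptors, centers, pooling="sum", lam=1.0):
    """Encode local descriptors (N x D) against cluster centers (K x D).

    Returns a normalized vector of length K * D. Each descriptor is
    hard-assigned to its nearest center, and the residuals per center are
    aggregated by sum pooling or by generalized max pooling.
    """
    n_clusters, dim = centers.shape

    # Squared Euclidean distance of every descriptor to every center.
    dist2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)

    encoding = np.zeros((n_clusters, dim))
    for k in range(n_clusters):
        residuals = descriptors[nearest == k] - centers[k]
        if residuals.shape[0] == 0:
            continue  # no descriptor assigned: leave this block at zero
        if pooling == "sum":
            encoding[k] = residuals.sum(axis=0)
        elif pooling == "gmp":
            encoding[k] = generalized_max_pool(residuals, lam)
        else:
            raise ValueError(f"unknown pooling: {pooling!r}")

    vec = encoding.ravel()
    # Assumed post-processing: signed square root, then L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_features = rng.normal(size=(200, 8))  # stand-in for CNN activations
    centers = rng.normal(size=(4, 8))           # stand-in for a k-means codebook
    for mode in ("sum", "gmp"):
        code = vlad_encode(local_features, centers, pooling=mode)
        print(mode, code.shape)
```

In this sketch the random arrays merely stand in for CNN activations and a learned codebook. With sum pooling, frequently occurring descriptors dominate the aggregated residual, whereas the ridge formulation in `generalized_max_pool` balances the influence of frequent and rare descriptors.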