Large-Scale Classification of Structured Objects using a CRF with Deep Class Embedding
The paper addresses the difficulty CRFs have in learning contextual relationships thoroughly when the number of classes is large and the data is sparse. The contribution is incremental: it builds on existing CRF approaches and adds deep learning enhancements.
The paper tackles the classification of structured objects across many visually similar categories. It proposes a deep learning architecture that models image sequences as linear-chain CRFs and learns jointly from visual features and class embeddings. On a large dataset of retail-store product images, it reports significantly better results than linear CRF modeling and unnormalized likelihood optimization.
This paper presents a novel deep learning architecture to classify structured objects in datasets with a large number of visually similar categories. We model sequences of images as linear-chain CRFs, and jointly learn the parameters from both local visual features and neighboring classes. The visual features are computed by convolutional layers, and the class embeddings are learned by factorizing the CRF pairwise potential matrix. This forms a highly nonlinear objective function, which is trained by optimizing a local likelihood approximation with batch normalization. The model overcomes the difficulty existing CRF methods have in learning contextual relationships thoroughly when the number of classes is large and the data is sparse. The proposed method is evaluated on a large dataset of images of retail-store product displays, taken in varying settings and viewpoints, and shows significantly improved results compared to linear CRF modeling and unnormalized likelihood optimization.
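
The factorized pairwise potential and the local likelihood can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the exact form of the local approximation, so the sketch assumes a pseudo-likelihood in which each label is conditioned on its two ground-truth neighbors. The names `left_emb`, `right_emb`, `pairwise_potentials`, and `local_log_likelihood` are hypothetical, the unary scores stand in for the output of the convolutional layers, and batch normalization is omitted.

```python
import numpy as np

def pairwise_potentials(left_emb, right_emb):
    """Low-rank K x K matrix; entry (a, b) scores class a followed by class b."""
    return left_emb @ right_emb.T

def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    return shifted - np.log(np.exp(shifted).sum())

def local_log_likelihood(unary, labels, left_emb, right_emb):
    """Sum over positions i of log p(y_i | y_{i-1}, y_{i+1}, x_i).

    Assumed pseudo-likelihood form, not necessarily the paper's objective.
    unary:  (T, K) per-position class scores (stand-in for CNN outputs).
    labels: (T,) integer ground-truth classes.
    """
    pair = pairwise_potentials(left_emb, right_emb)  # shape (K, K)
    T = len(labels)
    total = 0.0
    for i in range(T):
        scores = unary[i].copy()
        if i > 0:
            scores += pair[labels[i - 1], :]  # left neighbor -> candidate class
        if i < T - 1:
            scores += pair[:, labels[i + 1]]  # candidate class -> right neighbor
        total += log_softmax(scores)[labels[i]]
    return total

# Toy example with random values (purely illustrative, no real data).
rng = np.random.default_rng(0)
K, d, T = 5, 2, 4                      # classes, embedding size, sequence length
left_emb = rng.normal(size=(K, d))
right_emb = rng.normal(size=(K, d))
unary = rng.normal(size=(T, K))
labels = np.array([0, 3, 3, 1])
ll = local_log_likelihood(unary, labels, left_emb, right_emb)
```

In this sketch the factorization replaces the K × K free parameters of a full pairwise matrix with 2 × K × d embedding parameters, which is one plausible reading of why class embeddings help when there are many classes and co-occurrence data is sparse.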