CVAug 31, 2018

Seeing Colors: Learning Semantic Text Encoding for Classification

Shah Nawaz, Alessandro Calefati, Muhammad Kamran Janjua, Ignazio Gallo

arXiv:1808.10822v10.9

Originality Synthesis-oriented

AI Analysis

This work addresses text classification by allowing cross-domain application of CNNs, though it is incremental as it adapts existing methods to a new modality.

The authors tackled the problem of text classification by converting documents into encoded images using word embeddings and CNNs, achieving promising results on benchmark datasets. This approach enables the use of advanced CNN architectures from computer vision for NLP tasks.

The question we answer with this work is: can we convert a text document into an image to exploit best image classification models to classify documents? To answer this question we present a novel text classification method which converts a text document into an encoded image, using word embedding and capabilities of Convolutional Neural Networks (CNNs), successfully employed in image classification. We evaluate our approach by obtaining promising results on some well-known benchmark datasets for text classification. This work allows the application of many of the advanced CNN architectures developed for Computer Vision to Natural Language Processing. We test the proposed approach on a multi-modal dataset, proving that it is possible to use a single deep model to represent text and image in the same feature space.

View on arXiv PDF

Similar