CL AI CV LG | Aug 19, 2019

Representing text as abstract images enables image classifiers to also simultaneously classify text

arXiv:1908.07846v3 | 0.31 citations

Originality: Incremental advance

AI Analysis

The method addresses entity disambiguation in patents. The advance is incremental because it adapts existing image classification techniques to text data rather than introducing a new classifier.

The paper tackles the problem of entity disambiguation for inventor names in US patents by converting text into abstract images, enabling image classifiers to process text and achieving highly accurate results.

We introduce a novel method for converting text data into abstract image representations, which allows image-based processing techniques (e.g. image classification networks) to be applied to text-based comparison problems. We apply the technique to entity disambiguation of inventor names in US patents. The method involves converting text from each pairwise comparison between two inventor name records into a 2D RGB (stacked) image representation. We then train an image classification neural network to discriminate between such pairwise comparison images, and use the trained network to label each pair of records as either matched (same inventor) or non-matched (different inventors), obtaining highly accurate results. Our new text-to-image representation method could also be used more broadly for other NLP comparison problems, such as disambiguation of academic publications, or for problems that require simultaneous classification of both text and image datasets.
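The pairwise text-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything here is an assumption made for illustration: each record is mapped to a 27x27 binary map of adjacent character pairs, and the two maps plus their overlap are stacked as the red, green, and blue channels. The names `bigram_bitmap` and `pair_to_image` and the 27-character alphabet are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumed alphabet: 26 lowercase letters plus space. Other characters are dropped.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def bigram_bitmap(text: str) -> np.ndarray:
    """Return a 27x27 binary map marking which adjacent character pairs occur."""
    cleaned = "".join(ch for ch in text.lower() if ch in INDEX)
    bitmap = np.zeros((len(ALPHABET), len(ALPHABET)), dtype=np.uint8)
    for first, second in zip(cleaned, cleaned[1:]):
        bitmap[INDEX[first], INDEX[second]] = 1
    return bitmap


def pair_to_image(record_a: str, record_b: str) -> np.ndarray:
    """Stack two records into one RGB image of shape (27, 27, 3).

    Red   = character pairs in record A
    Green = character pairs in record B
    Blue  = character pairs present in both (hypothetical choice of third channel)
    """
    a = bigram_bitmap(record_a)
    b = bigram_bitmap(record_b)
    shared = a & b
    stacked = np.stack([a, b, shared], axis=-1)
    return (stacked * 255).astype(np.uint8)


# One comparison image for a candidate pair of inventor name records.
image = pair_to_image("john smith", "jon smith")
```

Under these assumptions, a matched pair yields an image dominated by pixels lit in all three channels, while a non-matched pair shows mostly separate red and green pixels. An off-the-shelf image classification network could then be trained on such images with matched and non-matched labels.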
