CL | Oct 3, 2016

Nonsymbolic Text Representation

arXiv:1610.00479v3 | 23 citations
Originality: Incremental advance
AI Analysis

This paper addresses text processing for applications where segmentation or tokenization is unreliable or unavailable. It is assessed as an incremental advance in text representation methods.

The paper tackles text representation without relying on symbolic units such as words. It introduces what the authors describe as the first generic nonsymbolic model and reports that it outperforms prior work on an information extraction task and a text denoising task, though the abstract gives no specific numbers.

We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.

