CL SI Mar 13, 2020

Using word embeddings to improve the discriminability of co-occurrence text networks

Laura V. C. Quispe, Jorge A. V. Tohalino, Diego R. Amancio

arXiv:2003.06279v10.2

Originality: Incremental advance

AI Analysis

This work addresses a limitation of the co-occurrence network representations used in text analysis, with relevance to natural language processing and theoretical language studies. The advance is incremental, since it builds on existing embedding methods.

The paper tackles a known weakness of traditional co-occurrence networks: they fail to link similar words that appear far apart in the text. It uses word embeddings to create virtual links between such words. Discriminability in the stylometry task improves with GloVe, Word2Vec, and FastText, and the best results are obtained when stopwords are retained and a simple global thresholding strategy is used to establish the virtual links.

Word co-occurrence networks have been employed to analyze texts in both practical and theoretical scenarios. Despite their relative success in several applications, traditional co-occurrence networks fail to establish links between similar words whenever they appear distant in the text. Here we investigate whether the use of word embeddings as a tool to create virtual links in co-occurrence networks may improve the quality of classification systems. Our results revealed that the discriminability in the stylometry task is improved when using GloVe, Word2Vec and FastText. In addition, we found that optimized results are obtained when stopwords are not disregarded and a simple global thresholding strategy is used to establish virtual links. Because the proposed approach is able to improve the representation of texts as complex networks, we believe that it could be extended to study other natural language processing tasks. Likewise, theoretical language studies could benefit from the adopted enriched representation of word co-occurrence networks.
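
The idea of virtual links can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the use of cosine similarity, the threshold value, the function names, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a pretrained GloVe, Word2Vec, or FastText model.

```python
import math
from itertools import combinations

def cooccurrence_edges(tokens, window=1):
    """Link each token to the tokens that follow it within `window` positions."""
    edges = set()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if word != tokens[j]:
                edges.add(frozenset((word, tokens[j])))
    return edges

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def virtual_edges(vocab, embeddings, existing, threshold):
    """Return new links for word pairs that are not yet connected and whose
    similarity reaches one global threshold (same cutoff for every pair)."""
    added = set()
    for a, b in combinations(sorted(vocab), 2):
        if a not in embeddings or b not in embeddings:
            continue  # words without a vector cannot receive virtual links
        pair = frozenset((a, b))
        if pair not in existing and cosine(embeddings[a], embeddings[b]) >= threshold:
            added.add(pair)
    return added

# Stopwords such as "the" are kept in the token stream.
tokens = "the cat sat on the mat while the dog slept".split()

# Hypothetical toy embeddings, chosen by hand for this example.
toy_embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.85, 0.2],
    "mat": [0.1, 0.9],
    "sat": [0.3, 0.7],
    "slept": [0.35, 0.65],
}

real = cooccurrence_edges(tokens, window=1)
virtual = virtual_edges(set(tokens), toy_embeddings, real, threshold=0.98)
enriched = real | virtual
```

Here "cat" and "dog" never appear next to each other, so the plain co-occurrence network leaves them unconnected; the virtual link joins them because their toy vectors are nearly parallel. Network measurements for classification would then be computed on `enriched` rather than on `real`.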
