CL · Jul 2, 2016

Text comparison using word vector representations and dimensionality reduction

arXiv:1607.00534v14.222 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a visualization tool for text analysis, but its contribution is incremental: it applies existing methods (word2vec and t-SNE) to text comparison without introducing new algorithmic components.

The paper tackles the problem of comparing large text sources by using word2vec and t-SNE to create a 2D map where semantically similar words are close together, enabling users to explore texts like a geographical map.
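One simple way to turn such a map into a comparison is to tag each word by which text it appears in, so the two sources can be drawn in different colors. The sketch below is only an illustration of that idea, not the paper's code: the function name `label_words`, the label strings, and the two toy texts are all hypothetical.

```python
def label_words(source_tokens, summary_tokens):
    """Tag every word as appearing in the source, the summary, or both.

    Hypothetical helper: the labels are meant to drive the point colors
    on a 2D word map, which is an assumption about how the comparison
    could be displayed.
    """
    source_vocab = set(source_tokens)
    summary_vocab = set(summary_tokens)
    labels = {}
    for word in source_vocab | summary_vocab:
        if word in source_vocab and word in summary_vocab:
            labels[word] = "both"
        elif word in source_vocab:
            labels[word] = "source_only"
        else:
            labels[word] = "summary_only"
    return labels


# Toy texts, made up for illustration.
source = "the cat sat on the mat while the dog slept".split()
summary = "a cat sat".split()
labels = label_words(source, summary)
```

Here `labels["cat"]` is `"both"`, `labels["dog"]` is `"source_only"`, and `labels["a"]` is `"summary_only"`. Regions of the map dominated by source-only words would then hint at material the summary leaves out.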

This paper describes a technique to compare large text sources using word vector representations (word2vec) and dimensionality reduction (t-SNE) and how it can be implemented using Python. The technique provides a bird's-eye view of text sources, e.g. text summaries and their source material, and enables users to explore text sources like a geographical map. Word vector representations capture many linguistic properties such as gender, tense, plurality and even semantic concepts like "capital city of". Using dimensionality reduction, a 2D map can be computed where semantically similar words are close to each other. The technique uses the word2vec model from the gensim Python library and t-SNE from scikit-learn.
