CL · AI · HC · Mar 19, 2021

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

arXiv:2103.11029v1728 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This tool helps researchers analyze semantic shifts between corpora. Its contribution is incremental: it builds on existing embedding methods and adds a new interface and a new embedding confidence metric.

The authors address the limited use of embeddings for comparative corpus analysis by introducing TextEssence, an interactive web-based tool with visual, neighbor-based, and similarity-based analysis modes, together with a new embedding confidence measure. They demonstrate the tool in a case study on COVID-19 scientific literature.
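The neighbor-based and similarity-based modes can be illustrated with a minimal sketch. It assumes each corpus has its own embedding table (a plain dict from term to vector) and uses cosine similarity; the names `Embeddings`, `nearest_neighbors`, and `similarity_shift` are hypothetical and are not TextEssence's API. Because embeddings trained on different corpora live in unaligned vector spaces, the sketch compares only within-corpus quantities: a term's neighbor list, or the similarity of a term pair, in each corpus.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical type: one embedding table per corpus (term -> vector).
Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(emb: Embeddings, term: str, k: int) -> List[str]:
    """Neighbor-based mode: the k terms closest to `term` within one corpus."""
    query = emb[term]
    candidates = [t for t in emb if t != term]
    candidates.sort(key=lambda t: cosine(query, emb[t]), reverse=True)
    return candidates[:k]


def similarity_shift(emb_a: Embeddings, emb_b: Embeddings, t1: str, t2: str) -> float:
    """Similarity-based mode: change in a pair's cosine similarity from corpus A to B.

    Only within-corpus similarities are compared, so the two embedding
    spaces never need to be aligned.
    """
    return cosine(emb_b[t1], emb_b[t2]) - cosine(emb_a[t1], emb_a[t2])


# Toy 2-D vectors, invented for illustration.
corpus_a = {"virus": [1.0, 0.0], "flu": [0.9, 0.1], "spike": [0.0, 1.0]}
corpus_b = {"virus": [1.0, 0.0], "flu": [0.2, 1.0], "spike": [0.95, 0.05]}

print(nearest_neighbors(corpus_a, "virus", 1))  # ['flu']
print(nearest_neighbors(corpus_b, "virus", 1))  # ['spike']
print(round(similarity_shift(corpus_a, corpus_b, "virus", "spike"), 3))
```

In this toy setup "virus" moves from neighboring "flu" in corpus A to neighboring "spike" in corpus B, and the virus/spike pair becomes far more similar; a real analysis would use embeddings trained on each corpus rather than hand-made vectors.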

Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.
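The abstract says only that the confidence measure is "based on nearest neighborhood overlap." A minimal sketch, assuming the overlap is measured across several independently trained embedding replicates of the same corpus: for each pair of replicates, take the fraction of a term's top-k neighbors they share, then average over all pairs. The replicate setup, the pairwise averaging, and every name below (`top_k_neighbors`, `neighborhood_confidence`) are assumptions for illustration, not the authors' exact definition.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set

# Hypothetical type: one embedding table (term -> vector) per training run.
Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_neighbors(emb: Embeddings, term: str, k: int) -> Set[str]:
    """The set of k terms closest to `term` in one replicate."""
    others = [t for t in emb if t != term]
    others.sort(key=lambda t: cosine(emb[term], emb[t]), reverse=True)
    return set(others[:k])


def neighborhood_confidence(replicates: List[Embeddings], term: str, k: int) -> float:
    """Assumed definition: mean fraction of shared top-k neighbors over replicate pairs.

    1.0 means all replicates agree on the term's neighborhood; values near
    0.0 flag an unstable, low-confidence embedding.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to measure overlap")
    neighbor_sets = [top_k_neighbors(emb, term, k) for emb in replicates]
    pairs = list(combinations(neighbor_sets, 2))
    return sum(len(a & b) / k for a, b in pairs) / len(pairs)


# Three toy replicates of the "same" corpus, invented for illustration.
rep1 = {"covid": [1.0, 0.0], "sars": [0.9, 0.1], "virus": [0.8, 0.3], "mask": [0.0, 1.0]}
rep2 = {"covid": [1.0, 0.1], "sars": [0.95, 0.1], "virus": [0.7, 0.2], "mask": [0.1, 1.0]}
rep3 = {"covid": [1.0, 0.0], "sars": [0.0, 1.0], "virus": [0.9, 0.1], "mask": [0.8, 0.2]}

print(neighborhood_confidence([rep1, rep2, rep3], "covid", k=2))  # 0.6666666666666666
```

Here the first two replicates agree on both neighbors of "covid" ({sars, virus}) while the third shares only "virus" with each, so the pairwise overlaps are 1.0, 0.5, and 0.5, averaging to 2/3. Averaging over pairs keeps the score in [0, 1] regardless of how many replicates are trained.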

Code Implementations: 1 repo