Johannes Hellrich

CL
3 papers
2,185 citations
Novelty 33%
AI Score 23

3 Papers

CL Aug 21, 2018
The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

Johannes Hellrich, Bernd Kampe, Udo Hahn

The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD_PPMI-type embeddings. This finding seems to explain diverging reports on their stability and leads us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embeddings.

CL Jul 11, 2018
JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion

Johannes Hellrich, Sven Buechel, Udo Hahn

We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora. JeSemE is designed for scholars in the (digital) humanities as an alternative to consulting manually compiled, printed dictionaries for such information (if available at all). This tool uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities.

CL Jun 21, 2018
Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection

Johannes Hellrich, Sven Buechel, Udo Hahn

To understand historical texts, we must be aware that language -- including the emotional connotation attached to words -- changes over time. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While VAD is more expressive than polarity alone, existing word emotion induction methods are typically not suited for addressing it. To overcome this limitation, we present adaptations of two popular algorithms to VAD. To measure their effectiveness in diachronic settings, we present the first gold standard for historical word emotions, which was created by scholars with proficiency in the respective language stages and covers both English and German. In contrast to claims in previous work, our findings indicate that hand-selecting small sets of seed words with supposedly stable emotional meaning is actually harmful rather than helpful.