Annette Kinder

h-index: 16

5 papers

19 citations

Novelty: 24%

AI Score: 16

Ranked #191,567 of 194,257 authors (top 99%); #30,550 in CL (top 99%)

5 Papers

0.8 CL Jan 12, 2022

Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature

Arthur M. Jacobs, Annette Kinder

The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC (i.e., children and youth, essays, novels, plays, poems, stories) and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC (e.g., Jane Austen's six novels), and iii) two experiments on text classification and authorship recognition using novel features of semantic complexity. The data on two novel measures estimating a text's literariness, intratextual variance and stepwise distance (van Cranenburgh et al., 2019) revealed that plays are the most literary texts in GLEC, followed by poems and novels. Computation of a novel index of text creativity (Gray et al., 2016) revealed poems and plays as the most creative categories with the most creative authors all being poets (Milton, Pope, Keats, Byron, or Wordsworth). We also computed a novel index of perceived beauty of verbal art (Kintsch, 2012) for the works in GLEC and predict that Emma is the theoretically most beautiful of Austen's novels. Finally, we demonstrate that these novel measures of semantic complexity are important features for text classification and authorship recognition with overall predictive accuracies in the range of .75 to .97. Our data pave the way for future computational and empirical studies of literature or experiments in reading psychology and offer multiple baselines and benchmarks for analysing and validating other book corpora.
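The two literariness measures named in this abstract can be illustrated with a small sketch. It assumes, as an illustration rather than the authors' implementation, that a text has already been split into chunks and each chunk embedded as a numeric vector, that intratextual variance is the mean Euclidean distance of the chunk vectors to their centroid, and that stepwise distance is the mean Euclidean distance between consecutive chunks. The function names and the toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def intratextual_variance(vectors):
    """Mean Euclidean distance of each chunk vector to the centroid.

    Assumption: Euclidean distance; the cited work may use another metric.
    """
    if not vectors:
        raise ValueError("need at least one chunk vector")
    c = centroid(vectors)
    return sum(math.dist(v, c) for v in vectors) / len(vectors)


def stepwise_distance(vectors):
    """Mean Euclidean distance between consecutive chunk vectors."""
    if len(vectors) < 2:
        raise ValueError("need at least two chunk vectors")
    steps = [math.dist(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(steps) / len(steps)


# Toy chunk embeddings (hypothetical values, not derived from GLEC).
chunks = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
print(intratextual_variance(chunks))
print(stepwise_distance(chunks))
```

Under these assumptions, a text whose chunks wander far from a common centre scores high on the first measure, while a text that shifts sharply from one chunk to the next scores high on the second.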

0.2 CL Sep 26, 2021

Electoral Programs of German Parties 2021: A Computational Analysis Of Their Comprehensibility and Likeability Based On SentiArt

Arthur M. Jacobs, Annette Kinder

The electoral programs of six German parties issued before the parliamentary elections of 2021 are analyzed using state-of-the-art computational tools for quantitative narrative, topic and sentiment analysis. We compare different methods for computing the textual similarity of the programs (Jaccard Bag similarity, Latent Semantic Analysis, doc2vec, and sBERT), with representational and computational complexity increasing from the first to the fourth method. A new similarity measure for entire documents derived from the Fowlkes-Mallows score is applied to k-means clustering of sBERT-transformed sentences. Using novel indices of the readability and emotion potential of texts computed via SentiArt (Jacobs, 2019), our data shed light on the similarities and differences of the programs regarding their length, main ideas, comprehensibility, likeability, and semantic complexity. Among other things, they reveal that the programs of the SPD and CDU have the best chances to be comprehensible and likeable, all other things being equal, and they raise the important issue of which similarity measure is optimal for comparing texts such as electoral programs, which necessarily share a lot of words. While such analyses cannot replace qualitative analyses or a deep reading of the texts, they offer predictions that can be verified in empirical studies and may serve as a motivation for changing aspects of future electoral programs, potentially making them more comprehensible and/or likeable.
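The simplest of the four similarity methods listed in this abstract can be sketched in a few lines. The version below assumes one common multiset ("bag") generalization of the Jaccard index, the sum of minimum word counts divided by the sum of maximum word counts, together with naive lowercase whitespace tokenization. The exact definition and preprocessing used in the paper are not stated in the abstract, so both are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def jaccard_bag_similarity(text_a: str, text_b: str) -> float:
    """Multiset Jaccard similarity between two texts.

    Assumption: similarity = sum(min counts) / sum(max counts) over all words.
    """
    bag_a = Counter(text_a.lower().split())
    bag_b = Counter(text_b.lower().split())
    intersection = sum((bag_a & bag_b).values())  # per-word minimum counts
    union = sum((bag_a | bag_b).values())         # per-word maximum counts
    return intersection / union if union else 0.0


# Toy sentences (hypothetical, not taken from the electoral programs).
print(jaccard_bag_similarity("we want fair taxes", "we want lower taxes"))
```

Because documents on the same theme share much of their vocabulary, a word-overlap measure like this one tends to rate all of them as similar, which is the issue the abstract raises about choosing a similarity measure.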

0.2 CL Jun 14, 2021

Is Einstein more agreeable and less neurotic than Hitler? A computational exploration of the emotional and personality profiles of historical persons

Arthur M. Jacobs, Annette Kinder

Recent progress in distributed semantic models (DSM) offers new ways to estimate personality traits of both fictive and real people. In this exploratory study we applied an extended version of the algorithm developed in Jacobs (2019) to compute the likeability scores, emotional figure profiles and BIG5 personality traits for 100 historical persons from the arts, politics or science domains whose names are rather unique (e.g., Einstein, Kahlo, Picasso). We compared the results produced by static (word2vec) and dynamic (BERT) language model representations in four studies. The results show both the potential and limitations of such DSM-based computations of personality profiles and point to ways of developing this approach further into a useful tool for data science, psychology or computational and neurocognitive poetics (Jacobs, 2015).

0.2 CL Oct 21, 2020

Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set

Arthur M. Jacobs, Annette Kinder

The Gutenberg Literary English Corpus (GLEC) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. However, so far only a small subcorpus, the Gutenberg English Poetry Corpus, has been submitted to quantitative text analyses providing predictions for scientific studies of literature. Here we show that in the entire GLEC quasi error-free text classification and authorship recognition are possible with a method using the same set of five style and five content features, computed via style and sentiment analysis, in both tasks. Our results identify two standard and two novel features (i.e., type-token ratio, frequency, sonority score, surprise) as most diagnostic in these tasks. By providing a simple tool, applicable to both short poems and long novels, that generates quantitative predictions about features that co-determine the cognitive and affective processing of specific text categories or authors, our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
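Of the four diagnostic features named in this abstract, the type-token ratio is the most standard: the number of distinct word forms divided by the total number of word tokens. A minimal sketch follows, assuming a simple regular-expression tokenizer over lowercase letters and apostrophes. The tokenization actually used by the authors is not described in the abstract, and the function name is hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total word tokens."""
    # Assumption: tokens are runs of lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Toy sentence (hypothetical): 5 tokens, 4 distinct types.
print(type_token_ratio("The cat and the dog"))  # 0.8
```

The raw ratio falls as texts get longer, so comparing short poems with long novels would usually call for some length normalization. Whether and how the authors normalize is not stated in the abstract.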

0.2 CL Aug 24, 2018

Features of word similarity

Arthur M. Jacobs, Annette Kinder

In this theoretical note we compare different types of computational models of word similarity and association in their ability to predict a set of about 900 rating data. Using regression and predictive modeling tools (neural net, decision tree), the performance of a total of 28 models using different combinations of both surface and semantic word features is evaluated. The results present evidence for the hypothesis that word similarity ratings are based on more than only semantic relatedness. The limited cross-validated performance of the models calls for the development of psychological process models of the word similarity rating task.