CL · Oct 22, 2019

Automatic Extraction of Personality from Text: Challenges and Opportunities

arXiv:1910.09916v1 · 15 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of personality extraction from text for applications in psychology and social computing, but it is incremental as it highlights limitations rather than achieving breakthroughs.

The study tackled the problem of automatically extracting personality traits from text by creating and comparing datasets with different reliability levels, finding that models trained on a small high-reliability dataset performed better in controlled tests but failed to outperform a random baseline when evaluated on real-world data.

In this study, we examined the possibility of extracting personality traits from text. We created an extensive dataset by having experts annotate personality traits in a large number of texts from multiple online sources. From these annotated texts, we selected a sample and made further annotations, ending up with a large low-reliability dataset and a small high-reliability dataset. We then used the two datasets to train and test several machine learning models, including a language model, to extract personality from text. Finally, we evaluated our best models in the wild, on datasets from different domains. Our results show that the models based on the small high-reliability dataset performed better (in terms of $\textrm{R}^2$) than models based on the large low-reliability dataset. Also, the language model based on the small high-reliability dataset performed better than the random baseline. Finally, and more importantly, the results showed that our best model did not perform better than the random baseline when tested in the wild. Taken together, our results show that determining personality traits from text remains a challenge and that no firm conclusions can be drawn about model performance before testing in the wild.
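To make the $\textrm{R}^2$ comparison against a random baseline concrete, here is a minimal sketch in Python. It is not the authors' code: the abstract does not say how the random baseline was built, so this example assumes it draws predictions at random from the training scores. The names `r_squared` and `random_baseline`, and all trait scores, are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def random_baseline(train_scores, n, seed=0):
    """Assumed baseline: sample n predictions at random from training scores."""
    rng = random.Random(seed)
    return [rng.choice(train_scores) for _ in range(n)]


# Hypothetical trait scores (illustrative only, not data from the paper).
train_scores = [2.0, 3.5, 4.0, 1.5, 3.0]
test_true = [2.5, 3.0, 4.5, 1.0]
model_pred = [2.4, 3.2, 4.0, 1.5]

model_r2 = r_squared(test_true, model_pred)
baseline_r2 = r_squared(
    test_true, random_baseline(train_scores, len(test_true))
)

# A model is only useful if it beats the baseline on the target domain.
print(f"model R^2:    {model_r2:.3f}")
print(f"baseline R^2: {baseline_r2:.3f}")
```

Predicting the mean of the test scores for every text gives an $\textrm{R}^2$ of exactly 0, and worse predictions give negative values. A model can therefore score well in a controlled test and still fall to or below the baseline on texts from a different domain.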

