Affect, Body, Cognition, Demographics, and Emotion: The ABCDE of Text Features for Computational Affective Science
This dataset facilitates interdisciplinary research across fields such as affective science, sociology, and computational linguistics, although it is an incremental contribution that builds on existing resources.
The authors tackled the challenge of accessing and using labeled language data for computational affective and social science by creating the ABCDE dataset, which includes over 400 million text utterances annotated with features covering affect, body, cognition, demographics, and emotion.
Work in Computational Affective Science and Computational Social Science explores a wide variety of research questions about people, emotions, behavior, and health. Such work often relies on language data that is first labeled with relevant information, such as the use of emotion words or the age of the speaker. Although many resources and algorithms exist to enable this type of labeling, discovering, accessing, and using them remains a substantial impediment, particularly for practitioners outside of computer science. Here, we present the ABCDE dataset (Affect, Body, Cognition, Demographics, and Emotion), a large-scale collection of over 400 million text utterances drawn from social media, blogs, books, and AI-generated sources. The dataset is annotated with a wide range of features relevant to computational affective and social science. ABCDE facilitates interdisciplinary research across numerous fields, including affective science, cognitive science, the digital humanities, sociology, political science, and computational linguistics.
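As a rough illustration of what "labeling language data with the use of emotion words" can mean in practice, the sketch below scores each utterance by the proportion of its tokens that appear in an emotion lexicon. It assumes a simple lexicon-lookup approach. The tiny `TOY_EMOTION_LEXICON` and the `emotion_word_features` function are hypothetical and for illustration only. They are not the lexicons or annotation pipeline used to build ABCDE.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the lexicon used in ABCDE.
# Maps a lowercase word to the emotion category it signals.
TOY_EMOTION_LEXICON = {
    "happy": "joy",
    "delighted": "joy",
    "sad": "sadness",
    "afraid": "fear",
    "angry": "anger",
}

# Simple tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def emotion_word_features(utterance: str) -> dict[str, float]:
    """Return, per emotion category, the fraction of tokens that match the lexicon."""
    tokens = TOKEN_RE.findall(utterance.lower())
    if not tokens:
        return {}
    # Count how many tokens fall into each emotion category.
    counts = Counter(TOY_EMOTION_LEXICON[t] for t in tokens if t in TOY_EMOTION_LEXICON)
    # Normalize by utterance length so short and long texts are comparable.
    return {emotion: n / len(tokens) for emotion, n in counts.items()}


if __name__ == "__main__":
    utterances = [
        "I was so happy and delighted to see her!",
        "He felt sad, and a little afraid.",
        "The meeting starts at noon.",
    ]
    for u in utterances:
        print(u, "->", emotion_word_features(u))
```

Normalizing by token count is one common design choice for word-count features; raw counts or binary presence flags are equally plausible alternatives depending on the research question.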