CL · Jun 20, 2018

TxPI-u: A Resource for Personality Identification of Undergraduates

Gabriela Ramírez-de-la-Rosa, Esaú Villatoro-Tello, Héctor Jiménez-Salazar

arXiv:1806.07977v1 · 9 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of non-English resources for personality identification in NLP by providing a domain-specific dataset for researchers working with Spanish text.

The paper introduces TxPI-u, a new Spanish corpus for personality identification based on the Big Five Model. It contains data from 416 Mexican undergraduate students, including demographics, and is intended for training models that automatically assign personality traits from text.

Resources such as labeled corpora are necessary to train automatic models within the natural language processing (NLP) field. Historically, most of the resources available for a broad range of problems have been in English. One such problem is Personality Identification, where, based on a psychological model (e.g., the Big Five Model), the goal is to find the traits of a subject's personality given, for instance, a text written by that subject. In this paper we introduce a new corpus in Spanish called Texts for Personality Identification (TxPI). This corpus will help to develop models that automatically assign a personality trait to the author of a text document. Our corpus, TxPI-u, contains information on 416 Mexican undergraduate students, along with demographic information such as age, gender, and the academic program in which they are enrolled. Finally, as an additional contribution, we present a set of baselines to provide a comparison scheme for further research.
