Lexico-semantic and affective modelling of Spanish poetry: A semi-supervised learning approach
This work addresses the understudied domain of Spanish poetry analysis for researchers in computational linguistics and digital humanities, but the contribution is incremental, since it applies existing methods to a new dataset.
The paper tackles the classification of psychological and affective categories in Spanish poetry, which has received less attention than prose, using a semi-supervised learning approach on a corpus of 4572 sonnets. The reported results are an AUC above 0.7 for 76% of the psychological categories and an AUC gain of up to 0.12 over using transformers alone.
Text classification has improved substantially in recent years through the use of transformers. However, most research focuses on prose, with poetry receiving less attention, especially in Spanish. In this paper, we propose a semi-supervised learning approach for inferring 21 psychological categories evoked by a corpus of 4572 sonnets, along with 10 affective and lexico-semantic multiclass categories. The subset of poems used for training and evaluation comprises 270 sonnets. With our approach, we achieve an AUC above 0.7 for 76% of the psychological categories, and an AUC above 0.65 for 60% of the multiclass ones. The sonnets are modelled using transformers, through sentence embeddings, along with lexico-semantic and affective features obtained from external lexicons. This combination yields an AUC increase of up to 0.12 compared with using transformers alone.
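To make the headline figures concrete: an AUC is computed separately for each category, and the reported percentage is the share of categories whose AUC clears the threshold. With 21 psychological categories, 76% is consistent with 16 categories (16/21 ≈ 0.762). The following is a minimal sketch of that calculation, not the evaluation code used in the paper. The function names `roc_auc` and `share_above`, the category names, and all numbers in the toy data are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def share_above(auc_by_category, threshold):
    """Fraction of categories whose AUC is strictly greater than the threshold."""
    hits = sum(1 for auc in auc_by_category.values() if auc > threshold)
    return hits / len(auc_by_category)


# Toy data, invented for illustration only (not results from the paper).
labels = [1, 0, 1, 1, 0, 0]                # 1 = the sonnet evokes the category
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]    # hypothetical classifier scores
toy_auc = roc_auc(labels, scores)

# Hypothetical per-category AUCs; the paper reports 21 psychological categories.
toy_aucs = {"category_a": 0.81, "category_b": 0.66, "category_c": 0.74, "category_d": 0.58}
toy_share = share_above(toy_aucs, 0.7)
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a labelled subset of a few hundred sonnets. A rank-based implementation would be the usual choice for larger sets.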