CL · Apr 19, 2022

Detecting Text Formality: A Study of Text Classification Approaches

Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

arXiv:2204.08975v2

Originality: Incremental advance

AI Analysis

This is an incremental study that addresses the need for formality detection in natural language processing applications.

This paper tackles the problem of automatically detecting text formality by systematically comparing statistical, neural, and Transformer-based methods. It finds that a character-level BiLSTM (Char BiLSTM) outperforms Transformers on monolingual and multilingual tasks, while Transformers are more stable under cross-lingual transfer.

Formality is one of the important characteristics of text documents. Automatically detecting the formality level of a text is potentially beneficial for various natural language processing tasks. Two large-scale datasets with formality annotation have previously been introduced for multiple languages: GYAFC and X-FORMAL. However, they have primarily been used to train style transfer models, even though detecting text formality may also be a useful application in its own right. To our knowledge, this work presents the first systematic study of formality detection approaches based on statistical, neural, and Transformer-based machine learning, and it releases the best-performing models for public use. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based models on the monolingual and multilingual formality classification tasks, while Transformer-based classifiers are more stable under cross-lingual knowledge transfer.
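The abstract contrasts statistical, neural, and Transformer-based classifiers but does not name the statistical models used. To illustrate what a simple statistical baseline for binary formality classification can look like, here is a sketch of a multinomial Naive Bayes model over character trigrams. This is a hypothetical example, not the authors' method or released code: the names `char_ngrams` and `CharNgramNB`, the trigram length, the smoothing constant, and the toy sentences are all assumptions made for illustration, and the sentences are invented rather than drawn from GYAFC or X-FORMAL.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased string padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Hypothetical multinomial Naive Bayes formality classifier over character n-grams."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed: trigrams)
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant (assumed)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                   # documents per class, for priors
        self.counts = {c: Counter() for c in self.classes}  # n-gram counts per class
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        """Unnormalized log-posterior of each class for `text`."""
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * self.vocab_size
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for gram in char_ngrams(text, self.n):
                # Smoothed log-likelihood; unseen n-grams get count 0 (Counter default).
                score += math.log((self.counts[c][gram] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        """Return the class with the highest log-posterior."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy sentences for illustration only (not from GYAFC or X-FORMAL).
TEXTS = [
    "I would be grateful if you could reply at your earliest convenience.",
    "Please find the attached report for your review.",
    "We regret to inform you that the meeting has been postponed.",
    "lol that was so funny u gotta see it",
    "hey whats up?? wanna hang out",
    "omg idk what ur talking about lol",
]
LABELS = ["formal"] * 3 + ["informal"] * 3

model = CharNgramNB().fit(TEXTS, LABELS)
print(model.predict("lol idk"))                                # → informal
print(model.predict("We would be grateful for your review."))  # → formal
```

Character n-grams suit a sketch like this because informal text often differs from formal text in surface cues such as abbreviations and repeated punctuation. A real comparison would train on a large annotated corpus and report results on held-out data rather than on a handful of invented sentences.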

