CL, LG | Sep 6, 2023

Persona-aware Generative Model for Code-mixed Language

Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty

arXiv:2309.02915v21.34 citations | h-index: 41 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for personalized text generation in multilingual online contexts, though the advance is incremental because it builds on existing Transformer-based methods.

The paper tackles the generation of realistic code-mixed text with a persona-aware generative model, PARADOX. On average, PARADOX achieves 1.6 points better CM BLEU, 47% better perplexity, and 32% better semantic coherence than non-persona-based models.

Code-mixing and script-mixing are prevalent across online social networks and multilingual societies. However, a user's preference toward code-mixing depends on the socioeconomic status, demographics of the user, and the local context, which existing generative models mostly ignore while generating code-mixed texts. In this work, we make a pioneering attempt to develop a persona-aware generative model to generate texts resembling real-life code-mixed texts of individuals. We propose a Persona-aware Generative Model for Code-mixed Generation, PARADOX, a novel Transformer-based encoder-decoder model that encodes an utterance conditioned on a user's persona and generates code-mixed texts without monolingual reference data. We propose an alignment module that re-calibrates the generated sequence to resemble real-life code-mixed texts. PARADOX generates code-mixed texts that are semantically more meaningful and linguistically more valid. To evaluate the personification capabilities of PARADOX, we propose four new metrics -- CM BLEU, CM Rouge-1, CM Rouge-L and CM KS. On average, PARADOX achieves 1.6 points better CM BLEU, 47% better perplexity and 32% better semantic coherence than the non-persona-based counterparts.
