Linguistic diversity and digital inclusion: challenges for a Brazilian AI
This paper addresses digital exclusion and language loss among speakers of underrepresented languages, an incremental but critical issue in AI development.
The paper examines how generative AI's reliance on documented languages creates a selection bias that threatens linguistic diversity, leading to a vicious cycle where dominant languages become standardized while others are marginalized.
Linguistic diversity is a human attribute that is coming under threat as generative AI advances. Drawing on contributions from sociolinguistics, this paper examines the consequences of the variety selection bias imposed by technological applications. It also examines the resulting vicious circle, in which a variety is preserved, and becomes dominant and standardized, because it has the linguistic documentation needed to train large language models.