Lexical Diversity in Kinship Across Languages and Dialects
This work addresses the limited representation of linguistic diversity in computational resources used by linguists and NLP researchers. Its contribution is incremental, as it builds on prior linguistics research on kinship terminology.
The authors address the underrepresentation of linguistic diversity in computational lexicons by developing a method to enrich them. The method is verified through case studies on kinship terminology in seven Arabic dialects and three Indonesian languages, and the resulting publicly available resources reveal significant diversity even within linguistically and culturally close communities.
Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, the other with three Indonesian languages. Our results, made available as browsable and downloadable computational resources, extend prior linguistics research on kinship terminology and provide insight into the extent of diversity even within linguistically and culturally close communities.