CL · Jul 27, 2020

Linguistic Taboos and Euphemisms in Nepali

arXiv:2007.13798v1 · 13 citations
AI Analysis

This paper provides a foundational resource for downstream tasks such as offensive language detection and language learning in Nepali, but its contribution is incremental: it applies existing linguistic analysis methods to a new language.

The paper studies offensive language in Nepali through a corpus-based analysis. It describes more than 18 categories of linguistic offense, discusses 12 common euphemisms, and introduces a manually constructed dataset of over 1000 offensive and taboo terms.

Languages across the world have words, phrases, and behaviors -- the taboos -- that are avoided in public communication considering them as obscene or disturbing to the social, religious, and ethical values of society. However, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness is determined entirely by different factors such as socio-physical setting, speaker-listener relationship, and word choices. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 different categories of linguistic offenses including politics, religion, race, and sex. We discuss 12 common euphemisms such as synonym, metaphor and circumlocution. In addition, we introduce a manually constructed data set of over 1000 offensive and taboo terms popular among contemporary speakers. This in-depth study of offensive language and resource will provide a foundation for several downstream tasks such as offensive language detection and language learning.

Code Implementations: 1 repo