CL | AI | LG | Oct 27, 2020

It's All in the Name: A Character Based Approach To Infer Religion

arXiv:2010.14479v1
27 citations
Originality: Highly original
AI Analysis

This work addresses a demographic inference challenge for social science and NLP applications in South Asia, offering a scalable solution with improved accuracy over dictionary-based methods.

The paper tackles the problem of inferring religion from personal names in South Asia, where disaggregated data is scarce, by using character-based models that classify unseen names with high accuracy and can be scaled to large datasets.

Demographic inference from text has received a surge of attention in natural language processing over the last decade. In this paper, we use personal names to infer religion in South Asia, where religion is a salient social division and yet disaggregated data on it remains scarce. Existing work predicts religion using dictionary-based methods and therefore cannot classify unseen names. We use character-based models, which learn character patterns and can therefore classify unseen names with high accuracy. These models are also much faster and scale easily to large data sets. We improve our classifier by combining the name of an individual with that of their parent or spouse, and achieve remarkably high accuracy. Finally, we trace the classification decisions of a convolutional neural network model using layer-wise relevance propagation, which can explain the predictions of complex non-linear classifiers and circumvent their purported black-box nature. We show how the character patterns learned by the classifier are rooted in the linguistic origins of names.
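The contrast between dictionary lookup and character-based classification can be illustrated with a minimal sketch. This is not the authors' model (the abstract mentions a convolutional neural network, among others); it swaps in a simple multinomial Naive Bayes over character n-grams to show why such a classifier can label a name it has never seen. The function `char_ngrams`, the class `CharNgramNB`, the invented name-like strings, and the placeholder labels "A" and "B" are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Return character n-grams of a name; ^ and $ mark the start and end."""
    padded = "^" + name.lower().strip() + "$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def fit(self, names, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.class_totals = Counter(labels)  # label -> number of names
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name):
        n_names = sum(self.class_totals.values())
        scores = {}
        for label, n_label in self.class_totals.items():
            total = sum(self.counts[label].values())
            score = math.log(n_label / n_names)  # class prior
            for gram in char_ngrams(name):
                # Smoothed likelihood; unseen n-grams get a small nonzero mass.
                score += math.log((self.counts[label][gram] + 1)
                                  / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented strings with placeholder labels; not real data.
train_names = ["talvin", "morvin", "kelvin", "zarood", "barood", "marood"]
train_labels = ["A", "A", "A", "B", "B", "B"]
model = CharNgramNB().fit(train_names, train_labels)

# Neither string appears in the training list, so a dictionary lookup would fail.
prediction_1 = model.predict("dalvin")
prediction_2 = model.predict("farood")
```

Because the classifier scores shared substrings rather than whole names, the two held-out strings are still assigned a label, whereas an exact-match dictionary would return nothing for them.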

Code Implementations: 1 repo
