CL · Jun 17, 2024

Building Knowledge-Guided Lexica to Model Cultural Variation

Shreya Havaldar, Salvatore Giorgi, Sunny Rai, Young-Min Cho, Thomas Talhelm, Sharath Chandra Guntuku, Lyle Ungar

arXiv:2406.11622v2 · 17.336 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a challenge relevant to NLP researchers: computationally modeling cultural variation. It appears to be an incremental advance, however, because it builds on existing lexicon-based methods.

The paper tackles the problem of measuring cultural variation across regions from language. It proposes a scalable solution, building knowledge-guided lexica to model this variation, and shows that modern LLMs fail at this task.

Cultural variation exists between nations (e.g., the United States vs. China), but also within regions (e.g., California vs. Texas, Los Angeles vs. San Francisco). Measuring this regional cultural variation can illuminate how and why people think and behave differently. Historically, it has been difficult to computationally model cultural variation due to a lack of training data and scalability constraints. In this work, we introduce a new research problem for the NLP community: How do we measure variation in cultural constructs across regions using language? We then provide a scalable solution: building knowledge-guided lexica to model cultural variation, encouraging future work at the intersection of NLP and cultural understanding. We also highlight modern LLMs' failure to measure cultural variation or generate culturally varied language.
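The abstract does not spell out how a lexicon turns text into a regional measurement. The sketch below assumes a common weighted-lexicon formulation: each word carries a weight, and a region's score is the weighted sum of those words' relative frequencies in the region's pooled text. The lexicon words, weights, and function names are hypothetical illustrations. They are not the authors' knowledge-guided lexica or code.

```python
import re
from collections import Counter

# Hypothetical weighted lexicon for one cultural construct.
# Words and weights are illustrative placeholders, NOT the paper's lexicon.
LEXICON = {
    "family": 1.0,
    "together": 0.8,
    "we": 0.5,
    "myself": -0.7,
    "alone": -0.6,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(tokens: list[str], lexicon: dict[str, float]) -> float:
    """Sum of weight(w) * count(w) / total_tokens over all lexicon words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return sum(weight * counts[word] / total for word, weight in lexicon.items())


def score_regions(
    posts_by_region: dict[str, list[str]], lexicon: dict[str, float]
) -> dict[str, float]:
    """Pool each region's posts and compute one lexicon score per region."""
    scores = {}
    for region, posts in posts_by_region.items():
        tokens = [tok for post in posts for tok in tokenize(post)]
        scores[region] = lexicon_score(tokens, lexicon)
    return scores


if __name__ == "__main__":
    posts = {
        "Region A": ["We spent the day together as a family."],
        "Region B": ["I went hiking alone by myself."],
    }
    print(score_regions(posts, LEXICON))
```

In this toy example, Region A scores higher on the hypothetical construct than Region B. Comparing such per-region scores, for instance across states or cities, is one way to surface within-country variation. Normalizing by token count keeps regions with more text from scoring higher simply because they have more words.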
