Niels Vanhasbroeck

h-index: 1
1 paper

CL, Apr 20, 2025
Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives

Ratna Kandala, Niels Vanhasbroeck, Katie Hoemann

Standard topic models often struggle to capture culturally specific nuances in text. This study evaluates the effectiveness of contextual embeddings for identifying culturally resonant themes in an underrepresented linguistic context. We compare the performance of K-Means clustering, Latent Dirichlet Allocation (LDA), and BERTopic on a corpus of nearly 25,000 daily personal narratives written in Belgian Dutch (Flemish). While LDA achieves strong performance on automated coherence metrics, subsequent human evaluation reveals that BERTopic consistently identifies the most coherent and culturally relevant topics, highlighting the limitations of purely statistical methods on this narrative-rich data. Furthermore, the diminished performance of K-Means compared to prior work on similar Dutch corpora underscores the unique linguistic challenges posed by personal narrative analysis. Our findings demonstrate the critical role of contextual embeddings in robust topic modeling and emphasize the need for human-centered evaluation, particularly when working with low-resource languages and culturally specific domains.