CL CY · Feb 28, 2025

Identifying Emerging Concepts in Large Corpora

arXiv:2502.21315v111 citations · h-index: 1 · NAACL

Originality: Incremental advance

AI Analysis

The paper addresses the need for timely concept detection in large-scale text analysis, with applications in political science and social research. The contribution appears incremental, since it builds on existing embedding methods.

The paper tackles the problem of detecting emerging concepts in large text corpora by analyzing embedding space heatmaps, achieving high accuracy shortly after concept origination and outperforming common alternatives. It demonstrates utility by analyzing U.S. Senate speeches from 1941 to 2015, finding that the minority party is more active in introducing new concepts and identifying concepts correlating with senators' identities.

We introduce a new method to identify emerging concepts in large text corpora. By analyzing changes in the heatmaps of the underlying embedding space, we are able to detect these concepts with high accuracy shortly after they originate, in turn outperforming common alternatives. We further demonstrate the utility of our approach by analyzing speeches in the U.S. Senate from 1941 to 2015. Our results suggest that the minority party is more active in introducing new concepts into the Senate discourse. We also identify specific concepts that closely correlate with the Senators' racial, ethnic, and gender identities. An implementation of our method is publicly available.
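The abstract does not spell out how the heatmaps are built or compared, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that document embeddings have already been projected to two dimensions, that a heatmap is a normalized 2D histogram of those points, and that a region counts as "emerging" when its share of the density grows by more than a chosen threshold between two time periods. The function names, the grid size, the extent, and the threshold are all hypothetical.

```python
import numpy as np


def density_heatmap(points, bins, extent):
    """Normalized 2D histogram of projected embeddings (assumed heatmap definition)."""
    (xmin, xmax), (ymin, ymax) = extent
    counts, _, _ = np.histogram2d(
        points[:, 0],
        points[:, 1],
        bins=bins,
        range=[[xmin, xmax], [ymin, ymax]],
    )
    total = counts.sum()
    # Normalize so that periods with different document counts are comparable.
    return counts / total if total > 0 else counts


def emerging_cells(old_points, new_points, bins=20,
                   extent=((-1.0, 1.0), (-1.0, 1.0)), threshold=0.05):
    """Return grid cells whose density share grew by more than `threshold`.

    Hypothetical criterion: the paper's actual detection rule may differ.
    """
    old_map = density_heatmap(old_points, bins, extent)
    new_map = density_heatmap(new_points, bins, extent)
    gain = new_map - old_map
    rows, cols = np.nonzero(gain > threshold)
    return sorted(zip(rows.tolist(), cols.tolist())), gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for projected embeddings from two time periods.
    old = rng.normal(loc=(-0.5, -0.5), scale=0.05, size=(500, 2))
    new_cluster = rng.normal(loc=(0.5, 0.5), scale=0.02, size=(250, 2))
    new = np.vstack([old[:250], new_cluster])
    cells, _ = emerging_cells(old, new)
    print("cells with increased density:", cells)
```

In this sketch, the cells returned would then be mapped back to the documents that fall inside them in order to read off the candidate concept; that step, like the rest of the example, is an assumption about how such a pipeline could be organized.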
