CL · Aug 16, 2021

An NLP approach to quantify dynamic salience of predefined topics in a text corpus

arXiv:2108.07345v1
Originality: Synthesis-oriented
AI Analysis

This work addresses a problem faced by analysts who need to profile social and cultural trends from online news media. The contribution is incremental, since it builds on existing NLP techniques.

The authors address the challenge of extracting social trends from large text corpora by developing an NLP method that quantifies how the salience of predefined topics changes over time. The method identifies n-grams whose usage patterns deviate from a baseline, which makes it possible to track the emergence or disappearance of those terms.

The proliferation of news media available online simultaneously presents a valuable resource and a significant challenge to analysts aiming to profile and understand social and cultural trends in a geographic location of interest. While an abundance of news reports documenting significant events, trends, and responses provides a more democratized picture of the social characteristics of a location, making sense of an entire corpus to extract significant trends is a steep challenge for any one analyst or team. Here, we present an approach using natural language processing techniques that seeks to quantify how a set of predefined topics of interest change over time across a large corpus of text. We found that, given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to that topic and have usage patterns that deviate from a normal baseline. Emergence, disappearance, or significant variations in n-gram usage present a ground-up picture of a topic's dynamic salience within a corpus of interest.
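The abstract does not spell out how deviation from a baseline is measured, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes whitespace tokenization, a hand-supplied list of topic n-grams, relative frequency per time period as the usage measure, and a z-score against a set of baseline periods as the deviation score. The names `ngrams`, `relative_frequencies`, `rank_deviations`, and the `eps` floor are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(docs, n=1):
    """Relative frequency of every n-gram across the documents of one period."""
    counts = Counter()
    total = 0
    for doc in docs:
        grams = ngrams(doc.lower().split(), n)  # assumption: naive tokenizer
        counts.update(grams)
        total += len(grams)
    return {g: c / total for g, c in counts.items()} if total else {}


def rank_deviations(periods, topic_terms, baseline_periods, n=1, eps=1e-6):
    """Score topic n-grams by how far their usage departs from a baseline.

    periods: dict mapping a period label to a list of document strings.
    topic_terms: n-grams assumed to map to the topic (hypothetical input).
    baseline_periods: labels of the periods that define "normal" usage.
    Returns (term, period, z_score) tuples sorted by |z_score|, largest first.
    """
    freqs = {p: relative_frequencies(docs, n) for p, docs in periods.items()}
    scored = []
    for term in topic_terms:
        base = [freqs[p].get(term, 0.0) for p in baseline_periods]
        mu = mean(base)
        sigma = max(pstdev(base), eps)  # floor avoids division by zero
        for p in periods:
            if p in baseline_periods:
                continue
            z = (freqs[p].get(term, 0.0) - mu) / sigma
            scored.append((term, p, z))
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


# Toy usage: two baseline weeks, then one week to score.
corpus = {
    "w1": ["the market opened today", "a flood warning was issued"],
    "w2": ["the market closed early", "a flood watch was lifted"],
    "w3": ["flood waters rose and the flood spread", "flood damage hit the market"],
}
for term, period, z in rank_deviations(corpus, ["flood", "market"], ["w1", "w2"]):
    print(term, period, round(z, 2))
```

A positive score suggests a term is emerging relative to the baseline and a negative score suggests it is receding. With a short or flat baseline the `eps` floor dominates and the scores become very large, so a real analysis would need a longer baseline window or a more robust deviation measure.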
