CL IR

Mar 19, 2022

Domain Representative Keywords Selection: A Probabilistic Approach

Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang, Yunyao Li, Lucian Popa, ChengXiang Zhai

arXiv:2203.10365v231.9638 citations

h-index: 82

Has Code

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in natural language processing, providing an incremental improvement for keyword selection tasks.

The paper selects representative keywords for a target domain by contrasting it with a context domain. It combines a probabilistic model with an optimization algorithm and reports better results than baselines on keyword summary generation and trending keywords selection.

We propose a probabilistic approach to select a subset of target-domain representative keywords from a candidate set, in contrast with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain with the context domain, we adapt the two-component mixture model concept to generate a distribution over candidate keywords. This distribution gives more weight to the distinctive keywords of the target domain than to keywords it shares with the context domain. To support the representativeness of the selected keywords towards the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We show that the optimization algorithm can be implemented efficiently with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approach over other baselines for the tasks of keyword summary generation and trending keywords selection.
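The contrast step in the abstract can be illustrated with a textbook two-component mixture model fitted by EM. This is a minimal sketch and not the authors' implementation: it assumes each keyword occurrence in the target domain is drawn either from an unknown target distribution or from a fixed context distribution, and the function name, the `context_weight` parameter, the add-one smoothing, and the toy counts are all hypothetical choices made for illustration.

```python
def fit_target_distribution(target_counts, context_counts,
                            context_weight=0.5, iterations=50):
    """Estimate a target keyword distribution with a two-component mixture.

    Assumption (illustrative, not taken from the paper): each occurrence in
    the target domain comes from the fixed context distribution with
    probability `context_weight`, otherwise from the unknown target
    distribution, which is estimated by EM.
    """
    vocab = list(target_counts)

    # Context distribution over the candidate vocabulary, with add-one
    # smoothing so every candidate has a non-zero context probability.
    context_total = sum(context_counts.get(w, 0) for w in vocab)
    context_prob = {
        w: (context_counts.get(w, 0) + 1) / (context_total + len(vocab))
        for w in vocab
    }

    # Start from a uniform target distribution.
    target_prob = {w: 1.0 / len(vocab) for w in vocab}

    for _ in range(iterations):
        # E-step: expected number of occurrences of each keyword that are
        # explained by the target component rather than the context one.
        expected = {}
        for w in vocab:
            from_target = (1.0 - context_weight) * target_prob[w]
            from_context = context_weight * context_prob[w]
            expected[w] = target_counts[w] * from_target / (from_target + from_context)

        # M-step: renormalize the expected counts into a distribution.
        total = sum(expected.values())
        target_prob = {w: expected[w] / total for w in vocab}

    return target_prob


# Toy counts (made up): "the" is frequent in both domains,
# "neural" and "kernel" are frequent only in the target domain.
toy_target = {"the": 50, "neural": 20, "kernel": 20, "data": 10}
toy_context = {"the": 500, "data": 100, "neural": 2, "kernel": 1}

distribution = fit_target_distribution(toy_target, toy_context)
for word, prob in sorted(distribution.items(), key=lambda item: -item[1]):
    print(f"{word}: {prob:.3f}")
```

In this sketch, keywords that the context domain already explains well lose probability mass to keywords that are distinctive for the target domain, which is the behavior the abstract describes. The subset selection step with its approximation guarantee is a separate stage that operates on the resulting distribution.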
