CL | Jul 19, 2024

An Improved Method for Class-specific Keyword Extraction: A Case Study in the German Business Registry

arXiv:2407.14085v1 | 22 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the identification of keywords specific to predefined classes, such as economic sectors in business registry data. It is aimed at researchers and practitioners in information extraction, and it is an incremental advance because it builds on existing methods.

The paper tackles class-specific keyword extraction, a step that underpins tasks such as document classification, by proposing an improved method that builds on KeyBERT and is guided by seed keywords. The authors report that it greatly improves upon previous approaches on a dataset of German business registry entries, and they describe it as setting a new standard for class-specific keyword extraction.

The task of $\textit{keyword extraction}$ is often an important initial step in unsupervised information extraction, forming the basis for tasks such as topic modeling or document classification. While recent methods have proven to be quite effective in the extraction of keywords, the identification of $\textit{class-specific}$ keywords, or only those pertaining to a predefined class, remains challenging. In this work, we propose an improved method for class-specific keyword extraction, which builds upon the popular $\textbf{KeyBERT}$ library to identify only keywords related to a class described by $\textit{seed keywords}$. We test this method using a dataset of German business registry entries, where the goal is to classify each business according to an economic sector. Our results reveal that our method greatly improves upon previous approaches, setting a new standard for $\textit{class-specific}$ keyword extraction.
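The general idea of seed-guided, class-specific keyword ranking can be illustrated with a minimal sketch. This is not the paper's method and not KeyBERT's implementation: the character-trigram `embed` function is a toy stand-in for a sentence-embedding model, and the names `class_specific_keywords`, `threshold`, and the example sentence are hypothetical, chosen only for illustration. Each candidate word is scored by its highest cosine similarity to any seed keyword, and candidates that do not resemble the class are dropped.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: bag of character trigrams (stand-in for a real model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_specific_keywords(doc, seed_keywords, top_n=3, threshold=0.0):
    """Rank candidate words by their best similarity to any seed keyword.

    Candidates whose score does not exceed `threshold` are discarded,
    so only words related to the seed-described class are returned.
    """
    # Candidates: unique lowercased tokens with surrounding punctuation removed.
    candidates = {tok.strip(".,;:()").lower() for tok in doc.split()} - {""}
    seed_vecs = [embed(seed) for seed in seed_keywords]

    scored = []
    for cand in candidates:
        vec = embed(cand)
        score = max(cosine(vec, seed_vec) for seed_vec in seed_vecs)
        if score > threshold:
            scored.append((cand, score))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical registry-style entry and seeds for a construction-related class.
entry = "Handel mit Baustoffen und Betrieb einer Bäckerei"
keywords = class_specific_keywords(entry, ["baustoff", "bau"])
print(keywords)
```

In this toy setting, words unrelated to the seeds (such as "bäckerei") share no trigrams with them and are filtered out. A real system would replace `embed` with a proper sentence-embedding model, which is where most of the quality comes from.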

Code Implementations: 1 repo