CL · May 3, 2020

A Two-Stage Masked LM Method for Term Set Expansion

arXiv:2005.01063v1998 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a problem of both practical and theoretical interest in natural language processing: generalizing from only a few examples. The contribution is incremental, since it builds on existing MLM and TSE approaches.

The paper tackles the task of Term Set Expansion (TSE) by proposing a novel two-stage masked language model (MLM) method that combines pattern-based and distributional approaches to find more members of a semantic class from a small seed set, outperforming state-of-the-art algorithms.

We tackle the task of Term Set Expansion (TSE): given a small seed set of example terms from a semantic class, finding more members of that class. The task is of great practical utility, and also of theoretical utility as it requires generalization from few examples. Previous approaches to the TSE task can be characterized as either distributional or pattern-based. We harness the power of neural masked language models (MLM) and propose a novel TSE algorithm, which combines the pattern-based and distributional approaches. Due to the small size of the seed set, fine-tuning methods are not effective, calling for more creative use of the MLM. The gist of the idea is to use the MLM to first mine for informative patterns with respect to the seed set, and then to obtain more members of the seed class by generalizing these patterns. Our method outperforms state-of-the-art TSE algorithms. Implementation is available at: https://github.com/guykush/TermSetExpansion-MPB/
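
The two-stage idea in the abstract (first mine patterns that are informative for the seed set, then generalize those patterns to find new members) can be illustrated with a minimal sketch. This is not the authors' implementation. The `TOY_MLM` table is a hypothetical stand-in for a real masked language model, its probabilities are made up, and the scoring rule (probability mass that a pattern's top predictions assign to seed terms) is an assumption chosen for simplicity. The function names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical stand-in for a masked LM: maps a masked pattern to
# probabilities for the [MASK] slot. A real system would query a
# neural MLM here; these numbers are invented for illustration.
TOY_MLM = {
    "I visited [MASK] last summer.": {
        "france": 0.30, "spain": 0.25, "italy": 0.20, "london": 0.15, "them": 0.10},
    "The capital of [MASK] is large.": {
        "france": 0.35, "italy": 0.30, "germany": 0.25, "spain": 0.10},
    "I ate [MASK] for lunch.": {
        "pizza": 0.50, "rice": 0.30, "it": 0.19, "france": 0.01},
}


def predict_mask(pattern, top_k=4):
    """Return the top_k (token, probability) pairs for the masked slot."""
    dist = TOY_MLM[pattern]
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def score_pattern(pattern, seeds, top_k=4):
    """Stage 1 score (assumed): seed probability mass among the top_k predictions."""
    return sum(p for tok, p in predict_mask(pattern, top_k) if tok in seeds)


def select_patterns(patterns, seeds, n_patterns=2, top_k=4):
    """Stage 1: keep the n_patterns patterns that best predict the seed terms."""
    ranked = sorted(patterns, key=lambda pat: score_pattern(pat, seeds, top_k),
                    reverse=True)
    return ranked[:n_patterns]


def expand(patterns, seeds, n_patterns=2, top_k=4):
    """Stage 2: pool non-seed predictions from the selected patterns and rank them."""
    scores = defaultdict(float)
    for pat in select_patterns(patterns, seeds, n_patterns, top_k):
        for tok, p in predict_mask(pat, top_k):
            if tok not in seeds:
                scores[tok] += p
    return sorted(scores, key=scores.get, reverse=True)


seeds = {"france", "spain"}
candidates = expand(list(TOY_MLM), seeds)
print(candidates)  # ['italy', 'germany', 'london']
```

In this toy run the food-related pattern is discarded in stage 1 because it assigns almost no probability to the seeds, so its predictions never reach the candidate list. Note that "london" still appears: a pattern that fits the seeds well can also admit terms outside the class, which is why ranking across several patterns matters.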

Code Implementations: 1 repo
