CL LG Dec 6, 2024

Diversity Over Quantity: A Lesson From Few Shot Relation Classification

Amir DN Cohen, Shauli Ravfogel, Shaltiel Shmidman, Yoav Goldberg

arXiv:2412.05434v1 1.91 citations h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of data efficiency in NLP for few-shot relation classification, suggesting a shift from large-scale datasets to targeted diversity curation.

The paper shows that, in few-shot relation classification, training on a more diverse set of relation types, rather than on a larger dataset, significantly improves generalization to unseen relations, with consistent gains across various few-shot scenarios.

In few-shot relation classification (FSRC), models must generalize to novel relations with only a few labeled examples. While much of the recent progress in NLP has focused on scaling data size, we argue that diversity in relation types is more crucial for FSRC performance. In this work, we demonstrate that training on a diverse set of relations significantly enhances a model's ability to generalize to unseen relations, even when the overall dataset size remains fixed. We introduce REBEL-FS, a new FSRC benchmark that incorporates an order of magnitude more relation types than existing datasets. Through systematic experiments, we show that increasing the diversity of relation types in the training data leads to consistent gains in performance across various few-shot learning scenarios, including high-negative settings. Our findings challenge the common assumption that more data alone leads to better performance and suggest that targeted data curation focused on diversity can substantially reduce the need for large-scale datasets in FSRC.
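The fixed-size comparison described in the abstract can be illustrated with a small sketch. This is not the authors' procedure or the REBEL-FS construction; it only assumes that training data is a list of (sentence, relation label) pairs, and the function name `subsample_fixed_budget`, its parameters, and the toy data are hypothetical. It builds two training subsets with the same total number of examples but different numbers of relation types.

```python
import random
from collections import defaultdict


def subsample_fixed_budget(examples, num_relations, budget, seed=0):
    """Draw `budget` examples in total, spread evenly over `num_relations` types.

    `examples` is a list of (sentence, relation_label) pairs.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sentences by their relation label.
    by_relation = defaultdict(list)
    for sentence, relation in examples:
        by_relation[relation].append(sentence)

    # Choose which relation types take part in this subset.
    chosen = rng.sample(sorted(by_relation), num_relations)

    # Split the budget as evenly as possible across the chosen types.
    per_relation, remainder = divmod(budget, num_relations)

    subset = []
    for i, relation in enumerate(chosen):
        quota = per_relation + (1 if i < remainder else 0)
        pool = by_relation[relation]
        if len(pool) < quota:
            raise ValueError(
                f"relation {relation!r} has {len(pool)} examples, need {quota}"
            )
        subset.extend((s, relation) for s in rng.sample(pool, quota))
    return subset


# Toy data (assumed): 10 relation types with 20 sentences each.
toy = [(f"sentence {r}-{i}", f"rel_{r}") for r in range(10) for i in range(20)]

# Same budget of 20 examples, low versus high relation diversity.
low = subsample_fixed_budget(toy, num_relations=2, budget=20)
high = subsample_fixed_budget(toy, num_relations=10, budget=20)
```

Here `low` and `high` contain the same number of examples, so any difference between models trained on them would come from the number of relation types rather than from data volume.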
