CL Oct 22, 2024

All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing

arXiv:2410.17355v32 citations h-index: 2 SEM
Originality: Incremental advance
AI Analysis

This work addresses how entity typing handles infrequent entities, a problem relevant to NLP researchers. The advance is incremental because it builds on existing knowledge-infused methods.

The paper investigates the limitations of pre-trained language models (PLMs) in ultra-fine entity typing, showing that they struggle with entities in the long tail of the pre-training distribution, and that knowledge-infused approaches can partially address these issues.

Due to their capacity to acquire world knowledge from large corpora, pre-trained language models (PLMs) are extensively used in ultra-fine entity typing tasks where the space of labels is extremely large. In this work, we explore the limitations of the knowledge acquired by PLMs by proposing a novel heuristic to approximate the pre-training distribution of entities when the pre-training data is unknown. Then, we systematically demonstrate that entity-typing approaches that rely solely on the parametric knowledge of PLMs struggle significantly with entities at the long tail of the pre-training distribution, and that knowledge-infused approaches can account for some of these shortcomings. Our findings suggest that we need to go beyond PLMs to produce solutions that perform well for infrequent entities.

