CL · AI · Aug 12, 2024

Evaluating LLMs on Entity Disambiguation in Tables

arXiv:2408.06423v3 · 3 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lack of consistent evaluation for LLMs in table annotation, a gap that matters to researchers and practitioners in natural language processing and data analysis. The contribution is incremental, since it focuses on benchmarking existing methods.

This work tackles the problem of evaluating large language models (LLMs) on entity disambiguation in tables by conducting an extensive evaluation of four state-of-the-art approaches and two GPT models, measuring their performance, computational requirements, and costs to facilitate comparison and guide future research.

Tables are crucial containers of information, but understanding their meaning can be challenging. Over the years, there has been a surge of interest in data-driven approaches based on deep learning, which have increasingly been combined with heuristic-based ones. Recently, the advent of Large Language Models (LLMs) has led to a new category of approaches for table annotation. However, these approaches have not been consistently evaluated on a common ground, making evaluation and comparison difficult. This work proposes an extensive evaluation of four state-of-the-art (SOTA) Semantic Table Interpretation (STI) approaches: Alligator (formerly s-elbat), Dagobah, TURL, and TableLlama. The first two belong to the family of heuristic-based algorithms, while TURL and TableLlama are, respectively, encoder-only and decoder-only LLMs. We also include both GPT-4o and GPT-4o-mini in the evaluation, since they excel in various public benchmarks. The primary objective is to measure how well these approaches solve the entity disambiguation task, in terms of both the performance achieved in a common-ground evaluation setting and the computational and cost requirements involved, with the ultimate aim of charting new research paths in the field.
