CL · Jun 21, 2024

Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text

Pritika Ramu, Aparna Garimella, Sambaran Bandyopadhyay

arXiv:2406.14829v3 · 15.226 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation metrics in text-to-table generation, which matters for automated document creation and editing. The advance is incremental, since it builds on known challenges in evaluating generated tables.

The paper tackles the problem of evaluating table generation quality by highlighting that existing metrics fail to capture table semantics and can misjudge quality. It proposes TabEval, a novel evaluation strategy that breaks tables into natural language statements and uses entailment-based measures, showing stronger correlation with human judgments across four datasets.

Understanding whether a generated table is of good quality is important for using it to create or edit documents with automatic methods. In this work, we underline that existing measures for table quality evaluation fail to capture the overall semantics of the tables, and sometimes unfairly penalize good tables and reward bad ones. We propose TabEval, a novel table evaluation strategy that captures table semantics by first breaking down a table into a list of natural language atomic statements and then comparing them with ground truth statements using entailment-based measures. To validate our approach, we curate a dataset comprising text descriptions for 1,250 diverse Wikipedia tables, covering a range of topics and structures, in contrast to the limited scope of existing datasets. We compare TabEval with existing metrics using unsupervised and supervised text-to-table generation methods, demonstrating its stronger correlation with human judgments of table quality across four datasets.
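The two-stage idea in the abstract (unroll a table into atomic statements, then compare statement lists with an entailment measure) can be illustrated with a minimal sketch. This is not the authors' implementation. The template in `unroll_table`, the token-overlap scorer `toy_entailment`, and the precision/recall/F1 aggregation in `statement_scores` are all assumptions made for illustration; a realistic setup would replace the toy scorer with a trained entailment model.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def unroll_table(rows: List[Row], key_column: str) -> List[str]:
    """Turn every non-key cell into one short statement.

    The sentence template is a hypothetical stand-in for whatever
    unrolling step a real system would use.
    """
    statements = []
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                statements.append(f"The {column} of {subject} is {value}.")
    return statements


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Toy scorer: fraction of hypothesis tokens that occur in the premise.

    Assumption for illustration only; it is not a real entailment model.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(token in premise_tokens for token in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def statement_scores(
    predicted: List[str],
    reference: List[str],
    entail: Callable[[str, str], float] = toy_entailment,
) -> Dict[str, float]:
    """Score two statement lists against each other.

    Precision: how well each predicted statement is supported by the reference.
    Recall: how well each reference statement is supported by the prediction.
    This aggregation is an assumed design, not taken from the paper.
    """

    def coverage(hypotheses: List[str], premises: List[str]) -> float:
        if not hypotheses or not premises:
            return 0.0
        best = [max(entail(p, h) for p in premises) for h in hypotheses]
        return sum(best) / len(best)

    precision = coverage(predicted, reference)
    recall = coverage(reference, predicted)
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return {"precision": precision, "recall": recall, "f1": f1}
```

Because the comparison happens on statements rather than on cell positions, a predicted table that lists the same facts with its columns in a different order gets the same score as an exact copy, which is the kind of case where position-based metrics can penalize a good table.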
