CL · Oct 15, 2025

FreshTab: Sourcing Fresh Data for Table-to-Text Generation Evaluation

arXiv:2510.13598v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

FreshTab addresses evaluation reliability issues for researchers in table-to-text generation, though the advance is incremental because it builds on existing benchmark methods.

The paper tackles data contamination and domain imbalance in table-to-text generation evaluation by introducing FreshTab, a method that generates benchmarks on the fly from Wikipedia and supports multiple languages. The authors found that LLM-generated insights from recent tables scored worse on automatic metrics, but that this drop did not carry over to human or LLM-based evaluations. Domain effects appeared in every evaluation, and domain-balanced benchmarks proved more challenging.

Table-to-text generation (insight generation from tables) is a challenging task that requires precision in analyzing the data. In addition, the evaluation of existing benchmarks is affected by contamination of Large Language Model (LLM) training data as well as domain imbalance. We introduce FreshTab, an on-the-fly table-to-text benchmark generation from Wikipedia, to combat the LLM data contamination problem and enable domain-sensitive evaluation. While non-English table-to-text datasets are limited, FreshTab collects datasets in different languages on demand (we experiment with German, Russian and French in addition to English). We find that insights generated by LLMs from recent tables collected by our method appear clearly worse by automatic metrics, but this does not translate into LLM and human evaluations. Domain effects are visible in all evaluations, showing that a domain-balanced benchmark is more challenging.
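
The two ideas in the abstract, keeping only recent tables and balancing across domains, can be illustrated with a minimal sketch. This is not the FreshTab implementation, which the abstract does not describe in detail. It assumes each table record carries a creation date and a domain label; the function name `build_fresh_balanced_sample`, the record fields, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict
from datetime import date


def build_fresh_balanced_sample(tables, cutoff, per_domain, seed=0):
    """Keep tables created after `cutoff`, then draw up to `per_domain` per domain.

    Hypothetical helper for illustration; field names are assumptions.
    """
    rng = random.Random(seed)

    # Step 1: freshness filter. Tables created after the cutoff are less
    # likely to appear in the training data of a model trained before it.
    by_domain = defaultdict(list)
    for table in tables:
        if table["created"] > cutoff:
            by_domain[table["domain"]].append(table)

    # Step 2: domain balancing. Draw the same number of tables from each
    # domain (or all of them when a domain has fewer than `per_domain`).
    sample = []
    for domain in sorted(by_domain):
        candidates = by_domain[domain]
        k = min(per_domain, len(candidates))
        sample.extend(rng.sample(candidates, k))
    return sample


# Toy records, invented for illustration only.
toy_tables = [
    {"title": "A", "domain": "sport", "created": date(2025, 3, 1)},
    {"title": "B", "domain": "sport", "created": date(2025, 4, 1)},
    {"title": "C", "domain": "sport", "created": date(2023, 6, 1)},
    {"title": "D", "domain": "politics", "created": date(2025, 2, 1)},
    {"title": "E", "domain": "culture", "created": date(2025, 5, 1)},
    {"title": "F", "domain": "culture", "created": date(2025, 6, 1)},
]

balanced = build_fresh_balanced_sample(
    toy_tables, cutoff=date(2025, 1, 1), per_domain=1
)
```

With the toy data, the sample holds one recent table from each of the three domains, so an over-represented domain such as sport no longer dominates the benchmark.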
