CL, Aug 1, 2025

ITUNLP at SemEval-2025 Task 8: Question-Answering over Tabular Data: A Zero-Shot Approach using LLM-Driven Code Generation

Atakan Site, Emre Hakan Erdemir, Gülşen Eryiğit

arXiv:2508.00762v1 | 6.72 citations | h-index: 3 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of efficient tabular data querying for users doing data analysis, though it is incremental, since it applies existing LLM methods to a specific task.

The paper tackles question answering over tabular data with a zero-shot approach based on LLM-driven Python code generation. The system placed eighth in Subtask I and sixth in Subtask II among the 30 systems that outperformed the baseline in the open-source models category.

This paper presents our system for SemEval-2025 Task 8: DataBench, Question-Answering over Tabular Data. The primary objective of this task is to perform question answering on given tabular datasets from diverse domains under two subtasks: DataBench QA (Subtask I) and DataBench Lite QA (Subtask II). To tackle both subtasks, we developed a zero-shot solution with a particular emphasis on leveraging Large Language Model (LLM)-based code generation. Specifically, we propose a Python code generation framework utilizing state-of-the-art open-source LLMs to generate executable Pandas code via optimized prompting strategies. Our experiments reveal that different LLMs exhibit varying levels of effectiveness in Python code generation. Additionally, results show that Python code generation achieves superior performance in tabular question answering compared to alternative approaches. Although our ranking among zero-shot systems is unknown at the time of this paper's submission, our system achieved eighth place in Subtask I and sixth place in Subtask II among the 30 systems that outperformed the baseline in the open-source models category.
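As a rough illustration of the general idea in the abstract (prompt an LLM with a table's schema and a question, then execute the Pandas code it returns), a minimal sketch might look like the following. It is not the authors' prompt or pipeline. The names `build_prompt`, `extract_code`, `run_generated`, `fake_llm`, and `demo` are hypothetical, the prompt wording is invented for this example, and `fake_llm` returns a canned reply in place of a real open-source model call.

```python
import re

FENCE = "`" * 3  # built indirectly so this listing contains no literal fence


def build_prompt(schema: dict, question: str) -> str:
    """Zero-shot prompt: describes the table schema only, with no solved examples."""
    columns = "\n".join(f"- {name} ({dtype})" for name, dtype in schema.items())
    return (
        "You are given a pandas DataFrame `df` with these columns:\n"
        f"{columns}\n\n"
        "Write a Python function `answer(df)` that returns the answer.\n"
        "Return only code, with no explanation.\n\n"
        f"Question: {question}\n"
    )


def extract_code(response: str) -> str:
    """Strip an optional Markdown fence from the model's reply."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    return (match.group(1) if match else response).strip()


def run_generated(code: str, df):
    """Execute generated code and call the `answer(df)` function it defines.

    Calling exec() on model output is unsafe outside a sandbox;
    it is used here for illustration only.
    """
    namespace = {}
    exec(code, namespace)
    return namespace["answer"](df)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned reply."""
    return FENCE + "python\ndef answer(df):\n    return df['age'].max()\n" + FENCE


def demo():
    """End-to-end run on a toy table (requires pandas to be installed)."""
    import pandas as pd  # imported lazily; only the demo needs pandas

    df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 28]})
    schema = {col: str(dtype) for col, dtype in df.dtypes.items()}
    prompt = build_prompt(schema, "What is the highest age?")
    return run_generated(extract_code(fake_llm(prompt)), df)
```

Calling `demo()` runs the whole loop on a three-row toy table. In a real system the canned reply would be replaced by an actual model response, and the generated code would need sandboxing and error handling before being executed.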
