CV, Nov 28, 2025

Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day

arXiv:2511.23220v1 (Has Code)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient tabular data generation for users with limited resources, representing an incremental advance in applying instruction tuning to a specific domain.

The paper tackles the problem of tabular data generation using instruction tuning with limited data and computational resources, achieving performance comparable to GPT-4o by training on only 7K instructions for under 6 hours on an A100 GPU.

Tabular instruction tuning has emerged as a promising research direction for improving LLMs' understanding of tabular data. However, the majority of existing works only consider question-answering and reasoning tasks over tabular data, leaving tabular data generation largely unnoticed. In this work, for the first time, we explore the efficacy of instruction tuning in improving LLMs' tabular data generation capabilities. More specifically, given the high data and computation requirements of tabular instruction tuning, we aim to address the possibility of instruction tuning for tabular data generation with limited data and computational resources. To achieve this, we first create a high-quality instruction dataset for tabular data, enabling efficient LLM comprehension. We then instruction-tune an open-source LLM (Llama3.1-8B-Instruct) on the training set of this dataset to improve its tabular data generation performance. Our experimental results show that by using our high-quality dataset and instruction-tuning on only 7K instructions with an A100 GPU for less than 6 hours, we achieve tabular data generation performance on par with the most capable commercial LLM, GPT-4o.
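To make the idea of an instruction dataset for tabular data generation more concrete, here is a minimal sketch of how a single training record could be assembled. The abstract does not describe the dataset schema, so the `instruction`/`input`/`output` field names, the prompt wording, the CSV serialization, and the helper names `table_to_csv` and `build_instruction_record` are all assumptions made for illustration; the table values are made-up toy data.

```python
import csv
import io
import json


def table_to_csv(columns, rows):
    """Serialize a small table to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue().strip()


def build_instruction_record(description, columns, example_rows, target_rows):
    """Build one instruction/input/output record for tabular data generation.

    Hypothetical schema: the field names and prompt wording are assumptions,
    not the format used in the paper.
    """
    instruction = (
        f"Generate {len(target_rows)} new rows for the table described below. "
        "Return CSV with the same header."
    )
    table_input = f"Description: {description}\n" + table_to_csv(columns, example_rows)
    return {
        "instruction": instruction,
        "input": table_input,
        "output": table_to_csv(columns, target_rows),
    }


# Toy, made-up table used only to show the record layout.
record = build_instruction_record(
    description="Adult patients with age and smoking status",
    columns=["age", "smoker"],
    example_rows=[[34, "no"], [58, "yes"]],
    target_rows=[[41, "no"], [29, "yes"], [63, "no"]],
)
print(json.dumps(record, indent=2))
```

Under this assumed layout, a training set of roughly 7K such records would pair each table description and a few example rows with the rows the model is expected to generate.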

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.