Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models
This work fills a methodological gap in NLP research by providing an apples-to-apples comparison of table LLMs, though its contribution is incremental in nature.
The study isolates the effects of training data from those of the base model in table instruction tuning for LLMs. Its replicated models perform on par with or better than existing table LLMs and set a new state of the art on the HiTab dataset.
Recent advances in natural language processing have leveraged instruction tuning to enhance Large Language Models (LLMs) for table-related tasks. However, previous works train different base models with different training data, so the resulting table LLMs lack an apples-to-apples comparison. To address this, we fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs, establishing new state-of-the-art performance on HiTab, a table question-answering dataset. More importantly, through systematic out-of-domain evaluation, we decouple the contributions of training data and the base model, providing insight into their individual impacts. In addition, we assess the effects of table-specific instruction tuning on general-purpose benchmarks, revealing trade-offs between specialization and generalization.