p2-TQA: A Process-based Preference Learning Framework for Self-Improving Table Question Answering Models
This work aims to improve TQA models on tasks such as cell retrieval and data analysis. Its contribution is incremental: it automates the construction of preference data and yields models that are more efficient than larger competing systems.
The paper addresses two gaps in table question answering (TQA): available data is under-utilized, and post-training is neglected. It introduces p2-TQA, a process-based preference learning framework that automatically constructs preference data and trains on it with contrastive learning. With only 8,000 training instances, p2-TQA improves models by up to 5% on in-domain datasets and 2.4% on out-of-domain datasets.
Table question answering (TQA) focuses on answering questions based on tabular data. TQA systems aim to support effective interaction with tabular data for tasks such as cell retrieval and data analysis. While recent work has used fine-tuning to improve TQA systems, existing approaches often under-utilize available data and neglect the potential of post-training for further gains. In this work, we introduce p2-TQA, a process-based preference learning framework for TQA post-training. p2-TQA automatically constructs process-based preference data via a table-specific pipeline, eliminating the need for manual or costly data collection. It then optimizes models through contrastive learning on the collected data. Experiments show that p2-TQA improves TQA models by up to 5% on in-domain datasets and 2.4% on out-of-domain datasets with only 8,000 training instances. Furthermore, models enhanced with p2-TQA achieve competitive results against larger, more complex state-of-the-art TQA systems while being up to five times more efficient.
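The abstract does not say which contrastive objective p2-TQA uses. As an illustration only, the sketch below shows one widely used preference objective, a DPO-style loss, computed for a single pair of a preferred and a rejected reasoning step. The function name `dpo_style_loss`, its parameters, and the default `beta` are assumptions made for this example and are not taken from the paper.

```python
import math


def dpo_style_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for one preference pair (illustrative assumption).

    Each argument is the summed token log-probability of a response under
    either the model being trained (policy) or a frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp

    # Scaled margin between the preferred and the rejected response.
    z = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss equals log 2 when the policy and the reference agree, falls below that as the policy shifts probability toward the preferred step, and rises when it shifts toward the rejected one.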