CL · Nov 24, 2024

Exploring Performance Contrasts in TableQA: Step-by-Step Reasoning Boosts Bigger Language Models, Limits Smaller Language Models

Haoyan Yang, Yixuan Wang, Keyue Tong, Hongjin Zhu, Yuanxin Zhang

arXiv:2411.16002v1 · 1.0 · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of choosing reasoning methods suited to different model sizes in table-based question answering. The advance is incremental because it builds on existing prompting techniques.

The paper tackled performance differences in TableQA by proposing a step-by-step reasoning method, finding that it improved accuracy by 7.8% for larger models like Llama-3-70B but decreased performance by 11% for smaller models like Llama-2-7B.

This paper proposes a detailed prompting flow, termed Table-Logic, to investigate the performance contrasts between bigger and smaller language models (LMs) that use step-by-step reasoning methods in the TableQA task. Given the question and the table with its structure, the method proceeds sequentially: it identifies the critical columns and rows, determines the necessary aggregations, calculations, or comparisons, and finally infers the result to generate a precise prediction. By deploying this method, we observe a 7.8% accuracy improvement in bigger LMs like Llama-3-70B compared to the vanilla baseline on HybridQA, while smaller LMs like Llama-2-7B show an 11% performance decline. We empirically investigate the potential causes of these performance contrasts by exploring the capabilities of bigger and smaller LMs along various dimensions of the TableQA task. Our findings highlight the limitations of the step-by-step reasoning method in small models and provide potential insights for making improvements.
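
To make the flow described in the abstract concrete, here is a minimal sketch of how a staged prompt of this kind could be assembled next to a vanilla prompt. It is an illustration only: the step wording, the pipe-separated table format, and the names `STEPS`, `serialize_table`, `build_step_by_step_prompt`, and `build_vanilla_prompt` are assumptions made here, not the paper's actual Table-Logic prompt or code.

```python
from typing import Sequence

# Hypothetical step wording, paraphrased from the abstract.
# This is NOT the prompt text used in the paper.
STEPS = [
    "Identify the columns that are critical for answering the question.",
    "Identify the rows that are critical for answering the question.",
    "Determine any aggregations, calculations, or comparisons that are needed.",
    "Infer the result from the selected cells and state a precise final answer.",
]


def serialize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render the table as pipe-separated text so its structure stays visible."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def build_vanilla_prompt(header, rows, question: str) -> str:
    """Baseline: table and question only, with no reasoning scaffold."""
    return f"Table:\n{serialize_table(header, rows)}\n\nQuestion: {question}\nAnswer:"


def build_step_by_step_prompt(header, rows, question: str) -> str:
    """Staged variant: same inputs, plus numbered reasoning steps to follow in order."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(STEPS, start=1))
    return (
        f"Table:\n{serialize_table(header, rows)}\n\n"
        f"Question: {question}\n\n"
        f"Work through the following steps in order:\n{numbered}\n\nAnswer:"
    )


if __name__ == "__main__":
    header = ["City", "Country", "Population (millions)"]
    rows = [["Tokyo", "Japan", 37.4], ["Delhi", "India", 31.0]]
    print(build_step_by_step_prompt(header, rows, "Which city has the larger population?"))
```

Either prompt string would then be sent to the model under test. The only difference between the two conditions in this sketch is the numbered scaffold, which is the variable the abstract reports as helping the larger model and hurting the smaller one.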
