SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types
This work addresses scientific question answering for researchers by providing a more realistic benchmark, though the contribution is incremental because it builds on existing SQA datasets.
The authors address the limited reasoning types and missing table-text integration in scientific question answering by introducing SciTaT, a benchmark with diverse reasoning types, and CaR, a baseline method that improves performance by an average of 12.9% over other baselines.
Scientific question answering (SQA) is an important task that aims to answer questions based on scientific papers. However, current SQA datasets cover a limited range of reasoning types and neglect the relationship between tables and text, leaving a significant gap from real-world scenarios. To address these challenges, we propose SciTaT, a QA benchmark for scientific tables and text with diverse reasoning types. To cover more reasoning types, we summarize various reasoning types from real-world questions. To involve both tables and text, we require questions to draw on both tables and text wherever possible. Based on SciTaT, we propose a strong baseline (CaR), which combines various reasoning methods to address different reasoning types and processes tables and text at the same time. CaR yields an average improvement of 12.9% over other baselines on SciTaT, validating its effectiveness. Error analysis reveals the challenges that SciTaT poses, such as complex numerical calculations and reliance on domain knowledge.