CLJun 5, 2024

TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

arXiv:2406.03618v38 citations
Originality Incremental advance
AI Analysis

This addresses a bottleneck in LLM performance for complex aggregative reasoning tasks, though it is incremental as it builds on existing prompting methods.

The authors tackled the problem of LLMs performing poorly on queries requiring aggregation of information across texts by introducing the TACT dataset, which evaluates reasoning and computational abilities, and found that contemporary LLMs achieve below 38% accuracy. They proposed an IE as a tool framework using few-shot prompting, which improved over existing techniques.

Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such tables, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes