CL Apr 14, 2025

DataPuzzle: Breaking Free from the Hallucinated Promise of LLMs in Data Analysis

Zhengxuan Zhang, Zhuowen Liang, Yin Wu, Teng Lin, Yuyu Luo, Nan Tang

arXiv:2504.10036v28.34 citations, h-index: 9

Originality: Incremental advance

AI Analysis

The paper addresses trust and auditability in LLM-driven analytics. The advance is incremental because it builds on existing agent-based ideas.

The paper argues that LLMs produce brittle, unverifiable outputs in data analysis and proposes a shift from monolithic "Prompt-to-Answer" approaches to modular, agent-based workflows. Its contribution is DataPuzzle, a conceptual framework intended to make analysis transparent and accountable.

Large language models (LLMs) are increasingly applied to multi-modal data analysis -- not necessarily because they offer the most precise answers, but because they provide fluent, flexible interfaces for interpreting complex inputs. Yet this fluency often conceals a deeper structural failure: the prevailing "Prompt-to-Answer" paradigm treats LLMs as black-box analysts, collapsing evidence, reasoning, and conclusions into a single, opaque response. The result is brittle, unverifiable, and frequently misleading. We argue for a fundamental shift: from generation to structured extraction, from monolithic prompts to modular, agent-based workflows. LLMs should not serve as oracles, but as collaborators -- specialized in tasks like extraction, translation, and linkage -- embedded within transparent workflows that enable step-by-step reasoning and verification. We propose DataPuzzle, a conceptual multi-agent framework that decomposes complex questions, structures information into interpretable forms (e.g., tables, graphs), and coordinates agent roles to support transparent and verifiable analysis. This framework serves as an aspirational blueprint for restoring visibility and control in LLM-driven analytics -- transforming opaque answers into traceable processes, and brittle fluency into accountable insight. This is not a marginal refinement; it is a call to reimagine how we build trustworthy, auditable analytic systems in the era of large language models. Structure is not a constraint -- it is the path to clarity.
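The abstract describes DataPuzzle only at a conceptual level, so the following is an illustrative sketch and not the authors' implementation. It shows one way a decompose, extract, analyze workflow could record every intermediate result so the final answer can be traced back to its evidence. All names here (`Step`, `Workflow`, `decompose`, `extract`, `aggregate`, `analyze`) are hypothetical, and the agents are plain deterministic functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One recorded unit of work: which agent ran, on what, producing what."""
    agent: str
    inputs: Any
    output: Any


@dataclass
class Workflow:
    """Runs agents one at a time and keeps an auditable trace of each call."""
    trace: list[Step] = field(default_factory=list)

    def run(self, agent: str, fn: Callable[[Any], Any], inputs: Any) -> Any:
        output = fn(inputs)
        self.trace.append(Step(agent, inputs, output))
        return output


# Hypothetical agents. In a real system each would wrap an LLM call;
# here they are deterministic stand-ins so the sketch runs as written.
def decompose(question: str) -> list[str]:
    return ["extract revenue per region",
            "find the region with the highest revenue"]


def extract(documents: list[str]) -> list[dict]:
    # Structured extraction: turn "Region: number" lines into table rows.
    rows = []
    for doc in documents:
        region, revenue = doc.split(":")
        rows.append({"region": region.strip(), "revenue": float(revenue)})
    return rows


def aggregate(rows: list[dict]) -> dict:
    # The analysis runs over the structured table, not over raw text.
    return max(rows, key=lambda r: r["revenue"])


def analyze(question: str, documents: list[str]) -> tuple[dict, list[Step]]:
    wf = Workflow()
    wf.run("decomposer", decompose, question)
    table = wf.run("extractor", extract, documents)
    answer = wf.run("analyst", aggregate, table)
    return answer, wf.trace


if __name__ == "__main__":
    answer, trace = analyze(
        "Which region had the highest revenue?",
        ["North: 120.5", "South: 98.0"],
    )
    print(answer)
    for step in trace:
        print(step.agent, "->", step.output)
```

The point of the sketch is the trace: each step's inputs and outputs remain inspectable, in contrast to a single prompt that returns only the final answer.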
