AgenticData: An Agentic Data Analytics System for Heterogeneous Data
This work addresses data analytics for users who need to analyze both unstructured and structured data without expert coding, though the contribution appears incremental because it builds on existing agent-based approaches.
The paper tackles the problem of expensive and time-consuming data analytics by introducing AgenticData, an agentic system that lets users pose natural language questions over heterogeneous data. The authors report that it achieves higher accuracy than state-of-the-art methods on benchmarks.
Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, which makes them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an agentic data analytics system that lets users pose natural language (NL) questions and autonomously analyzes data sources across multiple domains, including both unstructured and structured data. First, AgenticData employs a feedback-driven planning technique that automatically converts an NL query into a semantic plan composed of relational and semantic operators. We propose a multi-agent collaboration strategy that uses a data profiling agent to discover relevant data, a semantic cross-validation agent to iteratively optimize the plan based on feedback, and a smart memory agent to maintain short-term context and long-term knowledge. Second, we propose a semantic optimization model to refine and execute semantic plans effectively. We evaluate AgenticData on three benchmarks. Experimental results show that AgenticData achieves superior accuracy on both easy and difficult tasks, significantly outperforming state-of-the-art methods.
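
To make the idea of a semantic plan concrete, the following is a minimal sketch, not the AgenticData implementation. It assumes a plan is an ordered list of operators, that a relational operator applies an exact predicate to a structured column, and that a semantic operator delegates to a judge over free text. All names here (RelationalFilter, SemanticFilter, keyword_judge, plan_with_feedback) are hypothetical, and the judge is a toy keyword matcher standing in for a language model call. The feedback step is reduced to trying candidate plans until one passes a validation check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class RelationalFilter:
    """Hypothetical relational operator: exact predicate over a structured column."""
    column: str
    predicate: Callable[[object], bool]

    def apply(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.predicate(r[self.column])]


@dataclass
class SemanticFilter:
    """Hypothetical semantic operator: a judge decides if free text meets a condition."""
    column: str
    condition: str
    judge: Callable[[str, str], bool]  # (text, condition) -> bool

    def apply(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.judge(str(r[self.column]), self.condition)]


def keyword_judge(text: str, condition: str) -> bool:
    # Toy stand-in for a language model (assumption): the text matches only
    # if every word of the condition appears in it.
    return all(word in text.lower() for word in condition.lower().split())


def execute(plan: list, rows: List[Row]) -> List[Row]:
    # Apply the operators in order, feeding each one the previous output.
    for op in plan:
        rows = op.apply(rows)
    return rows


def plan_with_feedback(candidates: List[list], rows: List[Row],
                       validate: Callable[[List[Row]], bool]) -> List[Row]:
    # Simplified feedback loop (assumption): return the result of the first
    # candidate plan that passes validation, or an empty list if none does.
    for plan in candidates:
        result = execute(plan, rows)
        if validate(result):
            return result
    return []


reviews = [
    {"id": 1, "rating": 5, "text": "Battery life is great"},
    {"id": 2, "rating": 2, "text": "Battery died after a week"},
    {"id": 3, "rating": 1, "text": "Screen cracked on arrival"},
]

# Query: "low-rated reviews that mention the battery".
low_rated = RelationalFilter("rating", lambda v: v <= 2)
first_try = [low_rated, SemanticFilter("text", "battery problems", keyword_judge)]
second_try = [low_rated, SemanticFilter("text", "battery", keyword_judge)]

result = plan_with_feedback([first_try, second_try], reviews,
                            validate=lambda rows: len(rows) > 0)
print([r["id"] for r in result])
```

In this toy run the first candidate plan returns nothing because the condition is too strict for the keyword judge, so the validation check rejects it and the second plan is used. A real system would presumably generate the revised plan from the feedback rather than pick from a fixed list; that part is outside what this sketch shows.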