DB · HC · Jan 14

TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

arXiv:2601.09404 · h-index: 4
AI Analysis

For data analysts, TiInsight automates cross-domain exploratory data analysis via natural language, but the novelty is incremental as it combines existing LLM and text-to-SQL techniques.

TiInsight is an SQL-based automated exploratory data analysis system that uses large language models to enable cross-domain data exploration through natural language queries. It has been deployed in PingCAP's production environment, and its capabilities are demonstrated on representative datasets.

SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated a paradigm shift from manual to automated data exploration. However, existing methods generally lack support for cross-domain analysis, and the capabilities of LLMs remain insufficiently explored. This paper presents TiInsight, an SQL-based automated cross-domain exploratory data analysis system. First, TiInsight offers a user-friendly GUI that lets users explore data using natural language queries. Second, TiInsight offers a robust cross-domain exploratory data analysis pipeline: hierarchical data context (HDC) generation, question clarification and decomposition, text-to-SQL (TiSQL), and data visualization (TiChart). Third, we have implemented and deployed TiInsight in the production environment of PingCAP and demonstrated its capabilities using representative datasets. The demo video is available at https://youtu.be/JzYFyYd-emI.
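The four pipeline stages named in the abstract can be pictured as a chain of functions. The sketch below is only an illustration of that shape, not TiInsight's implementation: every name in it (`build_context`, `explore`, `Insight`, `demo`) is hypothetical, the LLM-backed stages are replaced by plain callables, a flat table-to-columns map stands in for the hierarchical data context, and an in-memory SQLite database stands in for the target database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Insight:
    sub_question: str
    sql: str
    rows: list
    chart: str


def build_context(conn):
    """Stage 1 stand-in: map each table to its column names.

    Assumption: a flat map replaces the paper's hierarchical data context.
    """
    context = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        context[table] = [col[1] for col in columns]  # col[1] is the column name
    return context


def explore(conn, question, decompose, to_sql, pick_chart):
    """Run the stages in order: context, decomposition, text-to-SQL, chart choice."""
    context = build_context(conn)
    insights = []
    for sub_question in decompose(question, context):  # stage 2 (LLM in practice)
        sql = to_sql(sub_question, context)            # stage 3 (LLM in practice)
        rows = conn.execute(sql).fetchall()
        insights.append(Insight(sub_question, sql, rows, pick_chart(rows)))  # stage 4
    return insights


def demo():
    """Wire the pipeline to hard-coded stubs in place of model calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )

    def decompose(question, context):
        return ["total amount per region"]

    def to_sql(sub_question, context):
        return (
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )

    def pick_chart(rows):
        return "bar" if len(rows) > 1 else "table"

    return explore(conn, "How are sales doing?", decompose, to_sql, pick_chart)
```

Passing the stages in as callables keeps the orchestration independent of any particular model or prompt, so each stub could be swapped for a real LLM call without touching `explore`.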

