CL | AI | Mar 29, 2024

DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries

Manit Mishra, Abderrahman Braham, Charles Marsom, Bryan Chung, Gavin Griffin, Dakshesh Sidnerlikar, Chatanya Sarin, Arjun Rajaram

arXiv:2404.00188v11.02 citations | h-index: 2 | ICAIC

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of time-consuming manual data analysis for data scientists, though its contribution is incremental: it applies existing models to a new domain.

The paper evaluated GPT-3.5 as a 'Language Data Scientist' to answer zero-shot, natural language queries on datasets, finding it broadly successful in tasks like code generation and data analysis using prompt engineering techniques.

Conventional processes for analyzing datasets and extracting meaningful information are often time-consuming and laborious. Previous work has identified manual, repetitive coding and data collection as major obstacles that hinder data scientists from undertaking more nuanced labor and high-level projects. To combat this, we evaluated OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS) that can extrapolate key findings, including correlations and basic information, from a given dataset. The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards, including data science code-generation based tasks involving libraries such as NumPy, Pandas, Scikit-Learn, and TensorFlow, and was broadly successful in correctly answering a given data science query related to the benchmark dataset. The LDS used various novel prompt engineering techniques to effectively answer a given question, including Chain-of-Thought reinforcement and SayCan prompt engineering. Our findings demonstrate great potential for leveraging Large Language Models for low-level, zero-shot data analysis.
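To make the zero-shot setup described in the abstract more concrete, here is a minimal sketch of how a natural language query about a dataset might be turned into a step-by-step prompt for a code-generating model. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`summarize_columns`, `build_prompt`, `answer_query`), the prompt wording, and the `complete` callable are all hypothetical, and no real model is called.

```python
import csv
import io


def summarize_columns(csv_text, sample_rows=3):
    """Return the header and the first few data rows of a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # zip with range() stops after sample_rows rows (or earlier if the file is short).
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return header, rows


def build_prompt(csv_text, question):
    """Assemble a zero-shot prompt: dataset description, question, and a
    step-by-step instruction. The wording is an assumption, not the paper's."""
    header, rows = summarize_columns(csv_text)
    lines = [
        "You are a data scientist. A pandas DataFrame named `df` is already loaded.",
        "Columns: " + ", ".join(header),
        "Sample rows:",
    ]
    lines += ["  " + ", ".join(row) for row in rows]
    lines += [
        "Question: " + question,
        "Think step by step, then write Python code that assigns the answer to `result`.",
    ]
    return "\n".join(lines)


def answer_query(csv_text, question, complete):
    """Send the prompt to `complete`, a hypothetical callable that maps a
    prompt string to the model's text output (for example, a thin wrapper
    around whichever LLM client is in use)."""
    return complete(build_prompt(csv_text, question))


if __name__ == "__main__":
    data = "age,income\n30,50000\n40,65000\n"
    # A stub stands in for the model so the sketch runs offline.
    stub = lambda prompt: "result = df['income'].mean()"
    print(build_prompt(data, "What is the mean income?"))
    print(answer_query(data, "What is the mean income?", stub))
```

Passing the model as a plain callable keeps the sketch independent of any particular client library; in practice the returned code would still need to be reviewed or sandboxed before it is executed against the dataset.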

