CL, Jun 17, 2024

Satyrn: A Platform for Analytics Augmented Generation

arXiv:2406.12069v2
23 citations
Originality: Incremental advance
AI Analysis

The paper addresses a limitation of retrieval augmented generation, namely that it does not cover non-textual data. The proposed approach enables more accurate report generation from databases, though it is an incremental improvement over existing methods.

The paper tackles the problem of generating accurate reports from structured data. It proposes analytics augmented generation (AAG), which uses data analysis to create fact sets that guide language models. In the reported experiments, over 86% of claims in the generated reports are accurate, compared to 57% for GPT-4 Code Interpreter.

Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter in which just 57% of claims are accurate.
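The AAG flow described in the abstract (analyze structured data, turn the results into fact statements, pass them to an LLM) can be illustrated with a minimal Python sketch. This is not Satyrn's implementation or API: the toy table, the function names `average_by`, `facts_to_text`, and `build_prompt`, and the prompt wording are all hypothetical, and the LLM call itself is left out.

```python
from statistics import mean

# Hypothetical toy table; the paper's system targets large scale databases.
ROWS = [
    {"region": "North", "sales": 120.0},
    {"region": "North", "sales": 80.0},
    {"region": "South", "sales": 50.0},
    {"region": "South", "sales": 70.0},
]


def average_by(rows, group_key, value_key):
    """Run one standard analytic operation: a grouped average."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


def facts_to_text(averages, group_key, value_key):
    """Convert analysis results into plain-language fact statements."""
    return [
        f"The average {value_key} for {group_key} {group} is {avg:.1f}."
        for group, avg in sorted(averages.items())
    ]


def build_prompt(facts, request):
    """Assemble a prompt that restricts the model to the supplied facts."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Write a short report using only the facts below.\n"
        f"Facts:\n{fact_block}\n"
        f"Request: {request}"
    )


averages = average_by(ROWS, "region", "sales")
facts = facts_to_text(averages, "region", "sales")
prompt = build_prompt(facts, "Summarize sales by region.")
print(prompt)
```

In this sketch the fact set plays the role that retrieved documents play in RAG: the numbers are computed by ordinary code, so the language model only has to phrase them rather than derive them.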

Code Implementations: 1 repo