AI DC ET LG MA | Oct 27, 2025

AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines

Abolfazl Younesi, Zahra Najafabadi Samani, Thomas Fahringer

arXiv:2510.23408v1 | h-index: 4

Originality: Highly original

AI Analysis

This work addresses the challenge of efficiently creating stream processing pipelines, a known bottleneck for developers and data engineers, and proposes a novel method for automating it.

The paper tackles the problem of automating the design and deployment of data stream processing pipelines. It introduces AutoStreamPipe, a framework that uses Large Language Models (LLMs) to bridge the gap between user intent and platform-specific implementations, and reports a 6.3x reduction in development time and a 5.19x reduction in error rates compared to existing LLM code-generation methods.

Data pipelines are essential in stream processing: they enable the efficient collection, processing, and delivery of real-time data, supporting rapid data analysis. In this paper, we present AutoStreamPipe, a novel framework that employs Large Language Models (LLMs) to automate the design, generation, and deployment of stream processing pipelines. AutoStreamPipe bridges the semantic gap between high-level user intent and platform-specific implementations across distributed stream processing systems by integrating a Hypergraph of Thoughts (HGoT), an extended version of Graph of Thoughts (GoT), for structured multi-agent reasoning. AutoStreamPipe combines resilient execution strategies, advanced query analysis, and HGoT to deliver pipelines with good accuracy. Experimental evaluations on diverse pipelines demonstrate that AutoStreamPipe significantly reduces development time (6.3x) and error rates (5.19x), as measured by a novel Error-Free Score (EFS), compared to LLM code-generation methods.
