Oct 22, 2024

Bauplan: zero-copy, scale-up FaaS for data pipelines

arXiv:2410.17465v1 · 6 citations · h-index: 11 · WOSC@Middleware
Originality: Incremental advance
AI Analysis

This work addresses inefficiencies in data pipeline execution for data practitioners. It is an incremental advance, building on existing FaaS concepts with domain-specific optimizations.

The paper tackles the problem of chaining functions into data pipelines on FaaS platforms, which are poorly suited to data workloads. It introduces bauplan, a novel FaaS programming model and runtime designed for data practitioners, and reports better performance and developer experience by making the platform data-aware.

Chaining functions for longer workloads is a key use case for FaaS platforms in data applications. However, modern data pipelines differ significantly from typical serverless use cases (e.g., webhooks and microservices); this makes it difficult to retrofit existing pipeline frameworks due to structural constraints. In this paper, we describe these limitations in detail and introduce bauplan, a novel FaaS programming model and serverless runtime designed for data practitioners. bauplan enables users to declaratively define functional Directed Acyclic Graphs (DAGs) along with their runtime environments, which are then efficiently executed on cloud-based workers. We show that bauplan achieves both better performance and a superior developer experience for data workloads by making the trade-off of reducing generality in favor of data-awareness.
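The idea of declaring a functional DAG together with each function's runtime environment can be illustrated with a minimal Python sketch. Everything in it (the `node` decorator, `REGISTRY`, `build_dag`, `run`, and the toy trip data) is hypothetical and is not bauplan's actual API. It assumes that each function declares its environment in a decorator, that upstream dependencies are inferred from parameter names, and that the graph is executed in topological order. A real runtime would provision the declared environment on a cloud worker for each node; this sketch runs everything in-process.

```python
from __future__ import annotations

import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable, Optional

# Hypothetical registry: node name -> (function, declared runtime environment).
REGISTRY: dict[str, tuple[Callable[..., Any], dict[str, Any]]] = {}


def node(python: str = "3.11", pip: Optional[dict[str, str]] = None):
    """Hypothetical decorator: registers a DAG node and the runtime it needs."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[fn.__name__] = (fn, {"python": python, "pip": pip or {}})
        return fn
    return wrap


@node(pip={"pandas": "2.2"})
def raw_trips():
    # Source node: a real pipeline would read a table from storage here.
    return [
        {"city": "NYC", "miles": 3.0},
        {"city": "NYC", "miles": 5.0},
        {"city": "SF", "miles": 2.0},
    ]


@node()
def clean_trips(raw_trips):
    # The parameter name "raw_trips" declares the upstream dependency.
    return [r for r in raw_trips if r["miles"] > 2.5]


@node()
def miles_by_city(clean_trips):
    # Aggregate total miles per city from the cleaned rows.
    totals: dict[str, float] = {}
    for r in clean_trips:
        totals[r["city"]] = totals.get(r["city"], 0.0) + r["miles"]
    return totals


def build_dag() -> dict[str, set[str]]:
    # Map each node to the set of its predecessors, taken from parameter names.
    return {
        name: set(inspect.signature(fn).parameters)
        for name, (fn, _env) in REGISTRY.items()
    }


def run(dag: dict[str, set[str]]) -> dict[str, Any]:
    results: dict[str, Any] = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(dag).static_order():
        fn, env = REGISTRY[name]
        # A real runtime would start a worker with `env` here; this runs in-process.
        results[name] = fn(**{dep: results[dep] for dep in dag[name]})
    return results


if __name__ == "__main__":
    print(run(build_dag())["miles_by_city"])
```

Running the script evaluates `raw_trips`, then `clean_trips`, then `miles_by_city`, and prints `{'NYC': 8.0}`. In this sketch, inferring edges from parameter names keeps the whole DAG declaration inside ordinary function signatures, so the user never writes the graph by hand.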

Code Implementations: 1 repo
