PL · Apr 18

Shift schema drift left: policy-aware compile-time contracts for typed JVM and Spark pipelines

arXiv:2604.16986 · 7.0
Predicted impact top 69% in PL · last 90 days
Originality: Synthesis-oriented
AI Analysis

For developers using typed JVM/Spark pipelines, this framework provides a compile-time mechanism to catch schema drift earlier, though it is an incremental improvement over existing typed-dataset and table-level enforcement systems.

The paper presents a Scala 3 framework that enforces schema contracts at compile time for typed JVM and Spark pipelines, moving schema drift detection ahead of runtime. It pairs the compile-time check with a policy-aware runtime comparator that adds a nested-collection-optionality check omitted by Spark's built-in comparators and implements structural subset semantics for backward- and forward-compatible field sets.

Schema drift in data pipelines is often caught only when a job touches real data. Typed-Dataset layers close part of this gap but require wholesale adoption; table-level enforcement systems close another part but operate at write time against a stored schema. We present a small Scala 3 framework that occupies the seam: it proves producer-to-contract structural compatibility under explicit policies at compile time, derives Spark schemas from the same contract types, and re-checks the actual DataFrame schema at the sink boundary before write. The artifact fuses the compile-time witness with a policy-aware runtime comparator that adds a nested-collection-optionality check Spark's built-in comparators omit and implements structural subset semantics for backward- and forward-compatible field sets. Evaluation covers compile-time proofs, runtime policy tests, builder-path end-to-end tests, and reproducible benchmarks on two environments. This is a narrow, honest mechanism artifact; the broader claim that compile-time structural contracts deliver measurable productivity or reliability in practice is stated as motivation and left for future work.
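The runtime half of the mechanism described above can be illustrated with a minimal, language-neutral sketch. It is written in Python with plain dataclasses instead of Scala 3 and Spark, so it is not the paper's implementation. The schema model (`Atom`, `Array`, `Map`, `Field`, `Struct`), the `Policy` flags, and the `compare` function are all hypothetical names introduced here. The flags `contains_null` and `value_contains_null` are modelled on the `containsNull` and `valueContainsNull` flags of Spark's `ArrayType` and `MapType`. The sketch assumes that a producer schema is incompatible whenever it may carry nulls the contract forbids, at any nesting depth, and that subset semantics means optionally tolerating extra producer fields or missing nullable contract fields.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str  # leaf type such as "string" or "long"


@dataclass(frozen=True)
class Array:
    element: "DataType"
    contains_null: bool  # modelled on containsNull of Spark's ArrayType


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"
    value_contains_null: bool  # modelled on valueContainsNull of Spark's MapType


@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DataType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: tuple  # tuple of Field


DataType = Union[Atom, Array, Map, Struct]


@dataclass(frozen=True)
class Policy:  # hypothetical policy knobs, not the paper's API
    allow_extra_fields: bool = False
    allow_missing_nullable_fields: bool = False


def _widens(actual_nullable, contract_nullable):
    # True when the actual data may hold nulls that the contract forbids.
    return actual_nullable and not contract_nullable


def compare(actual, contract, policy, path="$"):
    """Return a list of violations; an empty list means compatible."""
    if type(actual) is not type(contract):
        return [f"{path}: kind mismatch"]
    if isinstance(contract, Atom):
        return [] if actual == contract else [f"{path}: type mismatch"]
    if isinstance(contract, Array):
        out = []
        if _widens(actual.contains_null, contract.contains_null):
            out.append(f"{path}[]: nullable elements not allowed")
        return out + compare(actual.element, contract.element, policy, path + "[]")
    if isinstance(contract, Map):
        out = compare(actual.key, contract.key, policy, path + "{key}")
        if _widens(actual.value_contains_null, contract.value_contains_null):
            out.append(f"{path}{{value}}: nullable values not allowed")
        return out + compare(actual.value, contract.value, policy, path + "{value}")
    # Struct: match fields by name, then apply the subset policy.
    out = []
    have = {f.name: f for f in actual.fields}
    want = {f.name: f for f in contract.fields}
    for name, c in want.items():
        a = have.get(name)
        if a is None:
            if not (policy.allow_missing_nullable_fields and c.nullable):
                out.append(f"{path}.{name}: missing field")
            continue
        if _widens(a.nullable, c.nullable):
            out.append(f"{path}.{name}: nullable field not allowed")
        out += compare(a.dtype, c.dtype, policy, f"{path}.{name}")
    if not policy.allow_extra_fields:
        out += [f"{path}.{n}: unexpected field" for n in have if n not in want]
    return out


contract = Struct((
    Field("id", Atom("long"), nullable=False),
    Field("tags", Array(Atom("string"), contains_null=False), nullable=False),
))
actual = Struct((
    Field("id", Atom("long"), nullable=False),
    Field("tags", Array(Atom("string"), contains_null=True), nullable=False),
    Field("note", Atom("string"), nullable=True),
))

strict = compare(actual, contract, Policy())
lenient = compare(actual, contract, Policy(allow_extra_fields=True))
print(strict)   # element nullability violation plus the unexpected "note" field
print(lenient)  # only the element nullability violation remains
```

In this sketch the lenient policy accepts the extra `note` field but still rejects the array whose elements may be null. That is the kind of nested-collection mismatch a comparator must inspect explicitly if it is to be caught at the sink boundary before write.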

