Nikos Karayannidis

46.3 · DB · May 27

Grain Theory: Type-Level Granularity Correctness in Data Pipelines

Nikos Karayannidis

Data transformation correctness is a fundamental challenge in data engineering: how can we verify that pipelines produce correct results before executing them on production data? Existing practice relies on iterative testing over materialized data. A common cause of errors is the absence of formal reasoning about grain -- the level of detail of data -- so transformations can inadvertently change granularity, yielding pathologies such as fan traps (metric duplication) and chasm traps (data loss). We introduce grain theory, a type-theoretic framework that elevates grain to a composable property of any algebraic data type. It has two phases. First, a denotation of data: grain itself, defined by irreducibility and isomorphism, with no reference to functional dependencies; three grain relations forming a bounded lattice whose axioms recover Armstrong's axioms on product types; the entity key as a derived grain; and grain-determined behavioral classes -- together the type-level triple $(G[R], EK[R], BC[R])$. Second, a denotation of transformations: every transformation $h$ has a grain lift $\varphi(h)$. For collections of product types under the relational algebra, we prove an equi-join grain inference theorem and present CalcG, a decidable algorithm that composes grain lifts across a pipeline DAG. The central theorem -- the grain homomorphism -- ties the two phases together: grain projection commutes with transformation, and grain lifts compose ($\varphi(h_2 \circ h_1) = \varphi(h_2) \circ \varphi(h_1)$). Grain-correctness is therefore verifiable at design time, before any code or query runs. As corollaries, fan traps emerge as schema-detectable grain-relation violations; chasm traps localize to a specific ordering-chain pattern; and behavioral-class violations, such as point-in-time queries on the wrong collection type, become compile-time type errors. All theorems are mechanically verified in Lean 4.
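To make the fan trap concrete, here is a small data-level sketch in Python. It is not CalcG or the paper's type-level method, which reasons over schemas before any data exists. The tables `orders` (grain `order_id`) and `order_lines` (grain `order_id, line_no`) and the helpers `is_unique` and `minimal_keys` are hypothetical; a grain candidate is approximated as a column set that is unique on the sample rows and irreducible (no proper subset is unique). The equi-join on `order_id` has the finer grain `(order_id, line_no)`, so summing the order-level `amount` over the join double-counts order 1.

```python
from itertools import combinations

def is_unique(rows, cols):
    """True if no two rows agree on every column in cols."""
    keys = [tuple(row[c] for c in cols) for row in rows]
    return len(keys) == len(set(keys))

def minimal_keys(rows, cols):
    """Subsets of cols that are unique on rows and irreducible
    (no proper subset is unique). Smaller subsets are tried first."""
    found = []
    for size in range(1, len(cols) + 1):
        for combo in combinations(cols, size):
            # Skip supersets of a key already found: they are reducible.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found

# Hypothetical sample data: one row per order, one row per order line.
orders = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 50},
]
order_lines = [
    {"order_id": 1, "line_no": 1, "sku": "a"},
    {"order_id": 1, "line_no": 2, "sku": "b"},
    {"order_id": 2, "line_no": 1, "sku": "c"},
]

# Equi-join on order_id: each order row is repeated once per matching line.
joined = [
    {**o, **l}
    for o in orders
    for l in order_lines
    if o["order_id"] == l["order_id"]
]

print(minimal_keys(orders, ["order_id"]))             # [('order_id',)]
print(minimal_keys(joined, ["order_id", "line_no"]))  # [('order_id', 'line_no')]
print(sum(o["amount"] for o in orders))               # 150
print(sum(r["amount"] for r in joined))               # 250: the fan trap
```

Checking uniqueness on sample rows can refute a candidate grain but never prove one, since a column can be unique in one dataset by coincidence; this is the gap that reasoning about grain at the type level, as described above, is meant to close.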

CR · Jun 2, 2021

Babel Fees via Limited Liabilities

Manuel M. T. Chakravarty, Nikos Karayannidis, Aggelos Kiayias et al.

Custom currencies (ERC-20 tokens) on Ethereum are wildly popular, but they are second-class relative to the primary currency, Ether. Custom currencies are more complex and more expensive to handle than the primary currency, as their accounting is not performed natively by the underlying ledger but in user-defined contract code. Furthermore, and quite importantly, transaction fees can only be paid in Ether. In this paper, we focus on enabling transaction fees to be paid in custom currencies. We achieve this through a mechanism that permits short-term liabilities to pay transaction fees, in conjunction with offers of custom currencies to compensate for those liabilities. This enables block producers to accept custom currencies in exchange for settling the liabilities of transactions that they process. We present formal ledger rules to handle liabilities, together with the concept of babel fees to pay transaction fees in custom currencies. We also discuss how clients can determine what fees they have to pay, and we present a solution to the knapsack problem variant that block producers have to solve in the presence of babel fees to optimise their profits.
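The block producer's selection problem can be pictured as a knapsack: choose which transactions to include so that they fit within a block's size limit while maximising value. The sketch below is the textbook 0/1 knapsack dynamic program, not the paper's variant or its solution. The `Candidate` fields, the integer size and value units, and the assumption that `value` already reflects the producer's own exchange rate for the offered custom currency are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    tx_id: str
    size: int   # space the transaction takes in the block (hypothetical units)
    value: int  # producer's valuation of the offer, in primary-currency units (assumed)

def select_transactions(cands, capacity):
    """Classic 0/1 knapsack over block capacity.
    Returns (best total value, ids of the chosen transactions)."""
    n = len(cands)
    # best[i][c] = best value achievable with the first i candidates and capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, cand in enumerate(cands, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip this transaction
            if cand.size <= c:           # option 2: include it, if it fits
                best[i][c] = max(best[i][c],
                                 best[i - 1][c - cand.size] + cand.value)
    # Walk back through the table to recover which transactions were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(cands[i - 1].tx_id)
            c -= cands[i - 1].size
    chosen.reverse()
    return best[n][capacity], chosen

offers = [Candidate("t1", 3, 40), Candidate("t2", 4, 50), Candidate("t3", 2, 30)]
print(select_transactions(offers, capacity=5))  # (70, ['t1', 't3'])
```

The table costs O(n × capacity) time and space, which is pseudo-polynomial; a real producer would also need to respect ledger validity rules (for example, that a liability is actually covered by an offer in the same block), which this sketch ignores.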


2 Papers