PLPFMay 1

SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

arXiv:2605.0114083.9h-index: 4
AI Analysis

This work addresses the problem of improving memory locality and enabling data-parallel execution for functional programs operating on recursive tree-shaped data structures, which is relevant to domains like compilers and scientific computing.

The paper introduces SoCal, a language for generating factored, multi-buffer layouts of recursive algebraic datatypes, and implements it in the Colobus compiler. The approach achieves a 1.46x geometric mean speedup on tree-processing benchmarks.

Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes