CRDB May 20

Polars inside Intel SGX2 Enclaves: An Empirical Study of Confidential Analytical Query Processing

arXiv:2605.21797
AI Analysis

For practitioners and researchers in confidential analytics, this paper provides the first empirical study of an Arrow-native engine in SGX2 and shows that load-path amplification and API-level optimization are first-order determinants of end-to-end performance.

This paper evaluates the Arrow-native DataFrame engine Polars inside Intel SGX2 enclaves on TPC-H SF30. End-to-end overhead stays at 1.49-1.56× across dataset widths, but query-only overhead declines from 1.51-1.52× to 1.43-1.44× while table-loading overhead rises from 2.27× to 4.07×. Lazy execution is 2.25-2.27× faster than eager execution, which fails with out-of-memory errors at 41 GB and above.
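A minimal Python sketch can show how a flat end-to-end ratio can coexist with diverging component ratios. It assumes that overhead is SGX wall-clock time divided by native wall-clock time for each phase. Under that assumption, the end-to-end ratio is a weighted mean of the query-only and load ratios, where each weight is that phase's share of native runtime. A growing load overhead therefore moves the composite only slightly while query time dominates. The paper's standard metric is the TPC-H power score, which is built from a geometric mean of timings, so this additive identity illustrates the mechanism and does not reproduce the paper's metric. The names `Timings`, `Breakdown`, and `decompose`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """Wall-clock seconds for one run configuration (illustrative values only)."""
    load_s: float   # table-loading phase
    query_s: float  # query-only phase


@dataclass(frozen=True)
class Breakdown:
    query: float       # query-only overhead, SGX / native
    load: float        # table-loading overhead, SGX / native
    w_query: float     # query phase's share of native runtime
    end_to_end: float  # overhead of total runtime, SGX / native
    weighted: float    # same quantity rebuilt from the two components


def decompose(native: Timings, sgx: Timings) -> Breakdown:
    """Split an end-to-end slowdown into query-only and load components.

    Assumes overhead is a ratio of summed wall-clock times (hypothetical
    simplification), so the end-to-end ratio equals a weighted mean of
    the component ratios.
    """
    r_query = sgx.query_s / native.query_s
    r_load = sgx.load_s / native.load_s
    native_total = native.query_s + native.load_s
    w_query = native.query_s / native_total
    end_to_end = (sgx.query_s + sgx.load_s) / native_total
    weighted = w_query * r_query + (1.0 - w_query) * r_load
    return Breakdown(r_query, r_load, w_query, end_to_end, weighted)


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the paper.
    configs = {
        "config_a": (Timings(load_s=10.0, query_s=90.0), Timings(load_s=20.0, query_s=135.0)),
        "config_b": (Timings(load_s=5.0, query_s=95.0), Timings(load_s=20.0, query_s=133.0)),
    }
    for name, (native, sgx) in configs.items():
        b = decompose(native, sgx)
        print(f"{name}: query {b.query:.2f}x, load {b.load:.2f}x, end-to-end {b.end_to_end:.2f}x")
```

With these placeholder numbers, the load overhead doubles from 2.00× to 4.00× and the query-only overhead falls from 1.50× to 1.40×. The end-to-end ratio still moves only from 1.55× to 1.53×.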

Trusted Execution Environments (TEEs) have renewed interest in confidential analytics, but most prior evaluations focus on SQL database engines or earlier SGX generations. This paper studies an Arrow-native DataFrame engine, Polars, running inside Intel SGX2 enclaves via Gramine on TPC-H SF30 with Azure Blob Storage. We report both the standard TPC-H power score and a query-only variant that removes table-loading time in order to separate compute overhead from data-ingestion overhead. Across four dataset-width configurations (approximately 22-73 GB), end-to-end overhead remains nearly constant at 1.49-1.56×, but this composite metric obscures two distinct behaviors: query-only overhead declines from 1.51-1.52× to 1.43-1.44×, whereas table-loading overhead rises from 2.27× to 4.07×. We further show that overhead is not uniform across queries: for the len130 configuration, the median per-query SGX slowdown is 1.45× with a maximum of 2.57×, and a small set of queries exhibits pronounced run-to-run spikes consistent with stateful EPC pressure. Finally, we compare Polars' lazy and eager APIs under the same TEE setting. Lazy execution is 2.25-2.27× faster overall, while eager execution fails with out-of-memory errors at 41 GB and above. Relative to the recent DuckDB-SGX2 study, our results suggest that SGX2 can support Arrow-native analytical processing with a similar order of security overhead, but that load-path amplification and API-level optimization are first-order determinants of end-to-end performance.
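The abstract reports a median and a maximum per-query slowdown, and these statistics can be computed from paired native and SGX timings. The sketch below is a minimal Python example that rests on two assumptions. First, each query's slowdown is the ratio of its median SGX run time to its median native run time over repeated runs. Second, a query counts as spiky when its slowest SGX run exceeds its fastest by a hypothetical factor of 1.5. Neither rule comes from the paper, and the function names, query labels, and timings are all placeholders.

```python
from statistics import median


def per_query_slowdowns(native_runs, sgx_runs):
    """Return {query: median SGX run time / median native run time}.

    Taking the median over repeated runs (an assumed convention) keeps a
    single spiky run from dominating a query's reported slowdown.
    """
    return {q: median(sgx_runs[q]) / median(native_runs[q]) for q in native_runs}


def summarize(slowdowns):
    """Median and maximum slowdown across all queries."""
    values = list(slowdowns.values())
    return median(values), max(values)


def spiky_queries(sgx_runs, threshold=1.5):
    """Queries whose slowest SGX run exceeds `threshold` times their fastest run.

    The 1.5 threshold is a hypothetical choice, not the paper's criterion.
    """
    return sorted(q for q, runs in sgx_runs.items() if max(runs) / min(runs) > threshold)


if __name__ == "__main__":
    # Placeholder timings in seconds (three runs per query), not the paper's data.
    native = {"query_a": [10.0, 10.2, 9.8], "query_b": [20.0, 20.0, 20.0], "query_c": [15.0, 15.5, 14.5]}
    sgx = {"query_a": [14.0, 14.5, 14.0], "query_b": [30.0, 52.0, 30.5], "query_c": [21.0, 21.0, 21.5]}
    slowdowns = per_query_slowdowns(native, sgx)
    med, worst = summarize(slowdowns)
    print(f"median {med:.2f}x, max {worst:.2f}x, spiky: {spiky_queries(sgx)}")
```

The two measures complement each other. The median over runs keeps the per-query slowdown stable, while the spike check surfaces the run-to-run variance that the median hides, so typical slowdown and intermittent outliers can be reported separately.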
