DBMar 16

SIMD-PAC-DB: Pretty Performant PAC Privacy

Ilaria Battiston, Dandan Yuan, Xiaochen Zhu, Peter Boncz

arXiv:2603.1502336.81 citationsh-index: 4Has Code

AI Analysis

This work incrementally improves the practicality of privacy-aware data systems for database users by enhancing performance and ease of use.

The paper tackled the inefficiency of the PAC-DB privacy model by developing SIMD-PAC-DB, which reduces the required queries from 128 stochastic executions to just one, achieving up to 40x faster performance than scalar methods.

This work presents a highly optimized implementation of PAC-DB, a recent and promising database privacy model. We prove that our SIMD-PAC-DB can compute the same privatized answer with just a single query, instead of the 128 stochastic executions against different 50% database sub-samples needed by the original PAC-DB. Our key insight is that every bit of a hashed primary key can be seen to represent membership of such a sub-sample. We present new algorithms for approximate computation of stochastic aggregates based on these hashes, which, thanks to their SIMD-friendliness, run up to 40x faster than scalar equivalents. We release an open-source DuckDB community extension which includes a rewriter that PAC-privatizes arbitrary SQL queries. Our experiments on TPC-H, Clickbench, and SQLStorm evaluate thousands of queries in terms of performance and utility, significantly advancing the ease of use and functionality of privacy-aware data systems in practice.

View on arXiv PDF Code

Similar