LGDSMay 21

Lumberjack: Better Differentially Private Random Forests through Heavy Hitter Detection in Trees

arXiv:2605.2275614.3
Predicted impact top 50% in LG · last 90 daysOriginality Highly original
AI Analysis

For practitioners using sensitive tabular data, Lumberjack provides a practical differentially private random forest that substantially improves the privacy-utility trade-off, closing the utility gap with non-private methods.

Lumberjack introduces a differentially private random forest algorithm that achieves higher utility by constructing large trees and applying privacy-preserving pruning, enabled by a novel heavy hitter detection algorithm with error O(√log h). It consistently outperforms prior DP random forest methods, establishing a new state of the art on benchmark datasets.

Random forests are widely used in fields involving sensitive tabular data, but existing approaches to enforcing differential privacy (DP) typically degrade performance to the point of impracticality. In this paper, we introduce Lumberjack, a differentially private random forest algorithm that achieves substantially higher utility by constructing large random decision trees and then applying aggressive, privacy-preserving pruning to retain only sufficiently populated nodes. A key component of our approach is a novel $(\varepsilon,δ)$-DP heavy hitter detection algorithm for hierarchical data, whose error is $O_{\varepsilon,δ}(\sqrt{\log h})$ for trees of height $h$ and may be of independent interest. This favorable scaling enables the use of significantly deeper trees than in prior work, leading to improved expressiveness under privacy constraints. Our empirical evaluation on benchmark datasets shows that Lumberjack consistently outperforms prior DP random forest methods, establishing a new state of the art. In particular, our approach yields substantial improvements in the privacy-utility trade-off for practical privacy budgets. Our findings suggest that carefully designed DP random forests can close much of the utility gap, highlighting a promising and underexplored direction for future research.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes