CR AI LG | Mar 5, 2021

Efficient Encrypted Inference on Ensembles of Decision Trees

Kanthi Sarpatwar, Karthik Nandakumar, Nalini Ratha, James Rayfield, Karthikeyan Shanmugam, Sharath Pankanti, Roman Vaculin

arXiv:2103.03411v1 | 8.82 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns around sending sensitive personal data to cloud-based ML services by enabling efficient encrypted inference. The advance is incremental, since it builds on existing homomorphic encryption and knowledge transfer methods.

The paper tackles the problem of performing accurate and efficient encrypted inference on decision tree ensembles while maintaining data privacy, achieving a system that is approximately three orders of magnitude faster than standard approaches with amortized inference times in milliseconds.

Data privacy concerns often prevent the use of cloud-based machine learning services for sensitive personal data. While homomorphic encryption (HE) offers a potential solution by enabling computations on encrypted data, the challenge is to obtain accurate machine learning models that work within the multiplicative depth constraints of a leveled HE scheme. Existing approaches for encrypted inference either make ad-hoc simplifications to a pre-trained model (e.g., replace hard comparisons in a decision tree with soft comparators) at the cost of accuracy or directly train a new depth-constrained model using the original training set. In this work, we propose a framework to transfer knowledge extracted by complex decision tree ensembles to shallow neural networks (referred to as DTNets) that are highly conducive to encrypted inference. Our approach minimizes the accuracy loss by searching for the best DTNet architecture that operates within the given depth constraints and training this DTNet using only synthetic data sampled from the training data distribution. Extensive experiments on real-world datasets demonstrate that these characteristics are critical in ensuring that DTNet accuracy approaches that of the original tree ensemble. Our system is highly scalable and can perform efficient inference on batched encrypted (134 bits of security) data with amortized time in milliseconds. This is approximately three orders of magnitude faster than the standard approach of applying soft comparison at the internal nodes of the ensemble trees.
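To make the depth constraint concrete, here is a minimal plaintext sketch of what replacing a hard comparison with a soft comparator can look like. The abstract does not say which soft comparator the baseline uses, so the polynomial below, the iteration count, and the function names `hard_compare` and `soft_compare` are all assumptions chosen for illustration. The point is only that an HE-friendly comparator must be built from additions and multiplications, and that every extra iteration that sharpens it costs more multiplicative depth.

```python
def hard_compare(x, threshold):
    """Plaintext decision-node test: 1.0 if x > threshold, else 0.0.

    A branch like this cannot be evaluated directly on encrypted data.
    """
    return 1.0 if x > threshold else 0.0


def soft_compare(x, threshold, iterations=4):
    """Polynomial stand-in for hard_compare, using only + and *.

    Illustrative assumption, not the paper's comparator.
    Assumes inputs are scaled so that |x - threshold| <= 1.
    Repeatedly applies f(z) = (3z - z^3) / 2, which pushes z toward
    sign(z) on [-1, 1]. Each application needs roughly two
    multiplicative levels (z*z, then z times the result), so the
    total depth grows linearly with `iterations`.
    """
    z = x - threshold
    for _ in range(iterations):
        z = 0.5 * z * (3.0 - z * z)
    # Map the approximate sign in [-1, 1] to a soft bit in [0, 1].
    return 0.5 * (z + 1.0)


if __name__ == "__main__":
    for x in (0.8, 0.31, -0.2):
        print(x, hard_compare(x, 0.3), round(soft_compare(x, 0.3), 4))
```

Inputs far from the threshold come out close to 0 or 1, while inputs very near the threshold stay close to 0.5 unless more iterations (and therefore more depth) are spent. Applying such a comparator at every internal node of every tree is the kind of cost that the DTNet approach described above is meant to avoid.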
