FUTURE: Flexible Unlearning for Tree Ensemble
FUTURE addresses the data privacy needs of tree-ensemble users in domains such as bioinformatics and finance, offering a flexible and efficient solution, though the contribution is incremental because it builds on existing unlearning methods.
The paper tackles the problem of making tree ensembles forget sensitive data for privacy. It proposes FUTURE, a gradient-based unlearning algorithm that uses probabilistic model approximations to handle the non-differentiability of tree ensembles, and reports significant unlearning performance in experiments on real-world datasets.
Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the \textit{right to be forgotten}, several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, which makes them difficult to generalize to complex ensembles and inefficient on large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. To accommodate the non-differentiability of tree ensembles, we adopt probabilistic model approximations within the optimization framework. This enables end-to-end unlearning that is both effective and efficient. Extensive experiments on real-world datasets show that FUTURE achieves significant unlearning performance.
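To make the idea of gradient-based unlearning through a probabilistic tree approximation concrete, the following is a minimal sketch on a toy problem. It is not the FUTURE algorithm: the abstract does not give the objective or the approximation, so everything here is an illustrative assumption. A single hard split `x > threshold` is relaxed into a sigmoid routing probability (a common "soft tree" construction), and forgetting is posed as gradient descent on the retain-set loss minus a weighted forget-set loss. The names `soft_stump`, `bce_grad`, `lam`, and `temperature`, as well as the toy data, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_stump(x, leaf_logits, threshold=0.5, temperature=0.05):
    """Relaxed one-split tree: returns P(y=1 | x) and the routing weights."""
    p_right = sigmoid((x - threshold) / temperature)      # soft version of x > threshold
    routing = np.stack([1.0 - p_right, p_right], axis=1)  # shape (n, 2): [left, right]
    return routing @ sigmoid(leaf_logits), routing

def bce(x, y, leaf_logits):
    """Mean binary cross-entropy of the relaxed stump."""
    p, _ = soft_stump(x, leaf_logits)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def bce_grad(x, y, leaf_logits):
    """Analytic gradient of the mean cross-entropy w.r.t. the two leaf logits."""
    p, routing = soft_stump(x, leaf_logits)
    leaf_probs = sigmoid(leaf_logits)
    dloss_dp = (p - y) / (p * (1.0 - p))
    dp_dlogit = routing * (leaf_probs * (1.0 - leaf_probs))
    return (dloss_dp[:, None] * dp_dlogit).mean(axis=0)

# Toy data (assumed): the forget samples carry labels the retain set contradicts.
x_retain = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y_retain = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x_forget = np.array([0.15, 0.25])
y_forget = np.array([1.0, 1.0])

# 1) Fit the leaf logits on all data (retain + forget) by gradient descent.
x_all = np.concatenate([x_retain, x_forget])
y_all = np.concatenate([y_retain, y_forget])
leaf_logits = np.zeros(2)
for _ in range(500):
    leaf_logits -= 1.0 * bce_grad(x_all, y_all, leaf_logits)

forget_before = bce(x_forget, y_forget, leaf_logits)
retain_before = bce(x_retain, y_retain, leaf_logits)

# 2) Unlearn: descend on (retain loss - lam * forget loss), an assumed objective.
lam, lr = 0.5, 0.2
unlearned = leaf_logits.copy()
for _ in range(50):
    grad = bce_grad(x_retain, y_retain, unlearned) - lam * bce_grad(x_forget, y_forget, unlearned)
    unlearned -= lr * grad

forget_after = bce(x_forget, y_forget, unlearned)
retain_after = bce(x_retain, y_retain, unlearned)
print("forget loss:", forget_before, "->", forget_after)
print("retain loss:", retain_before, "->", retain_after)
```

In this sketch only the leaf values are updated and the split itself is held fixed. Because the relaxed routing is differentiable, the same gradient machinery could in principle also adjust thresholds, and it extends to sums of many such trees without relying on the discrete structure of any one of them.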