Online Forgetting Process for Linear Regression Models
This work tackles the problem of efficiently deleting user data in online linear regression for applications requiring compliance with data privacy regulations, such as the EU's "Right To Be Forgotten."
This paper addresses the problem of online data deletion in linear regression, motivated by the "Right To Be Forgotten" regulation. They propose the FIFD-Adaptive Ridge algorithm, which uses a novel online regularization scheme to counteract the statistical inefficiency caused by data deletion, outperforming ridge regression with fixed regularization.
Motivated by the EU's "Right To Be Forgotten" regulation, we initiate a study of statistical data deletion problems where users' data are accessible only for a limited period of time. This setting is formulated as an online supervised learning task with \textit{constant memory limit}. We propose a deletion-aware algorithm \texttt{FIFD-OLS} for the low dimensional case, and witness a catastrophic rank swinging phenomenon due to the data deletion operation, which leads to statistical inefficiency. As a remedy, we propose the \texttt{FIFD-Adaptive Ridge} algorithm with a novel online regularization scheme, that effectively offsets the uncertainty from deletion. In theory, we provide the cumulative regret upper bound for both online forgetting algorithms. In the experiment, we showed \texttt{FIFD-Adaptive Ridge} outperforms the ridge regression algorithm with fixed regularization level, and hopefully sheds some light on more complex statistical models.