CV · AI · Apr 29, 2024

Machine Unlearning for Document Classification

arXiv:2404.19031v1 · 5 citations · h-index: 31 · Has Code · ICDAR
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for users of document analysis services by enabling the removal of personal data from trained models. It is an incremental step, since it applies the existing concept of machine unlearning to a new domain.

The paper tackles the problem of making document classification models forget specific data for privacy compliance. It introduces a machine unlearning method for the setting where the remote server hosting the trained model holds only a small portion of the training data.

Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a well-trained model and possesses only a small portion of training data. This setup is designed for efficient forgetting manipulation. This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications. Our code is publicly available at https://github.com/leitro/MachineUnlearning-DocClassification.

Code Implementations: 1 repo
