CV · LG · Mar 10, 2023

Feature Unlearning for Pre-trained GANs and VAEs

arXiv:2303.05699v4 · 23 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the need for privacy and control in generative models by enabling targeted feature removal, though its contribution is incremental because it builds on existing unlearning methods.

The paper tackles the problem of unlearning specific features, such as hairstyle in facial images, from pre-trained GANs and VAEs. It removes the target features while maintaining model fidelity and shows improved robustness against adversarial attacks.

We tackle the problem of feature unlearning from pre-trained image generative models, namely GANs and VAEs. Unlike a common unlearning task, where the unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pre-trained generative models. Because the target feature is present only in a local region of an image, unlearning entire images from the pre-trained model may discard details in the remaining regions of the image. To specify which features to unlearn, we collect randomly generated images that contain the target features. We then identify a latent representation corresponding to the target feature and use it to fine-tune the pre-trained model. Through experiments on the MNIST, CelebA, and FFHQ datasets, we show that target features are successfully removed while the fidelity of the original models is preserved. Further experiments with an adversarial attack show that the unlearned model is more robust in the presence of malicious parties.
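The abstract does not say how the latent representation is computed or how it drives fine-tuning, so the sketch below shows only one plausible reading under stated assumptions. It assumes the feature can be summarized as a unit direction in latent space, estimated as the difference between the mean latent code of generated samples with the feature and the mean of samples without it (a common latent-editing heuristic; the use of feature-free samples is an assumption, since the abstract mentions only samples that contain the feature). Each latent code is then shifted along that direction until its projection matches the feature-free mean, and the frozen original generator's output on the shifted code serves as a fine-tuning target for a copy of the model. All function names are hypothetical, and the toy generator stands in for a real GAN generator or VAE decoder.

```python
"""Hypothetical sketch of latent-direction feature removal.

Assumptions (not taken from the paper): the target feature is a single
unit direction in latent space, estimated by a difference of means, and
fine-tuning targets come from a frozen copy of the original generator.
"""
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized latent codes."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def feature_direction(with_feature: List[Vector],
                      without_feature: List[Vector]) -> Vector:
    """Unit vector pointing from the feature-free mean to the feature mean."""
    diff = [p - q for p, q in zip(mean_vector(with_feature),
                                  mean_vector(without_feature))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("means coincide; no direction can be estimated")
    return [d / norm for d in diff]


def remove_feature(z: Vector, direction: Vector, neutral_proj: float) -> Vector:
    """Move z along `direction` so its projection equals `neutral_proj`.

    Components orthogonal to `direction` are left unchanged, which is the
    latent-space analogue of keeping the rest of the image intact.
    """
    shift = dot(z, direction) - neutral_proj
    return [zi - shift * di for zi, di in zip(z, direction)]


def make_training_pairs(latents: List[Vector], direction: Vector,
                        neutral_proj: float,
                        frozen_generator: Callable[[Vector], Vector]
                        ) -> List[Tuple[Vector, Vector]]:
    """Pair each z with the frozen generator's output on its edited code.

    A fine-tuned copy G' would be trained so that G'(z) matches these
    targets (e.g. with a pixel-wise loss); the training loop is omitted.
    """
    return [(z, frozen_generator(remove_feature(z, direction, neutral_proj)))
            for z in latents]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error, a stand-in for the fine-tuning loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 3-D latents where coordinate 0 encodes the feature.
    pos = [[2.0 + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
           for _ in range(100)]
    neg = [[-2.0 + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
           for _ in range(100)]
    d = feature_direction(pos, neg)
    neutral = dot(mean_vector(neg), d)
    toy_generator = lambda z: [2.0 * x for x in z]  # placeholder "decoder"
    pairs = make_training_pairs(pos[:3], d, neutral, toy_generator)
    for z, target in pairs:
        print([round(v, 2) for v in target], round(mse(toy_generator(z), target), 3))
```

Keeping the orthogonal components fixed is the design choice that matters here: it mirrors the abstract's concern that unlearning whole images would erase details outside the feature's region. A real implementation would likely need a learned or spatially localized representation rather than a single linear direction.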
