Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
This work addresses privacy and bias issues in language models and offers a new debiasing tool, though the contribution is incremental because it builds on existing model fusion techniques.
The paper tackles the problem of reducing unwanted knowledge in language models, such as biases and memorized training data, by using model fusion. It demonstrates that fusion can effectively forget knowledge the fused models do not share while enhancing the knowledge they do share, with experiments showing significant reductions in bias and memorization.
Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
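To make "combining their weights" concrete, here is a minimal sketch of one common form of fusion: element-wise weighted averaging of parameters. The abstract does not say which fusion rule the paper uses, so uniform averaging is an assumption here, and the names `fuse_weights`, `model_a`, and `model_b` are hypothetical. Models are represented as plain dictionaries of float lists rather than real language-model checkpoints.

```python
from typing import Dict, List, Optional, Sequence

# A "model" here is just a mapping from parameter name to a flat list of
# floats. This stands in for a real checkpoint (an assumption for illustration).
Params = Dict[str, List[float]]


def fuse_weights(models: Sequence[Params],
                 coeffs: Optional[Sequence[float]] = None) -> Params:
    """Fuse models by element-wise weighted averaging of their parameters.

    With coeffs=None every model gets weight 1/len(models) (uniform averaging).
    All models are assumed to share the same parameter names and shapes.
    """
    if not models:
        raise ValueError("need at least one model to fuse")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("need exactly one coefficient per model")

    fused: Params = {}
    for name, reference in models[0].items():
        fused[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(len(reference))
        ]
    return fused


# Toy example: both models agree on the first component ("shared") and
# each has one component the other lacks ("unshared").
model_a: Params = {"w": [1.0, 0.8, 0.0]}
model_b: Params = {"w": [1.0, 0.0, 0.6]}

fused = fuse_weights([model_a, model_b])
print(fused["w"])
```

In this toy case the component both models agree on keeps its full value after averaging, while each component held by only one model is halved. This is only an intuition in weight space for the abstract's observation that shared knowledge is retained and unshared knowledge is usually forgotten; it is not a reproduction of the paper's experiments.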