CL · AI · HC · Oct 19, 2022

Revision Transformers: Instructing Language Models to Change their Values

arXiv:2210.10332v3 · 12 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses a challenge facing AI developers and users: efficiently updating large language models to reflect changing values. The advance is incremental, since it builds on existing pre-trained models.

The authors tackle the cost and difficulty of updating transformer language models to correct biases and to track dynamic concepts such as moral values. They propose the Revision Transformer (RiT), which enables easy model updating through user interaction, and they demonstrate strong performance on a moral dataset using only a small amount of data.

Current transformer language models (LMs) are large-scale models with billions of parameters. They have been shown to provide high performance on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally or interpersonally. In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM that inherently but also diffusely encodes world knowledge with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback, demonstrating strong performance in model revision even with small data. This way, users can easily design a model regarding their preferences, paving the way for more transparent AI models.
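The idea of keeping revisions outside the model parameters can be illustrated with a small sketch. This is not the authors' implementation: the class `RevisionMemory`, its methods, the prompt layout, the similarity threshold, and the toy bag-of-words `embed` function are all hypothetical stand-ins, assuming only that user feedback is stored as text and retrieved by similarity before the frozen LM is queried.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RevisionMemory:
    """Hypothetical external store of user revisions; the LM itself is never retrained."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum similarity for a revision to apply
        self.entries = []           # list of (embedding, question, revised_answer)

    def add_feedback(self, question, revised_answer):
        """Record a user correction as plain text."""
        self.entries.append((embed(question), question, revised_answer))

    def retrieve(self, query):
        """Return the closest stored (question, answer), or None if nothing is close enough."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is None or cosine(q, best[0]) < self.threshold:
            return None
        return best[1], best[2]

    def build_prompt(self, query):
        """Prepend a matching revision as context; otherwise pass the query through unchanged."""
        hit = self.retrieve(query)
        if hit is None:
            return f"Question: {query}\nAnswer:"
        question, answer = hit
        return f"Context: {question} {answer}\nQuestion: {query}\nAnswer:"
```

In this sketch, updating the model's behavior amounts to calling `add_feedback`, and the resulting prompt would be passed to whatever pre-trained LM is in use. Because the revisions are stored as readable text, a user can inspect or delete them, which is one way to read the abstract's transparency claim.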

Code Implementations: 1 repo