LG AI · Jun 16, 2023

Catastrophic Forgetting in the Context of Model Updates

arXiv:2306.10181v12.02 citations · h-index: 19

Originality: Synthesis-oriented

AI Analysis

This paper addresses a practical obstacle to deploying deep learning models in real-world applications that need frequent updates, though its contribution is incremental because it builds on existing techniques such as data rehearsal and EWC.

The paper tackles the problem of catastrophic forgetting during model updates by comparing methods to maintain performance on old data, concluding that data rehearsal combined with Elastic Weight Consolidation achieves high overall accuracy across time periods, with updates being cheaper and faster than retraining from scratch when past data is large.

A large obstacle to deploying deep learning models in practice is the process of updating models post-deployment (ideally, frequently). Deep neural networks can cost many thousands of dollars to train. When new data comes into the pipeline, you can train a new model from scratch (randomly initialized weights) on all existing data. Alternatively, you can take an existing model and fine-tune it (continue to train it) on the new data. The former is costly and slow. The latter is cheap and fast, but catastrophic forgetting generally causes the new model to 'forget' how to classify older data well. There is a plethora of complicated techniques to keep models from forgetting their past learnings. Arguably the most basic is to mix a small amount of past data into the new data during fine-tuning, also known as 'data rehearsal'. In this paper, we compare various methods of limiting catastrophic forgetting and conclude that if you can maintain access to a portion of your past data (or tasks), data rehearsal is ideal in terms of overall accuracy across all time periods, and performs even better when combined with methods like Elastic Weight Consolidation (EWC). Especially when the amount of past data (past 'tasks') is large compared to new data, updating an existing model is far cheaper and faster than training a new model from scratch.
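The two techniques named in the abstract can be illustrated with a minimal, framework-free sketch. It is not the paper's implementation. The function names, the `rehearsal_fraction` parameter (here defined relative to the size of the new data), and the flat-list representation of parameters are all assumptions made for illustration. The EWC term follows the commonly cited quadratic form, (λ/2) Σ F_i (θ_i − θ*_i)², where F_i is a diagonal Fisher information estimate and θ* are the weights before the update.

```python
import random


def build_rehearsal_set(new_data, past_data, rehearsal_fraction=0.1, seed=0):
    """Data rehearsal: mix a random sample of past examples into the new data.

    rehearsal_fraction is a hypothetical knob: the number of replayed past
    examples as a fraction of the number of new examples.
    """
    rng = random.Random(seed)
    n_replay = min(len(past_data), int(round(rehearsal_fraction * len(new_data))))
    replay = rng.sample(list(past_data), n_replay)
    mixed = list(new_data) + replay
    rng.shuffle(mixed)  # interleave old and new examples for fine-tuning
    return mixed


def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


def total_loss(task_loss, params, old_params, fisher, lam=1.0):
    """Fine-tuning objective: loss on the mixed data plus the EWC penalty."""
    return task_loss + ewc_penalty(params, old_params, fisher, lam)
```

In a real training loop, `task_loss` would be computed on batches drawn from the mixed set returned by `build_rehearsal_set`, and `fisher` would be estimated on past data before the update begins. Both steps are left out here to keep the sketch short.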
