Knowledge-guided Continual Learning for Behavioral Analytics Systems
This work addresses data drift and catastrophic forgetting in online-platform tasks such as hate speech detection, but its contribution is incremental: it builds on existing replay-based continual learning methods.
The paper tackles catastrophic forgetting in continual learning for behavioral analytics systems. It proposes a novel augmentation-based approach that incorporates external knowledge into replay-based methods, and it reports improved performance over baseline replay-based approaches on three deviant behavior classification datasets.
User behavior on online platforms is evolving, reflecting real-world changes in what people post, from helpful messages to hate speech. Models trained to capture this content can lose performance over time due to data drift, which makes behavioral analytics systems ineffective. Fine-tuning such a model on new data, however, can be detrimental because of catastrophic forgetting. Replay-based approaches in continual learning offer a simple yet efficient way to update such models: they mitigate forgetting by maintaining a buffer of important training instances from previously learned tasks. The main limitation of this approach is the fixed size of the buffer. External knowledge bases can be used to overcome this limitation through data augmentation. We propose a novel augmentation-based approach to incorporate external knowledge into the replay-based continual learning framework. We evaluate several strategies on three datasets from prior studies of deviant behavior classification to assess the integration of external knowledge in continual learning, and we demonstrate that augmentation helps outperform baseline replay-based approaches.
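As a rough illustration of the idea described above, the following sketch pairs a fixed-size replay buffer with a knowledge-based augmentation step. It is not the authors' implementation. The reservoir-sampling policy, the `TOY_KB` dictionary standing in for an external knowledge base, and the names `ReplayBuffer`, `kb_augment`, and `build_training_batch` are all assumptions made for illustration; the abstract does not specify how instances are selected or how the knowledge base is queried.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label)


class ReplayBuffer:
    """Fixed-size buffer of past examples.

    Reservoir sampling is an assumed selection policy, chosen only because
    it keeps a uniform sample of everything seen so far.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Example] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example: Example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k: int) -> List[Example]:
        return self.rng.sample(self.items, min(k, len(self.items)))


# Hypothetical stand-in for an external knowledge base: token -> related terms.
TOY_KB: Dict[str, List[str]] = {
    "hate": ["hostile", "abusive"],
    "post": ["message"],
}


def kb_augment(example: Example) -> List[Example]:
    """Create label-preserving variants by swapping in related terms."""
    text, label = example
    tokens = text.split()
    variants: List[Example] = []
    for i, tok in enumerate(tokens):
        for related in TOY_KB.get(tok.lower(), []):
            new_tokens = tokens[:i] + [related] + tokens[i + 1:]
            variants.append((" ".join(new_tokens), label))
    return variants


def build_training_batch(
    new_batch: Sequence[Example],
    buffer: ReplayBuffer,
    replay_k: int,
    augment: Callable[[Example], List[Example]] = kb_augment,
) -> List[Example]:
    """Mix new-task data with replayed examples and their augmented variants."""
    replayed = buffer.sample(replay_k)
    augmented = [variant for ex in replayed for variant in augment(ex)]
    return list(new_batch) + replayed + augmented
```

In this sketch the buffer itself stays at its fixed capacity; augmentation only widens what each replayed instance contributes to a training batch, which is one plausible reading of how external knowledge could offset the buffer-size limit.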