CLaaS: Continual learning as a service for sample efficient online learning
For practitioners deploying LLM agents in dynamic environments, CLaaS offers a practical system for continual learning without environment resets.
CLaaS enables LLM agents to adapt online from a single stream of experiences, achieving superior forward transfer and less forgetting than in-context learning on an adversarial task, with replay critical for sample efficiency.
Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, an agent adapts from its accumulated experience, retaining prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scenario, because real-world environments cannot be trivially reset. To this end, we investigate an experiential, online continual learning setting in which agents learn from a stream of scenarios. We propose continual learning as a service (CLaaS), a system that enables agents to improve during deployment, abstracted behind a chat API. To increase sample efficiency, CLaaS stores rollouts in an experience replay buffer for gradient reuse during asynchronous training. We evaluate CLaaS on an adversarial task, demonstrating that parametric updates lead to superior forward transfer and less forgetting than in-context learning, and that replay is critical for sample efficiency.
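The replay idea in the abstract can be illustrated with a minimal sketch. This is not the CLaaS implementation: the names `Rollout`, `ReplayBuffer`, and `run_stream`, the FIFO eviction policy, and uniform sampling are all assumptions made for illustration, and the loop is synchronous, whereas the abstract describes asynchronous training. It shows only how each scenario can be sampled once while its rollout is reused across several update steps.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """One agent experience (hypothetical fields, not the CLaaS schema)."""
    scenario_id: int
    prompt: str
    response: str
    reward: float


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform sampling (assumed policy)."""

    def __init__(self, capacity, seed=0):
        # deque with maxlen evicts the oldest rollout once capacity is reached.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, rollout):
        self._items.append(rollout)

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


def run_stream(scenarios, buffer, updates_per_scenario, batch_size):
    """Visit each scenario once; draw several training batches per visit.

    Returns the batches that a trainer would consume. The gradient step
    itself is omitted because the abstract does not specify it.
    """
    batches = []
    for rollout in scenarios:
        buffer.add(rollout)  # the scenario is sampled exactly once
        for _ in range(updates_per_scenario):
            batches.append(buffer.sample(batch_size))  # stored rollouts are reused
    return batches
```

With `updates_per_scenario=2`, a stream of five scenarios yields ten training batches, so each environment interaction can contribute to more than one update without the environment being reset.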