CL, AI · Nov 16, 2025

Adaptive Focus Memory for Language Models

arXiv:2511.12712v1 · 4 citations

Originality: Incremental advance

AI Analysis

This work targets practitioners who deploy LLMs in safety-critical multi-turn dialogues, where fixed context windows and naive memory strategies are a bottleneck. It offers a modular way to reduce inference cost without sacrificing safety in the evaluated scenario, though it is an incremental improvement over existing memory-management approaches.

The paper tackles inefficient memory management for large language models in multi-turn dialogue by introducing Adaptive Focus Memory (AFM), which dynamically assigns each past message a fidelity level based on its relevance, recency, and importance. In a single benchmark scenario, AFM cuts average token usage by 66% relative to full-conversation replay while matching replay's safety performance.
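The fidelity assignment described above can be sketched as a scoring function. This is a minimal illustration under stated assumptions, not the authors' implementation: the weighted sum of cosine similarity and half-life recency, the rule that importance-flagged messages always stay FULL, recency measured in turns rather than wall-clock time, and every name, weight, and threshold (`assign_fidelity`, `w_sim`, `full_at`, and so on) are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass
from enum import IntEnum


class Fidelity(IntEnum):
    # Ordered so that a higher value means higher fidelity.
    PLACEHOLDER = 0
    COMPRESSED = 1
    FULL = 2


@dataclass
class Message:
    turn: int                # position in the dialogue (0 = oldest)
    text: str
    embedding: list[float]   # from some sentence-embedding model (assumed)
    important: bool = False  # output of an importance classifier (assumed)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_weight(age: int, half_life: float) -> float:
    # Half-life decay: 1.0 at age 0, 0.5 after `half_life` turns, 0.25 after two.
    return 0.5 ** (age / half_life)


def assign_fidelity(
    msg: Message,
    query_embedding: list[float],
    current_turn: int,
    *,
    w_sim: float = 0.6,          # hypothetical weights and thresholds
    w_rec: float = 0.4,
    half_life: float = 4.0,
    full_at: float = 0.6,
    compressed_at: float = 0.3,
) -> Fidelity:
    # Assumption: anything the classifier flags as important is kept verbatim.
    if msg.important:
        return Fidelity.FULL
    similarity = cosine_similarity(msg.embedding, query_embedding)
    recency = recency_weight(current_turn - msg.turn, half_life)
    score = w_sim * similarity + w_rec * recency
    if score >= full_at:
        return Fidelity.FULL
    if score >= compressed_at:
        return Fidelity.COMPRESSED
    return Fidelity.PLACEHOLDER
```

With these defaults, a message ten turns old whose embedding is orthogonal to the query scores about 0.07 and drops to PLACEHOLDER. In this sketch, only the importance flag keeps a safety-critical detail such as the peanut allergy at FULL, which mirrors the abstract's point that recency-only heuristics can erase such details.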

Large language models (LLMs) are increasingly deployed in multi-turn dialogue settings, but their behavior is still bottlenecked by fixed context windows and naive memory strategies. Replaying the full conversation at every turn is simple but expensive, while static summarization or recency-only heuristics often erase safety-critical user details. We present Adaptive Focus Memory (AFM), a dynamic context manager that assigns each past message one of three fidelity levels (FULL, COMPRESSED, or PLACEHOLDER) based on semantic similarity to the current query, half-life recency weighting, and importance classification. AFM packs messages chronologically under a strict token budget, preferring high fidelity for the most relevant turns while aiming to preserve a cheap trace of the dialogue. In a safety-oriented benchmark involving a user with a severe peanut allergy planning a trip to Thailand, AFM retains the allergy across both short and medium-length conversations, matches the safety performance of naive replay, and cuts average token usage by 66% relative to a replay baseline. We release a modular Python implementation of AFM designed for OpenAI-compatible APIs and offline operation, enabling practitioners to reduce inference cost without sacrificing safety or factual continuity in the evaluated scenario.
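The packing step in the abstract (chronological output, a strict token budget, high fidelity for the most relevant turns, and a cheap trace of the rest) can be illustrated with a greedy sketch. This is an assumed policy, not the released code: every kept turn starts as a placeholder, turns are then upgraded in descending score order up to the level the scorer requested, and the oldest placeholders are dropped if even they overflow the budget. The names `Turn`, `pack`, `render`, and `count_tokens`, the placeholder wording, and the whitespace token count are all hypothetical; a real system would count tokens with the target model's tokenizer.

```python
from dataclasses import dataclass
from enum import IntEnum


class Fidelity(IntEnum):
    # Ordered so that a higher value means higher fidelity.
    PLACEHOLDER = 0
    COMPRESSED = 1
    FULL = 2


@dataclass
class Turn:
    index: int        # chronological position (0 = oldest)
    text: str         # verbatim message
    summary: str      # compressed form, e.g. from a summarizer (assumed)
    score: float      # combined relevance/recency/importance score (assumed)
    target: Fidelity  # highest fidelity the scorer wants for this turn


def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer.
    return len(text.split())


def render(turn: Turn, level: Fidelity) -> str:
    if level is Fidelity.FULL:
        return turn.text
    if level is Fidelity.COMPRESSED:
        return turn.summary
    return f"[turn {turn.index} omitted]"


def pack(turns: list[Turn], budget: int) -> list[str]:
    kept = sorted(turns, key=lambda m: m.index)
    cost = {t.index: count_tokens(render(t, Fidelity.PLACEHOLDER)) for t in kept}
    # Strict budget: if even the placeholders overflow, drop the oldest turns.
    while kept and sum(cost[t.index] for t in kept) > budget:
        kept.pop(0)
    level = {t.index: Fidelity.PLACEHOLDER for t in kept}
    used = sum(cost[t.index] for t in kept)
    # Spend the remaining budget on the highest-scoring turns first.
    for t in sorted(kept, key=lambda m: m.score, reverse=True):
        for candidate in (Fidelity.FULL, Fidelity.COMPRESSED):
            if candidate > t.target:
                continue
            new_cost = count_tokens(render(t, candidate))
            if used - cost[t.index] + new_cost <= budget:
                used += new_cost - cost[t.index]
                level[t.index], cost[t.index] = candidate, new_cost
                break
    # Emit in chronological order, regardless of the order of upgrades.
    return [render(t, level[t.index]) for t in kept]
```

For example, consider three turns with a 20-token budget: an allergy statement (score 0.95, target FULL), a Phuket beach question (0.2, target COMPRESSED), and a Bangkok street-food question (0.8, target FULL). The sketch keeps the allergy turn verbatim, compresses the street-food question, and reduces the beach question to a placeholder. Greedy upgrading is simply the easiest policy to read. The paper's actual packing strategy may trade fidelity against budget differently.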

