Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
For developers of proactive AI agents, this work offers a practical, efficient, on-device-deployable alternative to LLM-based event processing.
Proactive agents that read user activity as text and call an LLM on every event are inefficient. The authors replace the LLM encoder with a small temporal-graph-learning model that processes structured event streams directly, improving F1 by a mean of +16.7 points (up to +46.0) across 14 backbones while running 4–83× faster than LLM-based triggers with a roughly 220 MiB footprint.
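The central move is to treat each activity record as a graph update rather than as text. As a minimal sketch, assuming events arrive as (actor, verb, object, timestamp) tuples as the abstract describes, the hypothetical `TemporalEventGraph` below stores every event as a timestamped, verb-labelled edge and answers the query a temporal-graph encoder typically needs: an entity's most recent interactions before a given time. The class and method names are illustrative assumptions, not the authors' code, and how the operating system actually exposes its activity graph is out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Event:
    actor: str
    verb: str
    obj: str
    timestamp: float


class TemporalEventGraph:
    """Hypothetical append-only temporal multigraph: one timestamped edge per event."""

    def __init__(self) -> None:
        # node -> list of (timestamp, verb, neighbor), in arrival order
        self._adj: DefaultDict[str, List[Tuple[float, str, str]]] = defaultdict(list)
        self.num_edges = 0

    def add_event(self, ev: Event) -> None:
        # Store the edge under both endpoints so either can look up its history.
        self._adj[ev.actor].append((ev.timestamp, ev.verb, ev.obj))
        self._adj[ev.obj].append((ev.timestamp, ev.verb, ev.actor))
        self.num_edges += 1

    def recent_neighbors(self, node: str, before: float, k: int) -> List[Tuple[float, str, str]]:
        # The k most recent interactions strictly before `before`, newest first:
        # the kind of temporal neighborhood a TGL encoder reads for one event.
        history = [e for e in self._adj.get(node, []) if e[0] < before]
        history.sort(key=lambda e: e[0], reverse=True)
        return history[:k]


g = TemporalEventGraph()
for ev in [
    Event("user", "opened", "report.docx", 100.0),
    Event("user", "emailed", "alice", 160.0),
    Event("alice", "edited", "report.docx", 200.0),
]:
    g.add_event(ev)

print(g.recent_neighbors("report.docx", before=250.0, k=2))
# [(200.0, 'edited', 'alice'), (100.0, 'opened', 'user')]
```

Because each update is a constant-time append, ingesting an event never requires re-serializing the history into a prompt; only the neighborhood query touches past edges.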
Proactive agents read user activity as text and call an LLM on every event to decide whether to act. But user activity is not natively text: it is a structured event stream of (actor, verb, object, timestamp) tuples that the operating system already maintains in graph form. Rendering that structure as text and asking an LLM to recover it is a round trip the system never needed to take. We treat the always-on signal as graph updates rather than text and use a small temporal-graph-learning (TGL) model as the encoder: one forward pass yields a per-event trigger probability and a per-entity routing score. The only LLM call is the downstream agent, which turns a small structured handoff into a fluent user-facing sentence and is invoked only when the trigger fires. TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0); in trigger-architecture comparisons, a single TGL checkpoint gives the strongest trigger AUCs and the most stable deployed threshold. It runs at 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop, approximately 4–7× and 12–83× faster, respectively, than every single-forward LLM-as-trigger configuration tested in each regime, with an approximately 220 MiB BF16 resident footprint that can be deployed on-device alongside the privacy-sensitive activity stream it consumes.
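The control flow described above (a cheap encoder pass on every event, with the LLM reached only when the trigger fires) can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_encoder` is a hypothetical stand-in for the TGL forward pass, using a recurrence count and a sigmoid rather than a learned temporal-graph model; `fake_agent` stands in for the downstream LLM call; and the threshold of 0.6, the history window, and the handoff fields are all invented for the example. What the sketch does mirror is the shape of the design: one encoder call returns both a trigger probability and per-entity routing scores, and the agent receives only a small structured handoff.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    actor: str
    verb: str
    obj: str
    timestamp: float


@dataclass
class EncoderOutput:
    trigger_prob: float               # per-event probability that the agent should act
    routing_scores: Dict[str, float]  # per-entity scores: which entity to anchor on


def toy_encoder(ev: Event, history: List[Event]) -> EncoderOutput:
    """Hypothetical stand-in for one TGL forward pass (NOT the authors' model).

    Trigger: sigmoid of how often this (actor, object) pair recurred recently.
    Routing: share of the actor's recent interactions that touched each object.
    """
    repeats = sum(1 for h in history if h.actor == ev.actor and h.obj == ev.obj)
    trigger_prob = 1.0 / (1.0 + math.exp(-(repeats - 2)))
    counts: Dict[str, float] = {}
    for h in history + [ev]:
        if h.actor == ev.actor:
            counts[h.obj] = counts.get(h.obj, 0.0) + 1.0
    total = sum(counts.values())  # >= 1 because `ev` itself is counted
    return EncoderOutput(trigger_prob, {k: v / total for k, v in counts.items()})


def run_pipeline(
    events: List[Event],
    encoder: Callable[[Event, List[Event]], EncoderOutput],
    agent: Callable[[dict], str],
    threshold: float = 0.6,  # hypothetical deployed threshold
    window: int = 50,        # hypothetical history window
) -> List[str]:
    history: List[Event] = []
    messages: List[str] = []
    for ev in events:
        out = encoder(ev, history[-window:])  # cheap pass on every event
        if out.trigger_prob >= threshold:     # LLM is reached only when the trigger fires
            anchor = max(out.routing_scores, key=out.routing_scores.get)
            handoff = {"anchor": anchor, "verb": ev.verb, "p": round(out.trigger_prob, 3)}
            messages.append(agent(handoff))
        history.append(ev)
    return messages


llm_calls: List[dict] = []


def fake_agent(handoff: dict) -> str:
    # Placeholder for the downstream LLM that phrases the user-facing sentence.
    llm_calls.append(handoff)
    return f"You keep returning to {handoff['anchor']}; want it pinned?"


stream = [Event("user", "opened", "report.docx", float(t)) for t in range(5)]
stream.append(Event("user", "opened", "photo.png", 5.0))
msgs = run_pipeline(stream, toy_encoder, fake_agent)
print(len(stream), len(llm_calls))  # 6 2
```

Six events pass through the encoder, but only the fourth and fifth (the third and fourth repeats of the same file) clear the threshold, so the LLM is called twice. Swapping `toy_encoder` for a trained temporal-graph model would leave `run_pipeline` unchanged, which is the point of keeping the trigger and the language generation separate.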