AI · May 18

Causely: A Causal Intelligence Layer for Enterprise AI: A Benchmark Study on SRE and Reliability Workflows

arXiv:2605.18327
Predicted impact: top 85% in AI · last 90 days
Originality: Incremental advance
AI Analysis

For SRE teams using AI agents, Causely addresses the semantic-interpretation bottleneck in incident diagnosis, offering a practical solution with substantial efficiency gains.

Causely introduces a causal intelligence layer that converts raw telemetry into a structured, queryable model for AI agents in SRE workflows. In a benchmark with injected faults, causal grounding reduced mean time-to-diagnosis by 63%, token consumption by 60%, and API cost by 57%, while improving root-cause accuracy from 75% to 100%.

AI agents deployed into SRE workflows currently derive their understanding of environment state from raw observability telemetry at query time, paying a semantic-interpretation tax in tokens, latency, and inferential reliability. We propose Causely, a causal intelligence layer that maintains a structured representation of environment topology, attribute dependencies, and causal relationships, all anchored to an ontological representation of the managed environment. Causely transforms raw telemetry into a live, queryable model that provides the semantic and causal foundation AI agents require to diagnose, evaluate impact, and act safely in production. We evaluate this value proposition through a benchmark study conducted in a controlled setting with injected faults in a 24-microservice OpenTelemetry demo application. Our experiments compare four agent configurations (Claude Code, OpenAI Codex, and HolmesGPT with Sonnet and Gemini backends). Each configuration is run with and without access to Causely under two scenarios: an active incident and a healthy baseline. On the active-fault scenario, causal grounding reduces mean time-to-diagnosis by 63%, mean token consumption by 60%, and mean tool-call count by 78%, compressing the investigation footprint by 4.8× and lowering direct API cost per run by 57%; root-cause-diagnosis accuracy rises from 75% to 100%.
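To relate the percentage reductions to the "×" compression figure, the short sketch below converts between the two forms. It uses only the numbers quoted in the abstract; the function names and dictionary keys are illustrative assumptions, and the abstract does not say how the "investigation footprint" behind the 4.8× figure is defined. Under the simple conversion used here, none of the four reported reductions maps exactly to 4.8×. The tool-call reduction comes closest at about 4.55×, so the paper's figure presumably rests on a definition or averaging method not given in the abstract.

```python
def compression_factor(reduction_pct: float) -> float:
    """Convert an 'X% reduction' into an 'N times smaller' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


def reduction_from_factor(factor: float) -> float:
    """Convert an 'N times smaller' factor back into a percentage reduction."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return 100.0 * (1.0 - 1.0 / factor)


# Mean reductions quoted in the abstract (active-fault scenario).
# The key names are illustrative labels, not identifiers from the paper.
reported_reductions = {
    "time_to_diagnosis": 63,
    "token_consumption": 60,
    "tool_calls": 78,
    "api_cost_per_run": 57,
}

factors = {
    name: round(compression_factor(pct), 2)
    for name, pct in reported_reductions.items()
}

# Percentage reduction that a 4.8x compression would correspond to.
implied_reduction = round(reduction_from_factor(4.8), 1)

for name, factor in factors.items():
    print(f"{name}: {factor}x smaller")
print(f"4.8x compression corresponds to a {implied_reduction}% reduction")
```

Read this way, a 60% token reduction means runs with Causely use about 2.5 times fewer tokens, and a 4.8× compression corresponds to a reduction of roughly 79%.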
