CRLGMay 23

CALIBURN: A Regime-Sensitivity Study of Operationally Calibrated Streaming Intrusion Detection

arXiv:2605.2469619.4
AI Analysis

For network operators needing pre-deployment alerting specifications, CALIBURN provides a regime-sensitive streaming pipeline that excels in rare-attack scenarios but shows diminishing returns in high-prevalence regimes.

CALIBURN introduces a streaming intrusion detection pipeline with operator-specifiable alerting thresholds, achieving AUC-PR 0.943 on LITNET-2020 (rare attacks), outperforming streaming baselines by 2.21x and batch methods by 4.12x, with 30% Brier score reduction via isotonic calibration.

Streaming network intrusion detection systems must process flows continuously while keeping memory bounded, but most current methods leave alerting threshold selection as a post-hoc tuning problem poorly suited to production. Operators need alerting behaviour specifiable before deployment using inputs such as false-negative cost, false-positive cost, and alerting budget. This paper presents CALIBURN, a five-component streaming alerting pipeline composed of a truncated Bayesian online change-point detector, an isotonic calibration layer mapping the change-point posterior to an empirical conditional attack probability, a cost-sensitive decision threshold derived from operator-specified misclassification costs, a Conformal Risk Control wrapper that converts an alert-budget specification into a within-window valid threshold under exchangeability, and a multi-window burn-rate alerting layer adapted from Site Reliability Engineering practice. Rather than claiming uniform dominance, we present CALIBURN as a regime-sensitivity study, evaluating the pipeline across three attack-prevalence regimes: LITNET-2020 at 5.2 percent, CICIDS2017 at 22.06 percent, and UNSW-NB15 at 64 percent. In the rare-attack regime, CALIBURN achieves AUC-PR 0.943 on LITNET-2020, outperforming the best streaming baseline by 2.21x and the best batch reference by 4.12x; isotonic calibration reduces Brier score by 30 percent. In the moderate-prevalence regime, CALIBURN remains the strongest streaming method on CICIDS2017 but is exceeded by batch density methods. In the high-prevalence regime, all streaming methods approach the prevalence floor. We further identify two distinct CRC-collapse mechanisms driving the alert rule to degeneracy at small alpha, treating both as operational guidance for practitioners.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes