CR | Mar 16

Unsupervised Cross-Protocol Anomaly Analysis in Mobile Core Networks via Multi-Embedding Models Consensus

arXiv:2603.15344 | 11.8 | h-index: 2
AI Analysis

This work addresses security and misconfiguration issues in mobile networks, where labeled attack data is scarce, and offers a practical tool for prioritizing inspections. It is incremental, however, because it builds on existing unsupervised and embedding techniques.

The paper tackles the detection of cross-protocol anomalies in mobile core networks (SS7, Diameter, GTP) without labeled data. It applies unsupervised anomaly detection to fused per-subscriber representations embedded with several models, then scores each record by how many models flag it. Higher consensus scores are strongly associated with synthetic anomalies (high odds ratios) and with clearer separation in embedding space.
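The consensus score described above can be illustrated with a minimal sketch. The detector here is an assumption made for illustration: `flag_outliers` is a hypothetical stand-in that flags the records farthest from the centroid, and the toy embeddings and model names are invented. The paper's actual detector and embedding models are not specified in this summary.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def flag_outliers(embeddings: List[Vector], contamination: float = 0.1) -> List[bool]:
    """Hypothetical stand-in detector (not the paper's): flag the
    `contamination` fraction of records farthest from the centroid."""
    n = len(embeddings)
    dim = len(embeddings[0])
    centroid = [sum(v[d] for v in embeddings) / n for d in range(dim)]
    dists = [math.dist(v, centroid) for v in embeddings]
    n_flag = max(1, round(contamination * n))
    # Indices of the n_flag records with the largest distance to the centroid.
    ranked = sorted(range(n), key=lambda i: dists[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(n)]


def consensus_scores(
    per_model_embeddings: Dict[str, List[Vector]], contamination: float = 0.1
) -> List[int]:
    """Consensus score per record: the number of models that flag it."""
    flags = [flag_outliers(e, contamination) for e in per_model_embeddings.values()]
    return [sum(record_flags) for record_flags in zip(*flags)]


# Toy embeddings of the same five records from three hypothetical models.
toy = {
    "model_a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)],
    "model_b": [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.9, 1.0), (9.0, 9.0)],
    "model_c": [(3.0, 3.0), (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)],
}
scores = consensus_scores(toy, contamination=0.2)
print(scores)
```

In this toy data the last record is flagged by two models and the first by one, so sorting by score would put the last record at the top of an inspection queue.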

Mobile core networks rely on several signalling protocols in parallel, such as SS7, Diameter, and GTP, so many security-relevant problems become visible only when their interactions are analyzed jointly. At the same time, labeled examples of real attacks and cross-protocol misconfigurations are scarce, which complicates supervised detection. We therefore study unsupervised cross-protocol anomaly analysis on fused representations that combine SS7, Diameter, and GTP signalling. For each subscriber, we aggregate messages into per-minute fused records, serialize each record as text, embed it with several models, and apply unsupervised anomaly detection. We then assign each record a consensus score equal to the number of embedding models that flag it as anomalous. For evaluation, we generate cross-protocol-plausible synthetic anomalies by swapping one field group at a time between pairs of records, preserving per-message validity while making the fused view contradictory. On 219,294 fused records, 44.15% are flagged by at least one model, but only 0.97% reach full agreement across all six. Higher consensus is strongly associated with synthetic records, where for k=1-4 the odds that a flagged record is synthetic are hundreds of times greater than for original records, and for k>=5 all flagged records are synthetic, with extremely small p-values. Cosine distances between synthetic and original records also increase with consensus, suggesting clearer separation in embedding space. These results support the use of multi-embedding consensus to prioritize a much smaller set of candidate cross-protocol inconsistencies for further inspection.
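The synthetic-anomaly step in the abstract (swapping one field group between a pair of records) might look roughly like the sketch below. The record layout, the grouping of fields by protocol, the field names, and the `serialize` text format are all assumptions for illustration. The abstract does not define the field groups or the serialization.

```python
import copy
from typing import Dict, Tuple

# Hypothetical layout: field-group name -> {field name -> value}.
Record = Dict[str, Dict[str, str]]


def swap_field_group(a: Record, b: Record, group: str) -> Tuple[Record, Record]:
    """Return copies of `a` and `b` with one field group exchanged.
    Each group stays internally valid, but the fused view of each
    record may now be contradictory across protocols."""
    a_syn, b_syn = copy.deepcopy(a), copy.deepcopy(b)
    a_syn[group], b_syn[group] = b_syn[group], a_syn[group]
    return a_syn, b_syn


def serialize(record: Record) -> str:
    """Hypothetical text serialization of a fused record for embedding."""
    parts = []
    for group in sorted(record):
        fields = " ".join(f"{k}={v}" for k, v in sorted(record[group].items()))
        parts.append(f"[{group}] {fields}")
    return " ".join(parts)


# Two invented per-minute fused records (field names are illustrative only).
rec_a: Record = {
    "ss7": {"op": "updateLocation", "vlr_country": "DE"},
    "diameter": {"cmd": "ULR", "visited_plmn": "26201"},
    "gtp": {"msg": "CreateSession", "sgsn_country": "DE"},
}
rec_b: Record = {
    "ss7": {"op": "updateLocation", "vlr_country": "BR"},
    "diameter": {"cmd": "ULR", "visited_plmn": "72405"},
    "gtp": {"msg": "CreateSession", "sgsn_country": "BR"},
}

syn_a, syn_b = swap_field_group(rec_a, rec_b, "gtp")
print(serialize(syn_a))
```

After the swap, `syn_a` reports a German location over SS7 and Diameter but a Brazilian one over GTP, which is the kind of cross-protocol contradiction the evaluation targets. The originals are left unmodified so that both original and synthetic records can be scored.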
