Jericho Cain

h-index1
2papers

2 Papers

1.4CRApr 29
Signal Decomposition Reveals Structure in Insider Threat Detection under Sparse Temporal Data

Hayden Beadles, Jericho Cain

Insider threat detection is difficult because malicious behavior is rare, irregular, and buried in long periods of inactivity. In enterprise audit data, most windows contain little activity, while attacks appear intermittently and range from brief events to sustained campaigns. Standard reconstruction-based models are therefore dominated by inactive regions and tend to learn baseline behavior rather than meaningful deviations. We separate activity presence from magnitude. Each window is decomposed into a binary mask indicating whether activity occurs and a value matrix capturing its intensity. A dual-channel autoencoder reconstructs both, with value loss applied only where activity is present, directing learning toward sparse structure. Using the CERT r5.2 dataset as a controlled setting, we examine how anomaly signal changes with temporal configuration. Short attacks are detected mainly through presence; longer attacks introduce a magnitude component; noise degrades magnitude reliability and shifts detection back toward presence. The balance between channels is not fixed and follows the data. At the campaign level, signal concentrates in a small number of anomalous windows. Simple aggregation that emphasizes extreme scores is sufficient to recover extended activity without explicit sequence modeling. Effective detection depends less on model complexity and more on aligning representation and objective with sparse temporal structure.

LGNov 11, 2024
Anomaly Detection in OKTA Logs using Autoencoders

Jericho Cain, Hayden Beadles, Karthik Venkatesan

Okta logs are used today to detect cybersecurity events using various rule-based models with restricted look back periods. These functions have limitations, such as a limited retrospective analysis, a predefined rule set, and susceptibility to generating false positives. To address this, we adopt unsupervised techniques, specifically employing autoencoders. To properly use an autoencoder, we need to transform and simplify the complexity of the log data we receive from our users. This transformed and filtered data is then fed into the autoencoder, and the output is evaluated.