Accelerating Dependency Graph Learning from Heterogeneous Categorical Event Streams via Knowledge Transfer
This work addresses the domain variety problem in dependency graph learning for applications such as intrusion detection and offers a novel transfer learning approach for categorical data, though it appears incremental because it builds on existing graph learning methods.
The paper tackles the problem of transferring knowledge from a well-trained dependency graph in a source domain to enhance an immature graph in a target domain, particularly for heterogeneous categorical event streams. It reports that the proposed method, ACRET, achieves superior intrusion detection performance, detecting intrusions at least 20 days in advance with more than 70% accuracy.
A dependency graph is a heterogeneous graph representing the intrinsic relationships between pairs of system entities, and it is essential to many data analysis applications, such as root cause diagnosis and intrusion detection. Given a well-trained dependency graph from a source domain and an immature dependency graph from a target domain, how can we extract the entity and dependency knowledge from the source to enhance the target? One way is to directly apply a mature dependency graph learned from a source domain to the target domain. However, due to the domain variety problem, directly using the source dependency graph often cannot achieve good performance. Traditional transfer learning methods mainly focus on numerical data and are not applicable to categorical data. In this paper, we propose ACRET, a knowledge-transfer-based model for accelerating dependency graph learning from heterogeneous categorical event streams. In particular, we first propose an entity estimation model that filters out irrelevant entities from the source domain based on entity embedding and manifold learning. Only the entities with statistically high correlations are transferred to the target domain. On the surviving entities, we propose a dependency construction model that constructs unbiased dependency relationships by solving a two-constraint optimization problem. The experimental results on synthetic and real-world datasets demonstrate the effectiveness and efficiency of ACRET. We also apply ACRET to a real enterprise security system for intrusion detection. Our method achieves superior detection performance at least 20 days in advance, with more than 70% accuracy.
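The two-stage idea above (filter source entities, then carry over dependencies among the survivors) can be illustrated with a minimal sketch. It is not ACRET itself. It assumes entity embeddings are already available as NumPy arrays, substitutes a plain cosine-similarity threshold for the paper's embedding, manifold-learning, and statistical correlation test, and simply restricts the source adjacency matrix in place of solving the two-constraint optimization problem. The function names and the threshold parameter are hypothetical.

```python
import numpy as np


def filter_source_entities(source_emb, target_emb, threshold=0.8):
    """Return indices of source entities that resemble some target entity.

    Assumption: relevance is approximated by the best cosine similarity
    between a source embedding and any target embedding. This is a
    stand-in for the entity estimation model, not the paper's method.
    """
    def normalize(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.clip(norms, 1e-12, None)  # guard against zero rows

    sims = normalize(source_emb) @ normalize(target_emb).T
    best = sims.max(axis=1)  # best match per source entity
    return np.flatnonzero(best >= threshold)


def transfer_edges(source_adj, kept):
    """Restrict the source dependency matrix to the surviving entities.

    Assumption: edges are copied unchanged; the paper instead constructs
    unbiased dependencies by solving an optimization problem.
    """
    return source_adj[np.ix_(kept, kept)]


# Toy data: three source entities, two target entities, 2-D embeddings.
source_emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
target_emb = np.array([[1.0, 0.1], [0.1, 1.0]])
source_adj = np.array([[0, 1, 1],
                       [0, 0, 1],
                       [1, 0, 0]])

kept = filter_source_entities(source_emb, target_emb, threshold=0.8)
transferred = transfer_edges(source_adj, kept)
```

In this toy setting the third source entity points away from both target entities, so it is dropped along with every dependency that touches it. Only the edge between the first two entities is carried over.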