Sampling High Throughput Data for Anomaly Detection of Data-Base Activity
The paper addresses data leakage threats to organizations that rely on databases, but the contribution appears incremental because it builds on existing sampling methods.
The paper tackles anomaly detection in high-throughput database activity. It investigates risk-based sampling methods and proposes a combined sampling approach intended to capture a more varied sample; no concrete performance numbers are provided.
Data leakage and theft from databases are a dangerous threat to organizations. Data Security and Data Privacy protection systems (DSDP) monitor data access and usage to identify leakage or suspicious activities that should be investigated. Because of the high-velocity nature of database systems, such systems audit only a portion of the vast number of transactions that take place. Anomalies are investigated by a Security Officer (SO) in order to choose the proper response. In this paper we investigate the effect of sampling methods based on the risk each transaction poses and propose a new method of "combined sampling" for capturing a more varied sample.
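The abstract does not say how the risk-based and combined samplers are constructed, so the following is only an illustrative sketch of one plausible reading: most of the audit budget is drawn with probability weighted by a per-transaction risk score, and the rest is drawn uniformly from what remains, so that low-risk activity still appears in the sample. The `risk` field, the `explore_fraction` parameter, and both function names are hypothetical and do not come from the paper.

```python
import random


def risk_weighted_sample(transactions, k, rng):
    """Draw up to k transactions without replacement, favoring high risk.

    Uses the Efraimidis-Spirakis key trick: each item gets the key
    u ** (1 / weight) with u uniform in [0, 1), and the k largest keys win.
    """
    keyed = []
    for t in transactions:
        weight = max(t["risk"], 1e-9)  # assumption: risk is a non-negative score
        keyed.append((rng.random() ** (1.0 / weight), t))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in keyed[:k]]


def combined_sample(transactions, k, explore_fraction, rng):
    """Hypothetical combination: risk-weighted picks plus uniform picks."""
    n_explore = int(round(k * explore_fraction))
    n_risk = k - n_explore

    by_risk = risk_weighted_sample(transactions, n_risk, rng)
    chosen_ids = {t["id"] for t in by_risk}

    # Fill the rest of the budget uniformly from transactions not yet chosen.
    remaining = [t for t in transactions if t["id"] not in chosen_ids]
    uniform = rng.sample(remaining, min(n_explore, len(remaining)))
    return by_risk + uniform


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a stream of audited transactions.
    stream = [{"id": i, "risk": rng.random()} for i in range(1000)]
    audited = combined_sample(stream, k=50, explore_fraction=0.2, rng=rng)
    print(len(audited), "transactions selected for review")
```

Setting `explore_fraction` to 0 reduces this to pure risk-based sampling, and setting it to 1 gives uniform sampling, which makes it easy to compare the variety of the resulting samples under the same budget.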