Real-time Top-K Predictive Query Processing over Event Streams
The work improves predictive query processing for applications such as power systems and web analytics, though the contribution is incremental because it builds on existing causal prediction methods.
This paper tackles the problem of predicting the top-k most likely next events from real-time event streams, addressing limitations of existing methods that assume acyclic causality and omit infrequent causal links. The proposed approach uses an event precedence model and run-time causal inference. Experiments show that the reduced search algorithm cuts runtime by 25-80% compared to exhaustive search, with minimal accuracy loss.
This paper addresses the problem of predicting the k events that are most likely to occur next, over historical real-time event streams. Existing approaches to causal prediction queries have a number of limitations. First, they exhaustively search over an acyclic causal network to find the most likely k effect events; however, data from real event streams frequently reflect cyclic causality. Second, they make conservative assumptions intended to exclude all possible non-causal links from the causal network, which leads to the omission of many less-frequent but important causal links. We overcome these limitations by proposing a novel event precedence model and a run-time causal inference mechanism. The event precedence model incrementally constructs a first-order absorbing Markov chain over event streams, in which an edge between two events signifies a temporal precedence relationship between them; temporal precedence is a necessary condition for causality. The run-time causal inference mechanism then learns causal relationships dynamically during query processing, by removing those temporal precedence relationships in the event precedence model that do not exhibit causality in the presence of other events. This paper presents two query processing algorithms: one performs an exhaustive search on the model, and the other performs a more efficient reduced search with early termination. Experiments using two real datasets (cascading blackouts in power systems and web page views) verify the effectiveness of the probabilistic top-k prediction queries and the efficiency of the algorithms. Specifically, the reduced search algorithm reduced runtime, relative to exhaustive search, by 25-80% (depending on the application) with only a small reduction in accuracy.
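
To make the idea of an incrementally built first-order precedence model concrete, the following minimal Python sketch counts which event immediately follows which, then ranks the successors of a given event by estimated transition probability. It is an illustration under simplifying assumptions, not the paper's algorithm: the class name `PrecedenceModel` and its methods `observe` and `top_k` are hypothetical, and absorbing states, run-time causal inference, multi-step search, and early termination are all omitted.

```python
from collections import Counter, defaultdict
import heapq


class PrecedenceModel:
    """Hypothetical sketch: first-order transition counts, updated per event."""

    def __init__(self):
        # counts[a][b] = number of times event b immediately followed event a
        self.counts = defaultdict(Counter)
        self.prev = None

    def observe(self, event):
        """Consume the next event of the stream and update the counts."""
        if self.prev is not None:
            self.counts[self.prev][event] += 1
        self.prev = event

    def top_k(self, current, k):
        """Return up to k (event, probability) pairs, most likely first."""
        successors = self.counts.get(current)
        if not successors:
            return []  # no successor has been observed for this event yet
        total = sum(successors.values())
        best = heapq.nlargest(k, successors.items(), key=lambda kv: kv[1])
        return [(event, count / total) for event, count in best]


# Toy stream; note that cycles such as a -> b -> a are allowed.
model = PrecedenceModel()
for event in ["a", "b", "a", "c", "a", "b", "d"]:
    model.observe(event)

print(model.top_k("a", 2))
```

In this toy stream, "a" is followed by "b" twice and by "c" once, so the query ranks "b" first with an estimated probability of 2/3. Each edge here encodes only temporal precedence; per the abstract, deciding which of these edges are actually causal is the job of the run-time inference step, which this sketch does not attempt.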