Learning to Predict from Textual Data
This paper addresses automated event prediction from textual data. The contribution is incremental, in that it builds on existing machine learning and data mining techniques.
The paper generates plausible predictions of future news events from current ones using a new method, the Pundit algorithm, which performs as well as non-expert humans in empirical evaluations.
Given a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines that contain certain predefined causality patterns. For generalization, the model draws on a vast number of world-knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.
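The abstract describes two steps: mining cause-effect pairs from headlines that match predefined causality patterns, and generalizing those pairs through world-knowledge ontologies so they transfer to new events. The Python sketch below is a toy illustration of that pipeline under stated assumptions. It is not the Pundit algorithm. The abstract lists neither its patterns nor its ontologies, so `CAUSAL_PATTERNS` (three hypothetical regex templates) and `ONTOLOGY` (a hypothetical three-entry term-to-concept map) are invented for illustration. Prediction here is a plain exact-match lookup on the generalized cause rather than a learned predictor.

```python
import re
from collections import defaultdict

# HYPOTHETICAL causality patterns. The paper mentions "predefined causality
# patterns" but does not list them, so these three templates are assumptions.
CAUSAL_PATTERNS = [
    # "<effect> after <cause>"
    re.compile(r"^(?P<effect>.+?)\s+after\s+(?P<cause>.+)$", re.IGNORECASE),
    # "<effect> because of <cause>"
    re.compile(r"^(?P<effect>.+?)\s+because of\s+(?P<cause>.+)$", re.IGNORECASE),
    # "<cause> causes <effect>"
    re.compile(r"^(?P<cause>.+?)\s+causes\s+(?P<effect>.+)$", re.IGNORECASE),
]

# HYPOTHETICAL toy "ontology": maps a specific term to a more general concept.
ONTOLOGY = {"hurricane": "storm", "typhoon": "storm", "tornado": "storm"}


def extract_pairs(headlines):
    """Return lowercase (cause, effect) pairs from headlines matching a pattern."""
    pairs = []
    for headline in headlines:
        for pattern in CAUSAL_PATTERNS:
            match = pattern.match(headline.strip())
            if match:
                pairs.append((match.group("cause").lower(),
                              match.group("effect").lower()))
                break  # the first matching pattern wins
    return pairs


def generalize(text):
    """Replace each token that the toy ontology knows with its general concept."""
    return " ".join(ONTOLOGY.get(token, token) for token in text.split())


def train(pairs):
    """Index observed effects by the generalized form of their cause."""
    model = defaultdict(list)
    for cause, effect in pairs:
        model[generalize(cause)].append(effect)
    return model


def predict(model, event):
    """Return effects seen for the event's generalized form (exact match only)."""
    return list(model.get(generalize(event.lower()), []))


if __name__ == "__main__":
    headlines = [
        "Power outages after hurricane",
        "Flights cancelled because of typhoon",
        "Stocks rise on earnings",  # no causal pattern, so it is ignored
    ]
    model = train(extract_pairs(headlines))
    # "tornado" was never observed, but it generalizes to the same concept.
    print(predict(model, "Tornado"))
```

The design choice to key the model on generalized causes is what lets a pair observed with "hurricane" inform a prediction about "tornado"; without the ontology step, an unseen event would have no match at all. The paper's generalization over many large ontologies is far richer than this single substitution table.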