LGCE · Nov 3, 2021

Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data

arXiv:2111.02062v4 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses a data granularity issue in point process modeling for applications such as social media and epidemiology. The advance is incremental: it extends an existing method to handle mixed data types.

The paper tackles the problem of modeling multivariate event streams when some dimensions have only interval-censored data (counts) instead of timestamps, by introducing the Partially Censored Multivariate Hawkes Process (PCMHP). It shows that PCMHP outperforms the Hawkes Intensity Process in predicting YouTube popularity and recovers parameters from synthetic data.

The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known; such data are referred to as partially interval-censored. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partially Censored Multivariate Hawkes Process (PCMHP), a novel point process that shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PCMHP using synthetic and real-world datasets. First, we illustrate that the PCMHP can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PCMHP in predicting YouTube popularity and find that the PCMHP outperforms the popularity estimation algorithm Hawkes Intensity Process (HIP). Compared with the fully interval-censored HIP, the PCMHP improves prediction performance by accounting for point process dimensions, particularly when there are significant cross-dimension interactions. Lastly, we leverage the PCMHP to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PCMHP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
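
To make the terms in the abstract concrete, here is a minimal sketch of a standard MHP intensity with self- and cross-excitation, of how timestamps in one dimension collapse into interval counts, and of the spectral radius of the branching matrix. It is not the paper's PCMHP estimator. The exponential kernel, the parameter names (`mu`, `alpha`, `theta`), the helper names, and the toy numbers are all assumptions chosen for illustration.

```python
import math
from bisect import bisect_right


def mhp_intensity(t, events, mu, alpha, theta):
    """Intensity of each dimension at time t for a Hawkes process.

    Assumed kernel: phi_ij(s) = alpha[i][j] * theta * exp(-theta * s),
    so alpha[i][j] is the expected number of dimension-i events
    triggered by one dimension-j event.
    """
    lam = [float(m) for m in mu]  # start from the background rates
    for j, times in enumerate(events):
        # Summed kernel contribution of dimension-j events strictly before t.
        excitation = sum(theta * math.exp(-theta * (t - s)) for s in times if s < t)
        for i in range(len(lam)):
            lam[i] += alpha[i][j] * excitation  # i == j: self, i != j: cross
    return lam


def interval_censor(times, edges):
    """Replace timestamps by counts in half-open bins [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for s in times:
        if edges[0] <= s < edges[-1]:
            counts[bisect_right(edges, s) - 1] += 1
    return counts


def spectral_radius(alpha, iters=200):
    """Power-iteration estimate, suitable for a matrix with positive entries."""
    v = [1.0] * len(alpha)
    rho = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in alpha]
        rho = max(abs(x) for x in w)
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in w]
    return rho


# Toy two-dimensional example (hypothetical values).
mu = [0.5, 0.2]
alpha = [[0.3, 0.2], [0.1, 0.4]]
theta = 1.0
events = [[0.5, 1.2, 2.7], [0.9, 2.1]]

lam_at_1 = mhp_intensity(1.0, events, mu, alpha, theta)
# Dimension 1 observed only as counts per unit interval on [0, 3).
censored_dim1 = interval_censor(events[1], [0.0, 1.0, 2.0, 3.0])
rho = spectral_radius(alpha)
```

In this toy setup a partially interval-censored dataset would keep `events[0]` as timestamps and expose dimension 1 only through `censored_dim1`. A spectral radius below 1 corresponds to the usual subcritical regime of a Hawkes process.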

Code Implementations: 1 repo