Scalable Linear Causal Inference for Irregularly Sampled Time Series with Long Range Dependencies
This addresses a critical problem for applications in finance, the physical sciences, and engineering, where real-world time series are often irregularly sampled and exhibit long-range dependencies.
The paper tackles the challenges of irregular sampling, long-range dependencies, and scalability in linear causal analysis of time series by introducing a frequency-domain estimation framework. Implemented on Apache Spark, the framework accurately recovers causal structure at scale on both Monte Carlo simulations and high-frequency financial trading data.
Linear causal analysis is central to a wide range of important applications spanning finance, the physical sciences, and engineering. Much of the existing literature on linear causal analysis operates in the time domain. Unfortunately, applying time-domain linear causal analysis directly to many real-world time series presents three critical challenges: irregular temporal sampling, long-range dependencies, and scale. Real-world data is often collected at irregular time intervals across vast arrays of decentralized sensors, and its long-range dependencies make naive time-domain correlation estimators spurious. In this paper we present a frequency-domain estimation framework that naturally handles irregularly sampled data and long-range dependencies while enabling memory- and communication-efficient distributed processing of time series data. By operating in the frequency domain, we eliminate the need to interpolate and help mitigate the effects of long-range dependencies. We implement and evaluate our new workflow in the distributed setting using Apache Spark and demonstrate on both Monte Carlo simulations and high-frequency financial trading data that we can accurately recover causal structure at scale.
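The abstract does not spell out the estimator, so the following is only a minimal sketch under our own assumptions, not the authors' method. It illustrates two of the claims above. First, a Fourier coefficient can be evaluated directly at the raw, irregular timestamps (a nonuniform DFT), so no interpolation onto a grid is needed. Second, because that coefficient is a plain sum over samples, it can be computed as per-partition partial sums that combine by addition. This is the map/reduce pattern a framework such as Apache Spark would distribute; here it is simulated with plain Python lists. As a toy stand-in for causal structure, the sketch reads a lead/lag from the phase of the cross-spectrum at a single frequency. The helper names (`partial_dft`, `distributed_dft`, `split`, `estimate_lag`) and the pure-sinusoid setup are hypothetical.

```python
import cmath
import math
import random
from functools import reduce


def partial_dft(samples, freqs):
    """Nonuniform DFT of one partition of (t, value) pairs.

    Computes sum_k v_k * exp(-2*pi*i*f*t_k) for each f in freqs using the raw
    irregular timestamps directly, so nothing is interpolated onto a grid.
    """
    return [sum(v * cmath.exp(-2j * math.pi * f * t) for t, v in samples)
            for f in freqs]


def add_vectors(a, b):
    # At fixed frequencies the DFT is a sum over samples, so partial results
    # from disjoint partitions combine by element-wise addition.
    return [u + w for u, w in zip(a, b)]


def distributed_dft(partitions, freqs):
    # "Map": each partition yields len(freqs) complex numbers.
    # "Reduce": add those vectors. Only the small vectors would need to move
    # between workers, not the raw samples.
    return reduce(add_vectors, (partial_dft(p, freqs) for p in partitions))


def split(samples, n_parts):
    # Hypothetical stand-in for how a cluster framework partitions records.
    return [samples[i::n_parts] for i in range(n_parts)]


def estimate_lag(x_samples, y_samples, f0, n_parts=4):
    """Estimate tau in y(t) ~ x(t - tau) from the cross-spectrum phase at f0.

    Only valid while |2*pi*f0*tau| < pi, because the phase is not unwrapped.
    A positive result means x leads y.
    """
    X = distributed_dft(split(x_samples, n_parts), [f0])[0]
    Y = distributed_dft(split(y_samples, n_parts), [f0])[0]
    cross = X * Y.conjugate()  # raw (unsmoothed) cross-periodogram at f0
    return cmath.phase(cross) / (2 * math.pi * f0)


if __name__ == "__main__":
    rng = random.Random(0)
    T, N, f0, tau = 1000.0, 4000, 0.05, 3.0
    # Each series gets its own irregular timestamps; there is no common grid.
    tx = [rng.uniform(0, T) for _ in range(N)]
    ty = [rng.uniform(0, T) for _ in range(N)]
    x = [(t, math.sin(2 * math.pi * f0 * t) + rng.gauss(0, 0.3)) for t in tx]
    y = [(t, math.sin(2 * math.pi * f0 * (t - tau)) + rng.gauss(0, 0.3))
         for t in ty]
    print(f"estimated lag: {estimate_lag(x, y, f0):.2f} (true {tau})")
```

The two series never share timestamps, yet their spectra can be compared directly, which is the practical sense in which a frequency-domain formulation sidesteps interpolation. A real estimator would smooth the cross-periodogram over many frequencies and fit a multivariate linear model rather than a single sinusoidal lag.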