LG · CE · Feb 23, 2022

Nowcasting the Financial Time Series with Streaming Data Analytics under Apache Spark

arXiv:2202.11820v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of nowcasting for financial analysts and traders, but it is incremental as it applies existing machine learning methods to streaming data in a specific domain.

The paper tackles real-time prediction of high-frequency financial time series with a two-stage method that combines chaos modeling and machine learning algorithms within Apache Spark's streaming analytics. Results on stock and Bitcoin datasets are evaluated with metrics such as SMAPE and compared using the Diebold-Mariano test.

This paper proposes nowcasting of high-frequency financial datasets in real time at a 5-minute interval using the streaming analytics feature of Apache Spark. The proposed two-stage method models chaos in the first stage and, in the second stage, uses a sliding-window approach to train machine learning algorithms available in Apache Spark's MLlib, namely Lasso Regression, Ridge Regression, Generalised Linear Model, Gradient Boosting Tree, and Random Forest. To test the effectiveness of the proposed methodology, we used three datasets: two from stock markets, namely the National Stock Exchange and the Bombay Stock Exchange, and one Bitcoin-INR conversion dataset. To evaluate the proposed methodology, we used metrics such as Symmetric Mean Absolute Percentage Error, Directional Symmetry, and Theil's U Coefficient. We tested the significance of each pair of models using the Diebold-Mariano (DM) test.
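The three evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not give the formulas, so the definitions below are assumptions: one common SMAPE variant (mean of absolute error over the average of absolute actual and forecast), Directional Symmetry counted with a strict sign match, and the U1 form of Theil's coefficient. The function names are hypothetical and are not taken from the paper or from Spark.

```python
import math


def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant: mean of |f - a| / ((|a| + |f|) / 2))."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom != 0.0:  # a pair of zeros contributes no error
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


def directional_symmetry(actual, forecast):
    """Percent of steps where actual and forecast move in the same direction.

    Assumption: a step counts as a hit only when both changes have the same
    strict sign; some definitions also count flat moves as hits.
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        forecast_change = forecast[t] - forecast[t - 1]
        if actual_change * forecast_change > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


def theil_u1(actual, forecast):
    """Theil's U1 (assumed form): RMSE / (RMS of actual + RMS of forecast)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    return rmse / (rms_actual + rms_forecast)


if __name__ == "__main__":
    # Made-up illustrative prices, not data from the paper.
    actual = [100.0, 102.0, 101.0, 105.0]
    forecast = [100.0, 101.0, 103.0, 104.0]
    print("SMAPE (%):", smape(actual, forecast))
    print("Directional Symmetry (%):", directional_symmetry(actual, forecast))
    print("Theil U1:", theil_u1(actual, forecast))
```

Under these assumed definitions, lower SMAPE and Theil's U1 indicate a closer fit, while higher Directional Symmetry indicates that the forecast tracks the direction of price moves more often. The paper may use different variants of each formula.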

