APAIJun 12, 2024

Classification Modeling with RNN-Based, Random Forest, and XGBoost for Imbalanced Data: A Case of Early Crash Detection in ASEAN-5 Stock Markets

arXiv:2406.07888v29 citations
Originality Synthesis-oriented
AI Analysis

This study addresses the problem of detecting rare stock market crashes for investors and analysts in ASEAN-5 countries, but it is incremental as it applies existing methods to new data with methodological adjustments.

This research tackled early crash detection in ASEAN-5 stock markets using imbalanced data, comparing RNN architectures (Simple RNN, GRU, LSTM) to Random Forest and XGBoost, and found that all RNN-based models outperformed the classic algorithms, with Simple RNN being the most superior.

This research aims to evaluate the performance of several Recurrent Neural Network (RNN) architectures including Simple RNN, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM), compared to classic algorithms such as Random Forest and XGBoost in building classification models for early crash detection in ASEAN-5 stock markets. The study is examined using imbalanced data, which is common due to the rarity of market crashes. The study analyzes daily data from 2010 to 2023 across the major stock markets of the ASEAN-5 countries, including Indonesia, Malaysia, Singapore, Thailand, and Philippines. Market crash is identified as the target variable when the major stock price indices fall below the Value at Risk (VaR) thresholds of 5%, 2.5% and 1%. predictors involving technical indicators of major local and global markets as well as commodity markets. This study includes 213 predictors with their respective lags (5, 10, 15, 22, 50, 200) and uses a time step of 7, expanding the total number of predictors to 1491. The challenge of data imbalance is addressed with SMOTE-ENN. The results show that all RNN-Based architectures outperform Random Forest and XGBoost. Among the various RNN architectures, Simple RNN stands out as the most superior, mainly due to the data characteristics that are not overly complex and focus more on short-term information. This study enhances and extends the range of phenomena observed in previous studies by incorporating variables like different geographical zones and time periods, as well as methodological adjustments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes