LG · NI · May 9, 2022

Wavelet-Based Hybrid Machine Learning Model for Out-of-distribution Internet Traffic Prediction

arXiv:2205.04333v1 · 4 citations · h-index: 11
Originality: Synthesis-oriented
AI Analysis

This work addresses out-of-distribution generalization in network traffic prediction, which is crucial for proactive network management. The contribution is incremental: it combines existing ensemble methods with a wavelet decomposition step.

The paper tackles internet traffic prediction under distribution shift by proposing a hybrid model that integrates wavelet decomposition with ensemble methods. The hybrid model improves accuracy by 1% on in-distribution data and narrows the gap between in-distribution and out-of-distribution performance, although performance still drops significantly under distribution shift.

Efficient prediction of internet traffic is essential for proactive management of computer networks. Machine learning approaches now show promising performance in modeling complex real-world traffic. However, most existing works assume that the training and evaluation data come from an identical distribution. In practice, there is a high probability that a deployed model will encounter data from a slightly or entirely unknown distribution. This paper investigates and evaluates the performance of eXtreme Gradient Boosting, Light Gradient Boosting Machine, Stochastic Gradient Descent, Gradient Boosting Regressor, CatBoost Regressor, and their stacked ensemble model on both identically distributed and out-of-distribution data. Because the standalone models were unable to generalize well, we also propose a hybrid machine learning model that integrates wavelet decomposition to improve out-of-distribution prediction. Our experimental results show that the standalone ensemble model performs best, with an accuracy of 96.4%, while the hybrid ensemble model improves on it by 1% for in-distribution data. However, performance dropped significantly on three different test datasets whose distributions are shifted relative to the training set. Even so, our proposed hybrid model considerably reduces the performance gap between identical and out-of-distribution evaluation compared with the standalone model, indicating the effectiveness of the decomposition technique for out-of-distribution generalization.
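
To make the decomposition idea concrete, here is a minimal sketch of splitting a traffic series into additive wavelet components. The abstract does not state which wavelet family or how many decomposition levels the authors use, so the Haar wavelet, the single level, the function names, and the sample values below are all illustrative assumptions, not the paper's configuration. The per-component regressors and the stacked ensemble are omitted.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, returning a signal twice as long as the inputs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def additive_components(signal):
    """Split a signal into a smooth trend and a fluctuation that sum to it."""
    approx, detail = haar_step(signal)
    zeros = [0.0] * len(approx)
    # Reconstruct from each coefficient set alone, zeroing out the other.
    trend = haar_inverse_step(approx, zeros)
    fluctuation = haar_inverse_step(zeros, detail)
    return trend, fluctuation


# Made-up traffic values, for illustration only.
traffic = [10.0, 12.0, 9.0, 15.0, 20.0, 18.0, 30.0, 26.0]
trend, fluctuation = additive_components(traffic)
reconstructed = [t + f for t, f in zip(trend, fluctuation)]
```

In a hybrid setup of this kind, each component could be modeled by its own regressor and the component predictions summed. Because the transform is linear, the components add back to the original series up to floating-point error.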

