Real World Time Series Benchmark Datasets with Distribution Shifts: Global Crude Oil Price and Volatility
This work provides a real-world benchmark for continual learning in finance, addressing a domain-specific deficit, but the contribution is incremental because it focuses on data creation rather than novel methods.
The authors address the scarcity of task-labeled time-series benchmarks in finance by creating COB, a dataset of 30 years of crude oil prices and volatility proxies that exhibit distribution shifts. They show that including the generated task labels universally improves the performance of four continual learning algorithms across multiple forecasting horizons.
The scarcity of task-labeled time-series benchmarks in the financial domain hinders progress in continual learning. Addressing this deficit would foster innovation in this area. Therefore, we present COB, the Crude Oil Benchmark datasets. COB includes 30 years of asset prices that exhibit significant distribution shifts for the three most important crude oils in the world, together with task (i.e., regime) labels generated optimally from these distribution shifts. Our contributions include creating real-world benchmark datasets by transforming asset price data into volatility proxies, fitting models using expectation-maximization (EM), generating contextual task labels that align with real-world events, and providing these labels as well as the general algorithm to the public. We show that the inclusion of these task labels universally improves performance on four continual learning algorithms, some state-of-the-art, over multiple forecasting horizons. We hope these benchmarks accelerate research in handling distribution shifts in real-world data, especially given the global importance of the assets considered. We have made (1) the raw price data, (2) the task labels generated by our approach, and (3) the code for our algorithm available at https://oilpricebenchmarks.github.io.
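The pipeline described above (prices to volatility proxy, EM fit, regime labels) can be illustrated with a minimal sketch. The abstract does not say which volatility proxy is used, which model EM fits, or how the number of regimes is chosen, so everything here is an assumption made for illustration: the proxy is the log of a 20-day rolling standard deviation of log returns, the model is a two-component one-dimensional Gaussian mixture, and the function names (`log_returns`, `rolling_log_volatility`, `fit_two_regimes_em`) are hypothetical. The price series is synthetic, not COB data.

```python
import math
import random
import statistics


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_log_volatility(returns, window=20):
    """Log of the rolling standard deviation (an assumed volatility proxy)."""
    return [
        math.log(max(statistics.pstdev(returns[i:i + window]), 1e-12))
        for i in range(len(returns) - window + 1)
    ]


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_regimes_em(xs, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (assumed model).

    Returns the fitted means, variances, weights, and one hard label per
    observation, where label 1 marks the component with the higher mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[n // 4], ordered[3 * n // 4]]  # quartile initialisation
    var0 = max(statistics.pvariance(xs), 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each regime for every observation.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6
            )
    # Hard labels come from the posteriors of the final E-step.
    high = 0 if means[0] > means[1] else 1
    labels = [1 if r[high] >= 0.5 else 0 for r in resp]
    return means, variances, weights, labels


# Synthetic prices: 300 calm steps followed by 300 turbulent steps.
rng = random.Random(0)
prices = [50.0]
for sigma in [0.005] * 300 + [0.05] * 300:
    prices.append(prices[-1] * math.exp(rng.gauss(0.0, sigma)))

proxy = rolling_log_volatility(log_returns(prices))
means, variances, weights, labels = fit_two_regimes_em(proxy)
print("high-volatility labels in first 250 windows:", sum(labels[:250]))
print("high-volatility labels in last 250 windows:", sum(labels[-250:]))
```

Each label refers to the window ending at the corresponding return, so windows that straddle the change point sit between the two regimes. A real benchmark would likely need more than two regimes and a principled way to pick that number; the sketch fixes it at two only to keep the EM loop readable.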