Dmitry Efimov

LG
h-index: 21
Papers: 10
Citations: 104
Novelty: 44%
AI Score: 50

10 Papers

LG · Mar 16
Time-Aware Prior Fitted Networks for Zero-Shot Forecasting with Exogenous Variables

Andres Potapczynski, Ravi Kiran Selvam, Tatiana Konstantinova et al.

In many time series forecasting settings, the target time series is accompanied by exogenous covariates, such as promotions and prices in retail demand; temperature in energy load; calendar and holiday indicators for traffic or sales; and grid load or fuel costs in electricity pricing. Ignoring these exogenous signals can substantially degrade forecasting accuracy, particularly when they drive spikes, discontinuities, or regime and phase changes in the target series. Most current time series foundation models (e.g., Chronos, Sundial, TimesFM, TimeMoE, TimeLLM, and LagLlama) ignore exogenous covariates and make forecasts solely from the numerical time series history, thereby limiting their performance. In this paper, we develop ApolloPFN, a prior-data fitted network (PFN) that is time-aware (unlike prior PFNs) and that natively incorporates exogenous covariates (unlike prior univariate forecasters). Our design introduces two major advances: (i) a synthetic data generation procedure tailored to resolve the failure modes that arise when tabular (non-temporal) PFNs are applied to time series; and (ii) time-aware architectural modifications that embed inductive biases needed to exploit the time series context. We demonstrate that ApolloPFN achieves state-of-the-art results on benchmarks that contain exogenous information, such as M5 and electricity price forecasting.
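To make the setting concrete, here is a minimal sketch, under our own assumptions, of how a forecasting task with exogenous covariates can be laid out as the labeled-context / unlabeled-query table that a tabular PFN consumes, with explicit time features added so that rows are no longer treated as interchangeable. The feature choices (raw time index, one sin/cos seasonal pair, the covariates themselves) and the names `time_features` and `to_pfn_table` are hypothetical and are not ApolloPFN's actual inputs.

```python
import math
from typing import List, Sequence, Tuple

Row = List[float]

def time_features(t: int, period: int) -> Row:
    """Hypothetical time encoding: raw index plus one sin/cos seasonal pair."""
    angle = 2.0 * math.pi * (t % period) / period
    return [float(t), math.sin(angle), math.cos(angle)]

def to_pfn_table(
    target: Sequence[float],                # observed history only
    covariates: Sequence[Sequence[float]],  # covariates[t] known for history AND horizon
    horizon: int,
    period: int = 7,
) -> Tuple[List[Row], List[float], List[Row]]:
    """Return (context_X, context_y, query_X) for in-context prediction.

    The last `horizon` rows become queries: their covariates (e.g. price,
    promotion flag) are known in advance, but their targets are not.
    """
    n = len(covariates)
    split = n - horizon
    rows = [time_features(t, period) + list(covariates[t]) for t in range(n)]
    return rows[:split], list(target[:split]), rows[split:]
```

A PFN would then be asked to predict the targets of `query_X` conditioned on the `(context_X, context_y)` pairs in a single forward pass.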

LG · Jan 2
Zero-shot Forecasting by Simulation Alone

Boris N. Oreshkin, Mayank Jauhari, Ravi Kiran Selvam et al.

Zero-shot time-series forecasting holds great promise, but is still in its infancy, hindered by limited and biased data corpora, leakage-prone evaluation, and privacy and licensing constraints. Motivated by these challenges, we propose the first practical univariate time series simulation pipeline which is simultaneously fast enough for on-the-fly data generation and enables notable zero-shot forecasting performance on M-Series and GiftEval benchmarks that capture trend/seasonality/intermittency patterns, typical of industrial forecasting applications across a variety of domains. Our simulator, which we call SarSim0 (SARIMA Simulator for Zero-Shot Forecasting), is based on a seasonal autoregressive integrated moving average (SARIMA) model as its core data source. Due to instability in the autoregressive component, naive SARIMA simulation often leads to unusable paths. Instead, we follow a three-step procedure: (1) we sample well-behaved trajectories from its characteristic polynomial stability region; (2) we introduce a superposition scheme that combines multiple paths into rich multi-seasonality traces; and (3) we add rate-based heavy-tailed noise models to capture burstiness and intermittency alongside seasonalities and trends. SarSim0 is orders of magnitude faster than kernel-based generators, and it enables training on circa 1B unique purely simulated series, generated on the fly; after which well-established neural network backbones exhibit strong zero-shot generalization, surpassing strong statistical forecasters and recent foundation baselines, while operating under a strict zero-shot protocol. Notably, on GiftEval we observe a "student-beats-teacher" effect: models trained on our simulations exceed the forecasting accuracy of the AutoARIMA generating processes.
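The three steps can be illustrated with a toy generator. This is a minimal sketch under our own assumptions, not SarSim0: it keeps only the seasonal AR part (no integration or MA terms), obtains stable coefficients by drawing real inverse roots inside the unit disk, superposes paths with different seasonal periods, and stands in for the paper's rate-based heavy-tailed noise with a log-link Poisson count whose rate is multiplied by a Pareto shock. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def sample_stable_ar(p: int, rng: random.Random, max_modulus: float = 0.95) -> List[float]:
    """Step 1 (sketch): AR(p) coefficients inside the stationarity region.

    Expands prod_i (1 - r_i z) = 1 - phi_1 z - ... - phi_p z^p with |r_i| < 1,
    so every root z = 1/r_i lies outside the unit circle. Real roots only.
    """
    poly = [1.0]  # polynomial coefficients, lowest power of z first
    for _ in range(p):
        r = rng.uniform(-max_modulus, max_modulus)
        nxt = poly + [0.0]
        for k, c in enumerate(poly):
            nxt[k + 1] -= r * c  # multiply by (1 - r z)
        poly = nxt
    return [-c for c in poly[1:]]

def simulate_seasonal_ar(n: int, period: int, phi: Sequence[float],
                         rng: random.Random, sigma: float) -> List[float]:
    """x_t = sum_k phi_k * x_{t - k*period} + eps_t with Gaussian eps_t."""
    x = [0.0] * n
    for t in range(n):
        acc = rng.gauss(0.0, sigma)
        for k, c in enumerate(phi, start=1):
            if t - k * period >= 0:
                acc += c * x[t - k * period]
        x[t] = acc
    return x

def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication sampler; fine for the small, capped rates used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_series(n: int, periods: Sequence[int] = (7, 30), seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    latent = [0.0] * n
    for period in periods:  # step 2: superpose paths with different seasonalities
        phi = sample_stable_ar(2, rng)
        path = simulate_seasonal_ar(n, period, phi, rng, sigma=0.3)
        latent = [a + b for a, b in zip(latent, path)]
    slope = rng.uniform(-0.01, 0.01)  # linear trend on the log scale (an assumption)
    counts = []
    for t, z in enumerate(latent):  # step 3: heavy-tailed, rate-based observation noise
        rate = math.exp(min(z + slope * t, 3.0)) * rng.paretovariate(2.5)
        counts.append(poisson(min(rate, 50.0), rng))
    return counts
```

Sampling roots rather than raw coefficients guarantees stationarity by construction, so no path has to be rejected after it explodes.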

LG · Nov 6, 2024
♠ SPADE ♠: Split Peak Attention DEcomposition

Malcolm Wolff, Kin G. Olivares, Boris Oreshkin et al.

Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp-down periods. Neural networks like MQCNN and MQT overreact to demand peaks by carrying over the elevated PE demand into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-biased forecasts. To tackle this challenge, we introduce a neural forecasting model called Split Peak Attention DEcomposition, SPADE. This model reduces the impact of PEs on subsequent forecasts by splitting forecasting into two separate tasks: one for PEs and one for all other periods. Its architecture uses masked convolution filters and a specialized Peak Attention module. We evaluate SPADE on a worldwide retail dataset with hundreds of millions of products. Our results reveal an overall PPE improvement of 4.5%, a 30% improvement for the most affected forecasts after promotions and holidays, and an improvement in PE accuracy of 3.9%, relative to current production models.
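The abstract does not specify how the masked convolution filters work. One plausible reading, sketched minimally here under that assumption, is a causal filter that skips timesteps flagged as peak events and renormalizes the remaining weights, so an elevated PE value cannot leak into the post-peak baseline. `masked_causal_conv` is a hypothetical name, not SPADE's code.

```python
from typing import List, Sequence

def masked_causal_conv(series: Sequence[float], is_peak: Sequence[bool],
                       kernel: Sequence[float]) -> List[float]:
    """Baseline at t: weighted average of NON-peak values at t, t-1, ..., t-len(kernel)+1.

    kernel[0] weights the current step, kernel[1] the previous one, and so on.
    Peak-event steps are masked out and the remaining weights renormalized; if
    every step in the window is a peak, the previous baseline is carried forward.
    """
    out: List[float] = []
    for t in range(len(series)):
        num = den = 0.0
        for k, w in enumerate(kernel):
            j = t - k
            if j >= 0 and not is_peak[j]:
                num += w * series[j]
                den += w
        out.append(num / den if den > 0 else (out[-1] if out else 0.0))
    return out
```

With a spike of 100 at a flagged peak step, the masked baseline stays at the regular level of 10, whereas an unmasked three-step average would report 40 for the following period, which is exactly the post-peak over-bias described above.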

LG · Dec 3, 2024
LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data

Hanyu Zhang, Chuck Arvin, Dmitry Efimov et al.

Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead-up to major exogenous events like holidays for clearly relevant products. To address this shortcoming, this paper introduces a novel forecast post-processor, which we call LLMForecaster, that fine-tunes large language models (LLMs) to incorporate unstructured semantic and contextual information and historical data to improve the forecasts from an existing demand forecasting pipeline. In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges.

LG · Sep 23, 2025
A More Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models

Kin G. Olivares, Malcolm Wolff, Tatiana Konstantinova et al.

Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets. To address these limitations, we introduce a unified reimplementation of widely adopted neural forecasting networks, adapting them for the CFTL setup; we pre-train only on proprietary and synthetic data, being careful to prevent test leakage; and we evaluate on 15 large, diverse public forecast competition datasets. Our empirical analysis reveals that statistical models' accuracy is frequently underreported. Notably, we confirm that statistical models and their ensembles consistently outperform existing FFMs by more than 8.2% in sCRPS and by more than 20% in MASE across datasets. However, we also find that pre-training on synthetic data does improve the accuracy of an FFM by 7%.
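For reference, a minimal sketch of the two reported metrics. MASE follows its standard definition: forecast MAE divided by the in-sample MAE of a seasonal naive forecast. For sCRPS we use a common quantile-based approximation (mean over quantile levels of twice the pinball loss, normalized by the sum of absolute targets); the exact variant used in the paper may differ.

```python
from typing import Sequence

def mase(y_train: Sequence[float], y_true: Sequence[float],
         y_pred: Sequence[float], season: int = 1) -> float:
    """Mean absolute scaled error relative to an in-sample seasonal naive forecast."""
    naive_errors = [abs(y_train[t] - y_train[t - season]) for t in range(season, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    return mae / scale

def pinball(y: float, q: float, tau: float) -> float:
    """Quantile (pinball) loss of forecast q at level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)

def scaled_crps(y_true: Sequence[float], quantiles: Sequence[Sequence[float]],
                levels: Sequence[float]) -> float:
    """Quantile approximation of CRPS, scaled by sum |y|.

    quantiles[i][j] is the forecast of y_true[i] at quantile level levels[j].
    """
    total = 0.0
    for y, qs in zip(y_true, quantiles):
        total += sum(2.0 * pinball(y, q, tau) for q, tau in zip(qs, levels)) / len(levels)
    return total / sum(abs(y) for y in y_true)
```

Both metrics are scale-free, which is what allows them to be averaged across the heterogeneous competition datasets.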

LG · Oct 6, 2025
Forking-Sequences

Willa Potosnak, Malcolm Wolff, Boris Oreshkin et al.

While accuracy is a critical requirement for time series forecasting models, an equally important (yet often overlooked) desideratum is forecast stability across forecast creation dates (FCDs). Even highly accurate models can produce erratic revisions between FCDs, undermining stakeholder trust and disrupting downstream decision-making. To improve forecast stability, models like MQCNN, MQT, and SPADE employ a little-known but highly effective technique: forking-sequences. Unlike standard statistical and neural forecasting methods that treat each FCD independently, the forking-sequences method jointly encodes and decodes the entire time series across all FCDs, in a way that mirrors time series cross-validation. Since forking-sequences remains largely unknown in the broader neural forecasting community, in this work we formalize the forking-sequences approach and make a case for its broader adoption. We demonstrate three key benefits of forking-sequences: (i) more stable and consistent gradient updates during training; (ii) reduced forecast variance through ensembling; and (iii) improved inference computational efficiency. We validate forking-sequences' benefits using 16 datasets from the M1, M3, M4, and Tourism competitions, showing average improvements in forecast percentage change stability of 28.8%, 28.8%, 37.9%, 31.3%, and 8.8% for MLP, RNN, LSTM, CNN, and Transformer-based architectures, respectively.
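A toy sketch of the idea, under our own assumptions: a single recurrent pass (here just an exponentially smoothed level standing in for a neural encoder) yields an encoder state at every FCD, and a decoder maps each state to an H-step forecast, so all FCDs are served by one O(T) encoding instead of T separate ones. The stability score is one plausible reading of "forecast percentage change": the mean relative revision of the forecast for the same target date between consecutive FCDs. Both functions are hypothetical illustrations, not the paper's code.

```python
from typing import Dict, List, Sequence

def forking_forecasts(series: Sequence[float], horizon: int,
                      alpha: float = 0.3) -> Dict[int, List[float]]:
    """Encode the series once; at every FCD t emit forecasts for dates t+1..t+horizon."""
    level = series[0]
    forecasts: Dict[int, List[float]] = {}
    for t, y in enumerate(series):
        level = alpha * y + (1.0 - alpha) * level  # encoder state after observing y_t
        forecasts[t] = [level] * horizon           # toy decoder: flat forecast from that state
    return forecasts

def mean_percentage_revision(forecasts: Dict[int, List[float]]) -> float:
    """Mean |f_t(d) - f_{t-1}(d)| / |f_{t-1}(d)| over dates d forecast at both FCDs."""
    changes = []
    for t, new in forecasts.items():
        old = forecasts.get(t - 1)
        if old is None:
            continue
        # forecasts[t][k] targets date t+1+k, so date t+h is new[h-1] and old[h]
        for h in range(1, len(new)):
            if old[h] != 0:
                changes.append(abs(new[h - 1] - old[h]) / abs(old[h]))
    return sum(changes) / len(changes) if changes else 0.0
```

A lower `mean_percentage_revision` means fewer erratic revisions between consecutive FCDs.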

LG · Aug 22, 2025
Probabilistic Pretraining for Neural Regression

Boris N. Oreshkin, Shiv Tavker, Dmitry Efimov

Transfer learning for probabilistic regression remains underexplored. This work closes this gap by introducing NIAQUE, Neural Interpretable Any-Quantile Estimation, a new model designed for transfer learning in probabilistic regression through permutation invariance. We demonstrate that pre-training NIAQUE directly on diverse downstream regression datasets and fine-tuning it on a specific target dataset enhances performance on individual regression tasks, showcasing the positive impact of probabilistic transfer learning. Furthermore, we highlight the effectiveness of NIAQUE in Kaggle competitions against strong baselines, including tree-based models and the recent neural foundation models TabPFN and TabDPT. The findings highlight NIAQUE's efficacy as a robust and scalable framework for probabilistic regression, leveraging transfer learning to enhance predictive performance.
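Two ingredients named in the abstract, permutation invariance over input features and any-quantile estimation, can be sketched generically. The sketch assumes a DeepSets-style design (embed each (feature name, value) pair independently, then mean-pool, so feature order cannot matter) and a quantile level tau fed to the predictor and scored with the pinball loss at a randomly sampled tau. The weights, the linear quantile head, and all function names are hypothetical; this is not NIAQUE's architecture.

```python
import random
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, float]  # (feature name, value)

def encode(features: Sequence[Feature], weights: Dict[str, List[float]]) -> List[float]:
    """Mean of per-feature embeddings w_name * value; mean pooling is order-invariant."""
    dim = len(next(iter(weights.values())))
    pooled = [0.0] * dim
    for name, value in features:
        for k, w in enumerate(weights[name]):
            pooled[k] += w * value / len(features)
    return pooled

def predict_quantile(features: Sequence[Feature], tau: float, weights: Dict[str, List[float]],
                     head: Sequence[float], spread: float) -> float:
    """Hypothetical quantile head: linear in the pooled code plus a term increasing in tau."""
    z = encode(features, weights)
    return sum(h * v for h, v in zip(head, z)) + spread * (tau - 0.5)

def any_quantile_loss(features: Sequence[Feature], y: float, weights: Dict[str, List[float]],
                      head: Sequence[float], spread: float, rng: random.Random) -> float:
    """Pinball loss at a freshly sampled tau, so training covers every quantile level."""
    tau = rng.random()
    diff = y - predict_quantile(features, tau, weights, head, spread)
    return max(tau * diff, (tau - 1.0) * diff)
```

Because the encoder only ever sees a set of (name, value) pairs, the same model can in principle be applied to datasets with different column orders or subsets of features, which is what makes pre-training across diverse regression datasets possible.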

LG · Jul 14, 2025
TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting

Zhiyuan Zhao, Sitan Yang, Kin G. Olivares et al.

Multi-horizon time series forecasting has many practical applications such as demand forecasting. Accurate demand prediction is critical to help make buying and inventory decisions for supply chain management of e-commerce and physical retailers, and such predictions are typically required for future horizons extending tens of weeks. This is especially challenging during high-stakes sales events, when demand peaks are particularly difficult to predict accurately. However, these events are important not only for managing supply chain operations but also for ensuring a seamless shopping experience for customers. To address this challenge, we propose Temporal-Aligned Transformer (TAT), a multi-horizon forecaster that leverages a priori known context variables, such as holiday and promotion event information, to improve predictive performance. Our model consists of an encoder and decoder, both embedded with a novel Temporal Alignment Attention (TAA), designed to learn context-dependent alignment for peak demand forecasting. We conduct extensive empirical analysis on two large-scale proprietary datasets from a large e-commerce retailer. We demonstrate that TAT brings up to 30% accuracy improvement on peak demand forecasting while maintaining competitive overall performance compared to other state-of-the-art methods.
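The abstract does not give the form of Temporal Alignment Attention. As a hedged illustration of the general idea of context-dependent alignment only, the sketch below runs plain scaled dot-product attention in which queries and keys are built from known-in-advance context features (here hypothetical holiday and promotion flags), so a future peak week attends mostly to past weeks with the same context. It is generic attention, not TAA.

```python
import math
from typing import List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_attention(query_ctx: Sequence[float], key_ctxs: Sequence[Sequence[float]],
                      values: Sequence[float], temperature: float = 1.0) -> Tuple[List[float], float]:
    """Attend from one future step to past steps using context-feature similarity.

    Returns (attention weights over past steps, weighted average of their values).
    """
    d = len(query_ctx)
    scores = [sum(q * k for q, k in zip(query_ctx, key)) / (math.sqrt(d) * temperature)
              for key in key_ctxs]
    weights = softmax(scores)
    return weights, sum(w * v for w, v in zip(weights, values))
```

For example, with past contexts `[[1, 0], [0, 0], [0, 1], [0, 0]]` (holiday, promotion) and past demand `[100, 10, 40, 10]`, a future holiday query `[1, 0]` at a low temperature puts nearly all of its weight on the past holiday week.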

RM · Dec 30, 2020
Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data

Jillian M. Clements, Di Xu, Nooshin Yousefi et al.

Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However, these gains begin to plateau without the addition of expensive new data sources or highly engineered features. In this paper, we present our attempts to create a novel approach to assessing credit risk using deep learning that does not rely on new model inputs. We propose a new credit card transaction sampling technique to use with deep recurrent and causal convolution-based neural networks that exploits long historical sequences of financial data without costly resource requirements. We show that our sequential deep learning approach using a temporal convolutional network outperformed the benchmark non-sequential tree-based model, achieving significant financial savings and earlier detection of credit risk. We also demonstrate the potential for our approach to be used in a production environment, where our sampling technique allows for sequences to be stored efficiently in memory and used for fast online learning and inference.
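Temporal convolutional networks are built from causal, typically dilated, convolutions: the output at step t may depend only on inputs at steps up to t, which is what makes them usable for online inference over transaction histories. A minimal single-channel sketch of that building block follows; it is a generic TCN illustration, not the paper's model or its sampling technique, and the function names are hypothetical.

```python
from typing import List, Sequence

def causal_dilated_conv(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """y_t = sum_k kernel[k] * x[t - k*dilation]; indices before 0 act as zero padding."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t - k * dilation
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y

def tcn_stack(x: Sequence[float], kernel: Sequence[float], levels: int = 3) -> List[float]:
    """Stack causal layers with dilations 1, 2, 4, ... and a ReLU after each.

    With kernel size K the receptive field is 1 + (K - 1) * (2**levels - 1) steps.
    """
    h = list(x)
    for level in range(levels):
        h = [max(0.0, v) for v in causal_dilated_conv(h, kernel, dilation=2 ** level)]
    return h
```

With kernel size 2 and three levels, an impulse at step 0 influences exactly the first 8 outputs, and changing the last input never alters earlier outputs.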

LG · Feb 6, 2020
Using generative adversarial networks to synthesize artificial financial datasets

Dmitry Efimov, Di Xu, Luyang Kong et al.

Generative Adversarial Networks (GANs) have become very popular for generating realistic-looking images. In this paper, we propose using GANs to synthesize artificial financial data for research and benchmarking purposes. We test this approach on three American Express datasets, and show that properly trained GANs can replicate these datasets with high fidelity. For our experiments, we define a novel type of GAN, and suggest data preprocessing methods that enable good training and testing performance of GANs. We also discuss methods for evaluating the quality of generated data and for comparing it with the original real data.
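The abstract does not name the evaluation methods used. One simple, generic check of marginal fidelity (our choice for illustration, not necessarily the paper's) is the two-sample Kolmogorov–Smirnov statistic between a real column and its synthetic counterpart: 0 means identical empirical distributions, 1 means fully separated ones.

```python
from bisect import bisect_right
from typing import Sequence

def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample KS statistic: max over observed x of |F_real(x) - F_synth(x)|.

    Empirical CDFs are right-continuous step functions, so the supremum is
    attained at one of the pooled sample points.
    """
    a, b = sorted(real), sorted(synthetic)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

Per-column checks like this say nothing about dependencies between columns, so they complement rather than replace comparisons of joint structure.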