LGMar 7, 2022
Multivariate Time Series Forecasting with Latent Graph InferenceVictor Garcia Satorras, Syama Sundar Rangapuram, Tim Januschowski
This paper introduces a new approach for Multivariate Time Series forecasting that jointly infers and leverages relations among time series. Its modularity allows it to be integrated with current univariate methods. Our approach allows to trade-off accuracy and computational efficiency gradually via offering on one extreme inference of a potentially fully-connected graph or on another extreme a bipartite graph. In the potentially fully-connected case we consider all pair-wise interactions among time-series which yields the best forecasting accuracy. Conversely, the bipartite case leverages the dependency structure by inter-communicating the N time series through a small set of K auxiliary nodes that we introduce. This reduces the time and memory complexity w.r.t. previous graph inference methods from O(N^2) to O(NK) with a small trade-off in accuracy. We demonstrate the effectiveness of our model in a variety of datasets where both of its variants perform better or very competitively to previous graph inference methods in terms of forecasting accuracy and time efficiency.
LGFeb 23, 2023
Adaptive Sampling for Probabilistic Forecasting under Distribution ShiftLuca Masserano, Syama Sundar Rangapuram, Shubham Kapoor et al.
The world is not static: This causes real-world time series to change over time through external, and potentially disruptive, events such as macroeconomic cycles or the COVID-19 pandemic. We present an adaptive sampling strategy that selects the part of the time series history that is relevant for forecasting. We achieve this by learning a discrete distribution over relevant time steps by Bayesian optimization. We instantiate this idea with a two-step method that is pre-trained with uniform sampling and then training a lightweight adaptive architecture with adaptive sampling. We show with synthetic and real-world experiments that this method adapts to distribution shift and significantly reduces the forecasting error of the base model for three out of five datasets.
LGMar 12, 2024
Chronos: Learning the Language of Time SeriesAbdul Fatir Ansari, Lorenzo Stella, Caner Turkmen et al.
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
LGMar 15, 2025
ChronosX: Adapting Pretrained Time Series Models with Exogenous VariablesSebastian Pineda Arango, Pedro Mercado, Shubham Kapoor et al.
Covariates provide valuable information on external factors that influence time series and are critical in many real-world time series forecasting tasks. For example, in retail, covariates may indicate promotions or peak dates such as holiday seasons that heavily influence demand forecasts. Recent advances in pretraining large language model architectures for time series forecasting have led to highly accurate forecasters. However, the majority of these models do not readily use covariates as they are often specific to a certain task or domain. This paper introduces a new method to incorporate covariates into pretrained time series forecasting models. Our proposed approach incorporates covariate information into pretrained forecasting models through modular blocks that inject past and future covariate information, without necessarily modifying the pretrained model in consideration. In order to evaluate our approach, we introduce a benchmark composed of 32 different synthetic datasets with varying dynamics to evaluate the effectivity of forecasting models with covariates. Extensive evaluations on both synthetic and real datasets show that our approach effectively incorporates covariate information into pretrained models, outperforming existing baselines.
LGDec 22, 2023
Deep Non-Parametric Time Series ForecasterSyama Sundar Rangapuram, Jan Gasthaus, Lorenzo Stella et al.
This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail unlike classical models that suffer from numerical stability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.
LGJun 3, 2025
Zero-Shot Time Series Forecasting with Covariates via In-Context LearningAndreas Auer, Raghul Parthipan, Pedro Mercado et al.
Pretrained time series models, capable of zero-shot forecasting, have demonstrated significant potential in enhancing both the performance and accessibility of time series forecasting. However, existing pretrained models either do not support covariates or fail to incorporate them effectively. We introduce COSMIC, a zero-shot forecasting model that utilizes covariates via in-context learning. To address the challenge of data scarcity, we propose Informative Covariate Augmentation, which enables the training of COSMIC without requiring any datasets that include covariates. COSMIC achieves state-of-the-art performance in zero-shot forecasting, both with and without covariates. Our quantitative and qualitative analysis demonstrates that COSMIC effectively leverages covariates in zero-shot forecasting.
LGOct 17, 2025
Chronos-2: From Univariate to Universal ForecastingAbdul Fatir Ansari, Oleksandr Shchur, Jaris Küken et al. · cmu
Pretrained time series models have enabled inference-only forecasting systems that produce accurate predictions without task-specific training. However, existing approaches largely focus on univariate forecasting, limiting their applicability in real-world scenarios where multivariate data and covariates play a crucial role. We present Chronos-2, a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. Chronos-2 employs a group attention mechanism that facilitates in-context learning (ICL) through efficient information sharing across multiple time series within a group, which may represent sets of related series, variates of a multivariate series, or targets and covariates in a forecasting task. These general capabilities are achieved through training on synthetic datasets that impose diverse multivariate structures on univariate series. Chronos-2 delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. On fev-bench, which emphasizes multivariate and covariate-informed forecasting, Chronos-2's universal ICL capabilities lead to substantial improvements over existing models. On tasks involving covariates, it consistently outperforms baselines by a wide margin. Case studies in the energy and retail domains further highlight its practical advantages. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used "as is" in real-world forecasting pipelines.
LGOct 25, 2021
Neural Flows: Efficient Alternative to Neural ODEsMarin Biloš, Johanna Sommer, Syama Sundar Rangapuram et al.
Neural ordinary differential equations describe how values change in time. This is the reason why they gained importance in modeling sequential data, especially when the observations are made at irregular intervals. In this paper we propose an alternative by directly modeling the solution curves - the flow of an ODE - with a neural network. This immediately eliminates the need for expensive numerical solvers while still maintaining the modeling capability of neural ODEs. We propose several flow architectures suitable for different applications by establishing precise conditions on when a function defines a valid flow. Apart from computational efficiency, we also provide empirical evidence of favorable generalization performance via applications in time series modeling, forecasting, and density estimation.
LGApr 21, 2020
Deep Learning for Time Series Forecasting: Tutorial and Literature SurveyKonstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert et al.
Deep learning based forecasting methods have become the methods of choice in many applications of time series prediction or forecasting often outperforming other approaches. Consequently, over the last years, these methods are now ubiquitous in large-scale industrial forecasting applications and have consistently ranked among the best entries in forecasting competitions (e.g., M4 and M5). This practical success has further increased the academic interest to understand and improve deep forecasting methods. In this article we provide an introduction and overview of the field: We present important building blocks for deep forecasting in some depth; using these building blocks, we then survey the breadth of the recent deep forecasting literature.
MEMay 2, 2016
Methods for Sparse and Low-Rank Recovery under Simplex ConstraintsPing Li, Syama Sundar Rangapuram, Martin Slawski
The de-facto standard approach of promoting sparsity by means of $\ell_1$-regularization becomes ineffective in the presence of simplex constraints, i.e.,~the target is known to have non-negative entries summing up to a given constant. The situation is analogous for the use of nuclear norm regularization for low-rank recovery of Hermitian positive semidefinite matrices with given trace. In the present paper, we discuss several strategies to deal with this situation, from simple to more complex. As a starting point, we consider empirical risk minimization (ERM). It follows from existing theory that ERM enjoys better theoretical properties w.r.t.~prediction and $\ell_2$-estimation error than $\ell_1$-regularization. In light of this, we argue that ERM combined with a subsequent sparsification step like thresholding is superior to the heuristic of using $\ell_1$-regularization after dropping the sum constraint and subsequent normalization. At the next level, we show that any sparsity-promoting regularizer under simplex constraints cannot be convex. A novel sparsity-promoting regularization scheme based on the inverse or negative of the squared $\ell_2$-norm is proposed, which avoids shortcomings of various alternative methods from the literature. Our approach naturally extends to Hermitian positive semidefinite matrices with given trace. Numerical studies concerning compressed sensing, sparse mixture density estimation, portfolio optimization and quantum state tomography are used to illustrate the key points of the paper.
MLMay 24, 2015
Constrained 1-Spectral ClusteringSyama Sundar Rangapuram, Matthias Hein
An important form of prior information in clustering comes in form of cannot-link and must-link constraints. We present a generalization of the popular spectral clustering technique which integrates such constraints. Motivated by the recently proposed $1$-spectral clustering for the unconstrained problem, our method is based on a tight relaxation of the constrained normalized cut into a continuous optimization problem. Opposite to all other methods which have been suggested for constrained spectral clustering, we can always guarantee to satisfy all constraints. Moreover, our soft formulation allows to optimize a trade-off between normalized cut and the number of violated constraints. An efficient implementation is provided which scales to large datasets. We outperform consistently all other proposed methods in the experiments.
MLMay 24, 2015
Tight Continuous Relaxation of the Balanced $k$-Cut ProblemSyama Sundar Rangapuram, Pramod Kaushik Mudrakarta, Matthias Hein
Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced $k$-cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced $k$-cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the difficult sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method outperforms all existing approaches for ratio cut and other balanced $k$-cut criteria.
MLDec 18, 2013
The Total Variation on Hypergraphs - Learning on Hypergraphs RevisitedMatthias Hein, Simon Setzer, Leonardo Jost et al.
Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs.
MLJun 14, 2013
Constrained fractional set programs and their application in local clustering and community detectionThomas Bühler, Syama Sundar Rangapuram, Simon Setzer et al.
The (constrained) minimization of a ratio of set functions is a problem frequently occurring in clustering and community detection. As these optimization problems are typically NP-hard, one uses convex or spectral relaxations in practice. While these relaxations can be solved globally optimally, they are often too loose and thus lead to results far away from the optimum. In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem. This result leads to a flexible framework for solving constrained problems in network analysis. While a globally optimal solution for the resulting non-convex problem cannot be guaranteed, we outperform the loose convex or spectral relaxations by a large margin on constrained local clustering problems.