EMNov 27, 2023
From Reactive to Proactive Volatility Modeling with Hemisphere Neural NetworksPhilippe Goulet Coulombe, Mikael Frenette, Karin Klieber
We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres. Our architecture features several key ingredients making MLE work in this context. First, the hemispheres share a common core at the entrance of the network which accommodates for various forms of time variation in the error variance. Second, we introduce a volatility emphasis constraint that breaks mean/variance indeterminacy in this class of overparametrized nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb overfitting in both conditional moments. Fourth, the algorithm utilizes standard deep learning software and thus handles large data sets - both computationally and statistically. Ergo, our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must. We evaluate point and density forecasts with an extensive out-of-sample experiment and benchmark against a suite of models ranging from classics to more modern machine learning-based offerings. In all cases, HNN fares well by consistently providing accurate mean/variance forecasts for all targets and horizons. Studying the resulting volatility paths reveals its versatility, while probabilistic forecasting evaluation metrics showcase its enviable reliability. Finally, we also demonstrate how this machinery can be merged with other structured deep learning models by revisiting Goulet Coulombe (2022)'s Neural Phillips Curve.
EMDec 17, 2024
Dual Interpretation of Machine Learning ForecastsPhilippe Goulet Coulombe, Maximilian Goebel, Karin Klieber
Machine learning predictions are typically interpreted as the sum of contributions of predictors. Yet, each out-of-sample prediction can also be expressed as a linear combination of in-sample values of the predicted variable, with weights corresponding to pairwise proximity scores between current and past economic events. While this dual route leads nowhere in some contexts (e.g., large cross-sectional datasets), it provides sparser interpretations in settings with many regressors and little training data-like macroeconomic forecasting. In this case, the sequence of contributions can be visualized as a time series, allowing analysts to explain predictions as quantifiable combinations of historical analogies. Moreover, the weights can be viewed as those of a data portfolio, inspiring new diagnostic measures such as forecast concentration, short position, and turnover. We show how weights can be retrieved seamlessly for (kernel) ridge regression, random forest, boosted trees, and neural networks. Then, we apply these tools to analyze post-pandemic forecasts of inflation, GDP growth, and recession probabilities. In all cases, the approach opens the black box from a new angle and demonstrates how machine learning models leverage history partly repeating itself.
LGApr 13, 2025
Ordinary Least Squares as an Attention MechanismPhilippe Goulet Coulombe
I show that ordinary least squares (OLS) predictions can be rewritten as the output of a restricted attention module, akin to those forming the backbone of large language models. This connection offers an alternative perspective on attention beyond the conventional information retrieval framework, making it more accessible to researchers and analysts with a background in traditional statistics. It falls into place when OLS is framed as a similarity-based method in a transformed regressor space, distinct from the standard view based on partial correlations. In fact, the OLS solution can be recast as the outcome of an alternative problem: minimizing squared prediction errors by optimizing the embedding space in which training and test vectors are compared via inner products. Rather than estimating coefficients directly, we equivalently learn optimal encoding and decoding operations for predictors. From this vantage point, OLS maps naturally onto the query-key-value structure of attention mechanisms. Building on this foundation, I discuss key elements of Transformer-style attention and draw connections to classic ideas from time series econometrics.
EMFeb 8, 2022
A Neural Phillips Curve and a Deep Output GapPhilippe Goulet Coulombe
Many problems plague empirical Phillips curves (PCs). Among them is the hurdle that the two key components, inflation expectations and the output gap, are both unobserved. Traditional remedies include proxying for the absentees or extracting them via assumptions-heavy filtering procedures. I propose an alternative route: a Hemisphere Neural Network (HNN) whose architecture yields a final layer where components can be interpreted as latent states within a Neural PC. There are benefits. First, HNN conducts the supervised estimation of nonlinearities that arise when translating a high-dimensional set of observed regressors into latent states. Second, forecasts are economically interpretable. Among other findings, the contribution of real activity to inflation appears understated in traditional PCs. In contrast, HNN captures the 2021 upswing in inflation and attributes it to a large positive output gap starting from late 2020. The unique path of HNN's gap comes from dispensing with unemployment and GDP in favor of an amalgam of nonlinearly processed alternative tightness indicators.
MLMar 2, 2021
Slow-Growing TreesPhilippe Goulet Coulombe
Random Forest's performance can be matched by a single slow-growing tree (SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits the view that CART is an extreme case of an iterative weighted least square procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests (RF) is presented. Greedy ML algorithms' outcomes can be improved using either "slow learning" or diversification. SGT applies the former to estimate a single deep tree, and Booging (bagging stochastic BT with a high learning rate) uses the latter with additive shallow trees. The performance of this tree ensemble quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression tasks.
EMMar 1, 2021
Can Machine Learning Catch the COVID-19 Recession?Philippe Goulet Coulombe, Massimiliano Marcellino, Dalibor Stevanovic
Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, it seems the most promising avenue for forecasting during the pandemic is to allow for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some do not allow to extrapolate (like regular trees and forests) and some do (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.
EMSep 1, 2020
Time-Varying Parameters as Ridge RegressionsPhilippe Goulet Coulombe
Time-varying parameters (TVPs) models are frequently used in economics to capture structural change. I highlight a rather underutilized fact -- that these are actually ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada using large time-varying local projections. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method.
EMAug 28, 2020
How is Machine Learning Useful for Macroeconomic Forecasting?Philippe Goulet Coulombe, Maxime Leroux, Dalibor Stevanovic et al.
We move beyond "Is Machine Learning Useful for Macroeconomic Forecasting?" by adding the "how". The current forecasting literature has focused on matching specific variables and horizons with a particularly successful algorithm. In contrast, we study the usefulness of the underlying features driving ML gains over standard macroeconometric methods. We distinguish four so-called features (nonlinearities, regularization, cross-validation and alternative loss function) and study their behavior in both the data-rich and data-poor environments. To do so, we design experiments that allow to identify the "treatment" effects of interest. We conclude that (i) nonlinearity is the true game changer for macroeconomic prediction, (ii) the standard factor model remains the best regularization, (iii) K-fold cross-validation is the best practice and (iv) the $L_2$ is preferred to the $\bar ε$-insensitive in-sample loss. The forecasting gains of nonlinear techniques are associated with high macroeconomic uncertainty, financial stress and housing bubble bursts. This suggests that Machine Learning is useful for macroeconomic forecasting by mostly capturing important nonlinearities that arise in the context of uncertainty and financial frictions.
MLAug 17, 2020
To Bag is to PrunePhilippe Goulet Coulombe
It is notoriously difficult to build a bad Random Forest (RF). Concurrently, RF blatantly overfits in-sample without any apparent consequence out-of-sample. Standard arguments, like the classic bias-variance trade-off or double descent, cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a latent "true" tree. More generally, randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample. So there is no need to tune the stopping point. By construction, novel variants of Boosting and MARS are also eligible for automatic tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles perform similarly to their tuned counterparts -- or better.
EMAug 4, 2020
Macroeconomic Data Transformations MatterPhilippe Goulet Coulombe, Maxime Leroux, Dalibor Stevanovic et al.
In a low-dimensional linear regression setup, considering linear transformations/combinations of predictors does not alter predictions. However, when the forecasting technology either uses shrinkage or is nonlinear, it does. This is precisely the fabric of the machine learning (ML) macroeconomic forecasting environment. Pre-processing of the data translates to an alteration of the regularization -- explicit or implicit -- embedded in ML algorithms. We review old transformations and propose new ones, then empirically evaluate their merits in a substantial pseudo-out-sample exercise. It is found that traditional factors should almost always be included as predictors and moving average rotations of the data can provide important gains for various forecasting targets. Also, we note that while predicting directly the average growth rate is equivalent to averaging separate horizon forecasts when using OLS-based techniques, the latter can substantially improve on the former when regularization and/or nonparametric nonlinearities are involved.
EMJun 23, 2020
The Macroeconomy as a Random ForestPhilippe Goulet Coulombe
I develop Macroeconomic Random Forest (MRF), an algorithm adapting the canonical Machine Learning (ML) tool to flexibly model evolving parameters in a linear macro equation. Its main output, Generalized Time-Varying Parameters (GTVPs), is a versatile device nesting many popular nonlinearities (threshold/switching, smooth transition, structural breaks/change) and allowing for sophisticated new ones. The approach delivers clear forecasting gains over numerous alternatives, predicts the 2008 drastic rise in unemployment, and performs well for inflation. Unlike most ML-based methods, MRF is directly interpretable -- via its GTVPs. For instance, the successful unemployment forecast is due to the influence of forward-looking variables (e.g., term spreads, housing starts) nearly doubling before every recession. Interestingly, the Phillips curve has indeed flattened, and its might is highly cyclical.