High-Dimensional Learning in Finance
This work addresses the reliability of machine learning predictions in finance and reveals fundamental limitations that challenge common practice. Its contribution is incremental: it refines theoretical understanding rather than introducing new methods.
The paper asks when high-dimensional machine learning models can achieve predictive success in finance. It shows that, with typical parameters, the sample size required to escape information-theoretic lower bounds exceeds 25-30 years of data, which indicates that observed success stems from lower-complexity artefacts.
Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set-dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible, however sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices (e.g., 12,000 features, 12 monthly observations, and an R² of 2-3%), the sample size required to escape the bound exceeds 25-30 years of data (300-360 monthly observations), well beyond any rolling window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.
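The first result can be illustrated numerically. The sketch below is a toy NumPy example, not the paper's implementation, and it assumes that "within-sample standardization" means z-scoring each generated feature with its training-window mean and standard deviation. The settings (one input dimension, frequencies w_j ~ N(0, gamma) with gamma = 1, a hypothetical 12-observation training window, 12,000 features) and all helper names (`draw_rff`, `cos_features`, `rff_kernel`, `fit_standardizer`, `standardized_kernel`) are illustrative choices. Plain features cos(w_j x + b_j), scaled by 2/P, estimate the Gaussian kernel exp(-gamma ||x - y||^2 / 2). Because that kernel is shift-invariant, its diagonal satisfies k(x, x) = k(0) for every x. The kernel implied by the standardized features breaks this: its diagonal averages exactly 1 over the training points but is far larger at a distant point.

```python
import numpy as np


def draw_rff(n_features, dim, gamma, rng):
    """Draw frequencies w_j ~ N(0, gamma * I) and phases b_j ~ U[0, 2*pi)."""
    W = rng.normal(0.0, np.sqrt(gamma), size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def cos_features(X, W, b):
    """Unscaled cosine features cos(W x + b); one row per observation in X."""
    return np.cos(X @ W.T + b)


def rff_kernel(x, y, W, b):
    """Plain RFF estimate of exp(-gamma * ||x - y||^2 / 2)."""
    fx = cos_features(x[None, :], W, b)[0]
    fy = cos_features(y[None, :], W, b)[0]
    return 2.0 * (fx @ fy) / W.shape[0]


def fit_standardizer(X_train, W, b):
    """Per-feature mean and (ddof=0) standard deviation on the training window only."""
    F = cos_features(X_train, W, b)
    return F.mean(axis=0), F.std(axis=0)


def standardized_kernel(x, y, W, b, mu, sd):
    """Kernel implied by z-scoring every feature with training-window statistics."""
    zx = (cos_features(x[None, :], W, b)[0] - mu) / sd
    zy = (cos_features(y[None, :], W, b)[0] - mu) / sd
    return (zx @ zy) / W.shape[0]


rng = np.random.default_rng(0)
dim, n_features, gamma, T = 1, 12_000, 1.0, 12   # assumed toy settings
W, b = draw_rff(n_features, dim, gamma, rng)
X_train = rng.normal(size=(T, dim))               # hypothetical 12-observation window
mu, sd = fit_standardizer(X_train, W, b)

x0, y1, x_far = np.zeros(dim), np.ones(dim), np.full(dim, 10.0)

# Plain RFF: approximates the Gaussian kernel; the diagonal stays near 1 everywhere.
k_01 = rff_kernel(x0, y1, W, b)                   # compare with exp(-0.5)
k_diag_0 = rff_kernel(x0, x0, W, b)
k_diag_far = rff_kernel(x_far, x_far, W, b)

# Standardized RFF: the diagonal averages exactly 1 over the training points...
k_std_train = np.mean([standardized_kernel(xi, xi, W, b, mu, sd) for xi in X_train])
# ...but is much larger far from them, so the implied kernel is not shift-invariant.
k_std_far = standardized_kernel(x_far, x_far, W, b, mu, sd)

print(f"plain RFF k(0,1) = {k_01:.3f}  vs  exp(-1/2) = {np.exp(-0.5):.3f}")
print(f"plain RFF k(x,x): x=0 -> {k_diag_0:.3f},  x=10 -> {k_diag_far:.3f}")
print(f"standardized k(x,x): training avg -> {k_std_train:.3f},  x=10 -> {k_std_far:.1f}")
```

Since `mu` and `sd` are computed from `X_train` alone, refitting on a different window changes them and therefore changes the implied kernel itself. This is the training-set dependence the first result refers to. The exact in-sample average of 1 follows from using the same ddof=0 statistics in the fit and in the evaluation.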