Joint Optimization of Neural Autoregressors via Scoring Rules
This work addresses a scalability problem faced by researchers who use neural autoregressors for multivariate distributional regression, although the contribution appears incremental because it builds on existing TabPFN methods.
The paper tackles the challenge of extending non-parametric distributional regression methods like TabPFN to multivariate settings, where naive discretization leads to exponential complexity and overfitting in low-data regimes, and proposes a joint optimization approach to address this.
Non-parametric distributional regression has achieved significant milestones in recent years. Among these, the Tabular Prior-Data Fitted Network (TabPFN) has demonstrated state-of-the-art performance on various benchmarks. However, extending these grid-based approaches to a truly multivariate setting remains a challenge. In a naive non-parametric discretization with $N$ bins per dimension, an explicit joint grid over $d$ output dimensions contains $N^d$ cells, so its size grows exponentially with $d$ and the parameter count of the neural network rises sharply with it. This scaling is particularly detrimental in low-data regimes: the final projection layer must emit one logit per joint cell and therefore requires a very large number of parameters, leading to severe overfitting and, as $d$ grows, intractability.
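The scaling argument can be made concrete with a little arithmetic. The sketch below is illustrative only and assumes a plain dense output layer (weights plus biases) on top of a hidden representation of width `hidden_width`; the function names and the settings (100 bins, width 512) are hypothetical and not taken from the paper. For contrast, it also counts a factorized output that emits $N$ logits per dimension (the output size an autoregressive factorization needs at each conditional step). This contrast is an assumption for illustration, not a description of the paper's method.

```python
def joint_grid_cells(n_bins: int, n_dims: int) -> int:
    """Number of cells in an explicit joint grid with n_bins per dimension: N^d."""
    return n_bins ** n_dims


def projection_params(hidden_width: int, n_outputs: int) -> int:
    """Weights plus biases of a dense layer mapping hidden_width -> n_outputs."""
    return hidden_width * n_outputs + n_outputs


def compare(n_bins: int, hidden_width: int, max_dims: int) -> list[tuple[int, int, int, int]]:
    """For d = 1..max_dims, return (d, joint cells, joint-head params, per-dimension-head params)."""
    rows = []
    for d in range(1, max_dims + 1):
        cells = joint_grid_cells(n_bins, d)
        # Joint head: one logit per cell of the N^d grid.
        joint = projection_params(hidden_width, cells)
        # Hypothetical factorized head: N logits for each of the d dimensions.
        factorized = projection_params(hidden_width, d * n_bins)
        rows.append((d, cells, joint, factorized))
    return rows


if __name__ == "__main__":
    # Hypothetical settings: 100 bins per dimension, hidden width 512.
    for d, cells, joint, fact in compare(n_bins=100, hidden_width=512, max_dims=4):
        print(f"d={d}: cells={cells:,}  joint head params={joint:,}  per-dim head params={fact:,}")
```

With these hypothetical settings, the joint head already needs 5,130,000 projection parameters at $d = 2$ (versus 102,600 for the per-dimension heads) and over 51 billion at $d = 4$, which illustrates why the explicit joint grid becomes untrainable long before the data can support it.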