Yannick Limmer

LG
h-index2
3papers
4citations
Novelty55%
AI Score32

3 Papers

LGOct 30, 2023
Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals

Blanka Horvath, Anastasis Kratsios, Yannick Limmer et al.

The use of attention-based deep learning models in stochastic filtering, e.g. transformers and deep Kalman filters, has recently come into focus; however, the potential for these models to solve stochastic filtering problems remains largely unknown. The paper provides an affirmative answer to this open problem in the theoretical foundations of machine learning by showing that a class of continuous-time transformer models, called \textit{filterformers}, can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time (possibly non-Gaussian) measurements. Our approximation guarantees hold uniformly over sufficiently regular compact subsets of continuous-time paths, where the worst-case 2-Wasserstein distance between the true optimal filter and our deep learning model quantifies the approximation error. Our construction relies on two new customizations of the standard attention mechanism: The first can losslessly adapt to the characteristics of a broad range of paths since we show that the attention mechanism implements bi-Lipschitz embeddings of sufficiently regular sets of paths into low-dimensional Euclidean spaces; thus, it incurs no ``dimension reduction error''. The latter attention mechanism is tailored to the geometry of Gaussian measures in the $2$-Wasserstein space. Our analysis relies on new stability estimates of robust optimal filters in the conditionally Gaussian setting.

CPJun 8, 2025
Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling

Hans Buehler, Blanka Horvath, Yannick Limmer et al.

This paper addresses the challenge of model uncertainty in quantitative finance, where decisions in portfolio allocation, derivative pricing, and risk management rely on estimating stochastic models from limited data. In practice, the unavailability of the true probability measure forces reliance on an empirical approximation, and even small misestimations can lead to significant deviations in decision quality. Building on the framework of Klibanoff et al. (2005), we enhance the conventional objective - whether this is expected utility in an investing context or a hedging metric - by superimposing an outer "uncertainty measure", motivated by traditional monetary risk measures, on the space of models. In scenarios where a natural model distribution is lacking or Bayesian methods are impractical, we propose an ad hoc subsampling strategy, analogous to bootstrapping in statistical finance and related to mini-batch sampling in deep learning, to approximate model uncertainty. To address the quadratic memory demands of naive implementations, we also present an adapted stochastic gradient descent algorithm that enables efficient parallelization. Through analytical, simulated, and empirical studies - including multi-period, real data and high-dimensional examples - we demonstrate that uncertainty measures outperform traditional mixture of measures strategies and our model-agnostic subsampling-based approach not only enhances robustness against model risk but also achieves performance comparable to more elaborate Bayesian methods.

LGJun 5, 2024
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

Raeid Saqur, Anastasis Kratsios, Florian Krach et al.

We propose MoE-F - a formalized mechanism for combining $N$ pre-trained Large Language Models (LLMs) for online time-series prediction by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, our approach employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wohman-Shiryaev filter. Our approach first constructs N parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information that they have access to. Subsequently, the N filter outputs are optimally aggregated to maximize their robust predictive power, and this update is computed efficiently via a closed-form expression, generating our ensemble predictor. Our contributions are: **(I)** the MoE-F plug-and-play filtering harness algorithm, **(II)** theoretical optimality guarantees of the proposed filtering-based gating algorithm (via optimality guarantees for its parallel Bayesian filtering and its robust aggregation steps), and **(III)** empirical evaluation and ablative results using state-of-the-art foundational and MoE LLMs on a real-world __Financial Market Movement__ task where MoE-F attains a remarkable 17\% absolute and 48.5\% relative F1 measure improvement over the next best performing individual LLM expert predicting short-horizon market movement based on streaming news. Further, we provide empirical evidence of substantial performance gains in applying MoE-F over specialized models in the long-horizon time-series forecasting domain.