LG Jul 21, 2024
Deep State Space Recurrent Neural Networks for Time Series Forecasting
Hugo Inzirillo
We explore various neural network architectures for modeling the dynamics of the cryptocurrency market. Traditional linear models often fall short in accurately capturing the unique and complex dynamics of this market. In contrast, Deep Neural Networks (DNNs) have demonstrated considerable proficiency in time series forecasting. This paper introduces a novel neural network framework that blends the principles of econometric state space models with the dynamic capabilities of Recurrent Neural Networks (RNNs). We propose state space models using Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU) and Temporal Kolmogorov-Arnold Networks (TKANs). According to the results, TKANs, inspired by Kolmogorov-Arnold Networks (KANs) and LSTM, demonstrate promising outcomes.
LG Sep 23, 2024
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts
Hugo Inzirillo, Remi Genet
This paper introduces KAMoE, a novel Mixture of Experts (MoE) framework based on Gated Residual Kolmogorov-Arnold Networks (GRKAN). We propose GRKAN as an alternative to the traditional gating function, aiming to enhance efficiency and interpretability in MoE modeling. Through extensive experiments on digital asset markets and real estate valuation, we demonstrate that KAMoE consistently outperforms traditional MoE architectures across various tasks and model types. Our results show that GRKAN exhibits superior performance compared to standard Gating Residual Networks, particularly in LSTM-based models for sequential tasks. We also provide insights into the trade-offs between model complexity and performance gains in MoE and KAMoE architectures.
LG May 12, 2024
TKAN: Temporal Kolmogorov-Arnold Networks
Remi Genet, Hugo Inzirillo
Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that draws on both KANs and the LSTM: the Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks and are composed of Recurring Kolmogorov-Arnold Networks (RKANs) layers embedding memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.
LG Sep 20, 2022
An Attention Free Long Short-Term Memory for Time Series Forecasting
Hugo Inzirillo, Ludovic De Villelongue
Deep learning is playing an increasingly important role in time series analysis. We focus on time series forecasting using an attention-free mechanism, a more efficient framework, and propose a new architecture for time series prediction in settings where linear models seem unable to capture the time dependence. The architecture is built from attention-free LSTM layers and outperforms linear models for conditional variance prediction. Our findings confirm the validity of our model, which also improves the prediction capacity of an LSTM while improving the efficiency of the learning task.
LG Apr 20, 2023
An Attention Free Conditional Autoencoder For Anomaly Detection in Cryptocurrencies
Hugo Inzirillo, Ludovic De Villelongue
It is difficult to identify anomalies in time series, especially when there is a lot of noise. Denoising techniques can remove the noise, but they can cause a significant loss of information. To detect anomalies in time series, we propose an attention-free conditional autoencoder (AF-CA). We start from the conditional autoencoder model, to which we add an Attention-Free LSTM layer (Inzirillo and De Villelongue, 2022) in order to make anomaly detection more reliable and more powerful. We compare the results of our Attention Free Conditional Autoencoder with those of an LSTM Autoencoder and find a clear improvement in the explanatory power of the model, and therefore in the detection of anomalies in noisy time series.
LG Feb 13, 2025
SigGate: Enhancing Recurrent Neural Networks with Signature-Based Gating Mechanisms
Rémi Genet, Hugo Inzirillo
In this paper, we propose a novel approach that enhances recurrent neural networks (RNNs) by incorporating path signatures into their gating mechanisms. Our method modifies both Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures by replacing their forget and reset gates, respectively, with learnable path signatures. These signatures, which capture the geometric features of the entire path history, provide a richer context for controlling information flow through the network's memory. This modification allows the networks to make memory decisions based on the full historical context rather than just the current input and state. Through experimental studies, we demonstrate that our Signature-LSTM (SigLSTM) and Signature-GRU (SigGRU) models outperform their traditional counterparts across various sequential learning tasks. By leveraging path signatures in recurrent architectures, this method offers new opportunities to enhance performance in time series analysis and forecasting applications.
LG Oct 28, 2024
A Temporal Linear Network for Time Series Forecasting
Remi Genet, Hugo Inzirillo
Recent research has challenged the necessity of complex deep learning architectures for time series forecasting, demonstrating that simple linear models can often outperform sophisticated approaches. Building upon this insight, we introduce a novel architecture, the Temporal Linear Net (TLN), that extends the capabilities of linear models while maintaining interpretability and computational efficiency. TLN is designed to effectively capture both temporal and feature-wise dependencies in multivariate time series data. Our approach is a variant of TSMixer that maintains strict linearity throughout its architecture. TLN removes activation functions, introduces specialized kernel initializations, and incorporates dilated convolutions to handle various time scales, while preserving the linear nature of the model. Unlike transformer-based models that may lose temporal information due to their permutation-invariant nature, TLN explicitly preserves and leverages the temporal structure of the input data. A key innovation of TLN is its ability to compute an equivalent linear model, offering a level of interpretability not found in more complex architectures such as TSMixer. This feature allows for seamless conversion between the full TLN model and its linear equivalent, facilitating both training flexibility and inference optimization.
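The "equivalent linear model" idea rests on a general fact: a stack of affine layers with no activation functions is itself a single affine map. The sketch below illustrates only that fact, in plain Python. It is not the TLN implementation, and the helper names (`forward`, `collapse`) are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def forward(layers, x):
    """Apply affine layers in order: x -> W x + b, with no activation."""
    for W, b in layers:
        x = [y + c for y, c in zip(matvec(W, x), b)]
    return x


def collapse(layers):
    """Fold a stack of affine layers into one equivalent pair (W_eq, b_eq).

    Uses W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), applied repeatedly.
    """
    W_eq, b_eq = layers[0]
    for W, b in layers[1:]:
        b_eq = [y + c for y, c in zip(matvec(W, b_eq), b)]
        W_eq = matmul(W, W_eq)
    return W_eq, b_eq


# Illustrative weights only (2 inputs -> 3 hidden -> 2 outputs).
layers = [
    ([[1, 2], [3, 4], [5, 6]], [1, 0, -1]),
    ([[1, 0, 2], [0, 1, -1]], [2, 3]),
]
x = [1, -1]
W_eq, b_eq = collapse(layers)
full_output = forward(layers, x)
collapsed_output = forward([(W_eq, b_eq)], x)
```

Here `full_output` and `collapsed_output` agree, and the entries of `W_eq` can be read directly as the weight each input has on each output, which is the kind of interpretability the abstract refers to.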
LG Jan 14, 2025
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3
Rémi Genet, Hugo Inzirillo
In this paper we introduce Keras Sig, a high-performance pythonic library designed to compute path signatures for deep learning applications. Entirely built in Keras 3, Keras Sig integrates seamlessly with the most widely used deep learning backends, such as PyTorch, JAX and TensorFlow. Inspired by Kidger and Lyons (2021), we propose a novel approach that reshapes signature calculations to leverage GPU parallelism. This adjustment reduces training time by 55% and yields 5- to 10-fold improvements in direct signature computation compared to existing methods, while maintaining similar CPU performance. Relying on high-level tensor operations instead of low-level C++ code, Keras Sig significantly reduces the versioning and compatibility issues commonly encountered in deep learning libraries, while delivering superior or comparable performance across various hardware configurations. We demonstrate through extensive benchmarking that our approach scales efficiently with the length of input sequences and maintains competitive performance across various signature parameters, though bounded by memory constraints for very large signature dimensions.
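For readers unfamiliar with path signatures, the following is a minimal reference sketch of the first two signature levels of a piecewise-linear path, built segment by segment with Chen's identity. It is a plain sequential Python illustration and does not reproduce the Keras Sig API or its GPU-oriented reshaping. The function name `signature_depth2` is hypothetical.

```python
def signature_depth2(path):
    """Return (level1, level2) of the signature of a piecewise-linear path.

    path: list of points, each a list or tuple of d coordinates.
    level1[i]    = total increment in coordinate i.
    level2[i][j] = iterated integral of dx^i then dx^j along the path.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for start, end in zip(path, path[1:]):
        dx = [e - s for s, e in zip(start, end)]
        # Chen's identity: appending a straight segment adds the cross term
        # level1[i] * dx[j] plus the segment's own level-2 term dx[i] * dx[j] / 2.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            level1[i] += dx[i]
    return level1, level2


# An "L"-shaped path in the plane: right one unit, then up one unit.
level1, level2 = signature_depth2([(0, 0), (1, 0), (1, 1)])
levy_area = 0.5 * (level2[0][1] - level2[1][0])
```

The antisymmetric part of the second level (`levy_area`) is the signed area between the path and its chord, one of the geometric features that signature-based models rely on.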
LG Sep 21, 2025
LEMs: A Primer On Large Execution Models
Remi Genet, Hugo Inzirillo
This paper introduces Large Execution Models (LEMs), a novel deep learning framework that extends transformer-based architectures to address complex execution problems with flexible time boundaries and multiple execution constraints. Building upon recent advances in neural VWAP execution strategies, LEMs generalize the approach from fixed-duration orders to scenarios where execution duration is bounded between minimum and maximum time horizons, similar to share buyback contract structures. The proposed architecture decouples market information processing from execution allocation decisions: a common feature extraction pipeline using Temporal Kolmogorov-Arnold Networks (TKANs), Variable Selection Networks (VSNs), and multi-head attention mechanisms processes market data to create informational context, while independent allocation networks handle the specific execution logic for different scenarios (fixed quantity vs. fixed notional, buy vs. sell orders). This architectural separation enables a unified model to handle diverse execution objectives while leveraging shared market understanding across scenarios. Through comprehensive empirical evaluation on intraday cryptocurrency markets and multi-day equity trading using Dow Jones constituents, we demonstrate that LEMs achieve superior execution performance compared to traditional benchmarks by dynamically optimizing execution paths within flexible time constraints. The unified model architecture enables deployment across different execution scenarios (buy/sell orders, varying duration boundaries, volume/notional targets) through a single framework, providing significant operational advantages over asset-specific approaches.
LG Oct 31, 2024
CaAdam: Improving Adam optimizer using connection aware methods
Remi Genet, Hugo Inzirillo
We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima. Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics. This architecture-agnostic approach is deeply embedded in most deep learning frameworks, where optimizers are implemented as standalone modules without direct access to the network's structural information. For instance, in popular frameworks like Keras or PyTorch, optimizers operate solely on gradients and parameters, without knowledge of layer connectivity or network topology. Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through carefully designed proxies of architectural information. We propose multiple scaling methodologies that dynamically adjust learning rates based on easily accessible structural properties such as layer depth, connection counts, and gradient distributions. This approach enables more granular optimization while working within the constraints of current deep learning frameworks. Empirical evaluations on standard datasets (e.g., CIFAR-10, Fashion MNIST) show that our method consistently achieves faster convergence and higher accuracy than the standard Adam optimizer, demonstrating the potential benefits of incorporating architectural awareness in optimization strategies.
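To make the idea of a structure-dependent learning rate concrete, here is a minimal sketch of a scalar Adam update with an extra per-layer multiplier. The Adam update itself is the standard one. The `depth_scale` rule and its `gamma` parameter are hypothetical placeholders chosen for illustration, not the scaling formulas proposed in the paper.

```python
import math


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, scale=1.0):
    """One standard Adam update for a scalar parameter.

    `scale` multiplies the learning rate; scale=1.0 recovers plain Adam.
    `state` holds the step count "t" and moment estimates "m" and "v".
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected variance
    return param - lr * scale * m_hat / (math.sqrt(v_hat) + eps)


def depth_scale(layer_index, n_layers, gamma=0.5):
    """Hypothetical rule: larger steps for early layers, smaller for late ones.

    Returns 1 + gamma/2 for the first layer and 1 - gamma/2 for the last.
    """
    relative_depth = layer_index / max(n_layers - 1, 1)
    return 1.0 + gamma * (0.5 - relative_depth)


# One update for a parameter in the first layer of a 3-layer network.
state = {"t": 0, "m": 0.0, "v": 0.0}
first_layer_scale = depth_scale(0, 3)
new_param = adam_step(0.0, grad=1.0, state=state, scale=first_layer_scale)
```

Because the multiplier depends only on the layer index, it can be computed once when the optimizer is built, which fits the constraint that optimizers see only parameters and gradients.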
PM Oct 15, 2024
Clustering Digital Assets Using Path Signatures: Application to Portfolio Construction
Hugo Inzirillo
We propose a new way of building portfolios of cryptocurrencies that provide good diversification properties to investors. First, we seek to filter these digital assets by creating some clusters based on their path signature. The goal is to identify similar patterns in the behavior of these highly volatile assets. Once such clusters have been built, we propose "optimal" portfolios by comparing the performances of such portfolios to a universe of unfiltered digital assets. Our intuition is that clustering based on path signatures will make it easier to capture the main trends and features of a group of cryptocurrencies, and allow parsimonious portfolios that reduce excessive transaction fees. Empirically, our assumptions seem to be satisfied.
LG Jan 30, 2025
STAN: Smooth Transition Autoregressive Networks
Hugo Inzirillo, Remi Genet
Traditional Smooth Transition Autoregressive (STAR) models offer an effective way to model regime-dependent dynamics through smooth regime changes based on specific transition variables. In this paper, we propose a novel approach by drawing an analogy between STAR models and a multilayer neural network architecture. Our proposed neural network architecture mimics the STAR framework, employing multiple layers to simulate the smooth transition between regimes and capturing complex, nonlinear relationships. The network's hidden layers and activation functions are structured to replicate the gradual switching behavior typical of STAR models, allowing for a more flexible and scalable approach to regime-dependent modeling. This research suggests that neural networks can provide a powerful alternative to STAR models, with the potential to enhance predictive accuracy in economic and financial forecasting.
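As background for the analogy, the sketch below shows the textbook two-regime logistic STAR (LSTAR) prediction: two linear autoregressions blended by a logistic function of a transition variable. The logistic weight has the same form as a sigmoid activation, which is what makes the neural network reading natural. This is the classical model only, not the STAN architecture, and the function and parameter names are illustrative.

```python
import math


def logistic_transition(s, gamma, c):
    """Weight of the second regime: near 0 for s << c, near 1 for s >> c.

    gamma controls how sharp the switch is; c is the threshold location.
    """
    return 1.0 / (1.0 + math.exp(-gamma * (s - c)))


def lstar_predict(lags, phi_low, phi_high, s, gamma, c):
    """One-step LSTAR prediction.

    lags:     recent values [y_{t-1}, ..., y_{t-p}].
    phi_low:  intercept and AR coefficients of the first regime (length p + 1).
    phi_high: intercept and AR coefficients of the second regime (length p + 1).
    s:        transition variable, often a lagged value of the series.
    """
    g = logistic_transition(s, gamma, c)
    x = [1.0] + list(lags)  # prepend 1.0 for the intercept
    low = sum(p * v for p, v in zip(phi_low, x))
    high = sum(p * v for p, v in zip(phi_high, x))
    return (1.0 - g) * low + g * high


# At the threshold (s == c) the two regimes are weighted equally.
midpoint = lstar_predict([2.0], [0.0, 1.0], [1.0, 0.0], s=0.0, gamma=5.0, c=0.0)
# Far above the threshold the second regime dominates.
upper = lstar_predict([2.0], [0.0, 1.0], [1.0, 0.0], s=10.0, gamma=50.0, c=0.0)
```
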
LG Jun 25, 2024
SigKAN: Signature-Weighted Kolmogorov-Arnold Networks for Time Series
Hugo Inzirillo, Remi Genet
We propose a novel approach that enhances multivariate function approximation using learnable path signatures and Kolmogorov-Arnold networks (KANs). We enhance the learning capabilities of these networks by weighting the values obtained by KANs using learnable path signatures, which capture important geometric features of paths. This combination allows for a more comprehensive and flexible representation of sequential and temporal data. We demonstrate through studies that our SigKANs with learnable path signatures perform better than conventional methods across a range of function approximation challenges. By leveraging path signatures in neural networks, this method offers intriguing opportunities to enhance performance in time series analysis and time series forecasting, among other fields.
LG Jun 4, 2024
A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting
Remi Genet, Hugo Inzirillo
Capturing complex temporal patterns and relationships within multivariate data streams is a difficult task. We propose the Temporal Kolmogorov-Arnold Transformer (TKAT), a novel attention-based architecture designed to address this task using Temporal Kolmogorov-Arnold Networks (TKANs). Inspired by the Temporal Fusion Transformer (TFT), TKAT emerges as a powerful encoder-decoder model tailored to handle tasks in which the observed part of the features is more important than the a priori known part. This new architecture combines the theoretical foundation of the Kolmogorov-Arnold representation with the power of transformers. TKAT aims to simplify the complex dependencies inherent in time series, making them more "interpretable". The use of a transformer architecture in this framework allows us to capture long-range dependencies through self-attention mechanisms.
LG Jan 5, 2022
Deep Fusion of Lead-lag Graphs: Application to Cryptocurrencies
Hugo Schnoering, Hugo Inzirillo
The study of time series has motivated many researchers, particularly in the area of multivariate analysis. The study of co-movements and dependency between random variables leads us to develop metrics to describe existing connections between assets. The most commonly used are correlation and causality. Despite the growing literature, some connections remain undetected. The objective of this paper is to propose a new representation learning algorithm capable of integrating synchronous and asynchronous relationships.
ST Dec 30, 2021
Dimensionality reduction for prediction: Application to Bitcoin and Ethereum
Hugo Inzirillo, Benjamin Mat
The objective of this paper is to assess the performance of dimensionality reduction techniques in establishing a link between cryptocurrencies. We focus our analysis on the two most traded cryptocurrencies: Bitcoin and Ethereum. To perform our analysis, we take log returns and add some covariates to build our data set. We first introduce the Pearson correlation coefficient in order to have a preliminary assessment of the link between Bitcoin and Ethereum. We then reduce the dimension of our data set using canonical correlation analysis and principal component analysis. After analyzing the links between Bitcoin and Ethereum with both statistical techniques, we measure their performance on forecasting Ethereum returns with Bitcoin's features.
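The preliminary step described above, log returns followed by a Pearson correlation, can be sketched in a few lines of plain Python. The price series below are made-up illustrative numbers, not market data, and the function names are hypothetical.

```python
import math


def log_returns(prices):
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Illustrative prices only: asset_b moves in exact proportion to asset_a,
# and asset_c moves inversely to it.
asset_a = [100.0, 110.0, 99.0, 120.0]
asset_b = [10.0, 11.0, 9.9, 12.0]
asset_c = [1.0 / p for p in asset_a]

corr_ab = pearson(log_returns(asset_a), log_returns(asset_b))
corr_ac = pearson(log_returns(asset_a), log_returns(asset_c))
```

Here `corr_ab` is 1 and `corr_ac` is -1 up to floating-point rounding, since proportional prices have identical log returns and reciprocal prices have negated ones.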