PMFeb 3
A Novel approach to portfolio constructionT. Di Matteo, L. Riso, M. G. Zoia
This paper proposes a machine learning-based framework for asset selection and portfolio construction, termed the Best-Path Algorithm Sparse Graphical Model (BPASGM). The method extends the Best-Path Algorithm (BPA) by mapping linear and non-linear dependencies among a large set of financial assets into a sparse graphical model satisfying a structural Markov property. Based on this representation, BPASGM performs a dependence-driven screening that removes positively or redundantly connected assets, isolating subsets that are conditionally independent or negatively correlated. This step is designed to enhance diversification and reduce estimation error in high-dimensional portfolio settings. Portfolio optimization is then conducted on the selected subset using standard mean-variance techniques. BPASGM does not aim to improve the theoretical mean-variance optimum under known population parameters, but rather to enhance realized performance in finite samples, where sample-based Markowitz portfolios are highly sensitive to estimation error. Monte Carlo simulations show that BPASGM-based portfolios achieve more stable risk-return profiles, lower realized volatility, and superior risk-adjusted performance compared to standard mean-variance portfolios. Empirical results for U.S. equities, global stock indices, and foreign exchange rates over 1990-2025 confirm these findings and demonstrate a substantial reduction in portfolio cardinality. Overall, BPASGM offers a statistically grounded and computationally efficient framework that integrates sparse graphical modeling with portfolio theory for dependence-aware asset selection.
RMApr 11, 2020
A new multilayer network construction via Tensor learningGiuseppe Brandi, T. Di Matteo
Multilayer networks proved to be suitable in extracting and providing dependency information of different complex systems. The construction of these networks is difficult and is mostly done with a static approach, neglecting time delayed interdependences. Tensors are objects that naturally represent multilayer networks and in this paper, we propose a new methodology based on Tucker tensor autoregression in order to build a multilayer network directly from data. This methodology captures within and between connections across layers and makes use of a filtering procedure to extract relevant information and improve visualization. We show the application of this methodology to different stationary fractionally differenced financial data. We argue that our result is useful to understand the dependencies across three different aspects of financial risk, namely market risk, liquidity risk, and volatility risk. Indeed, we show how the resulting visualization is a useful tool for risk managers depicting dependency asymmetries between different risk factors and accounting for delayed cross dependencies. The constructed multilayer network shows a strong interconnection between the volumes and prices layers across all the stocks considered while a lower number of interconnections between the uncertainty measures is identified.
MLFeb 11, 2020
Predicting Multidimensional Data via Tensor LearningGiuseppe Brandi, T. Di Matteo
The analysis of multidimensional data is becoming a more and more relevant topic in statistical and machine learning research. Given their complexity, such data objects are usually reshaped into matrices or vectors and then analysed. However, this methodology presents several drawbacks. First of all, it destroys the intrinsic interconnections among datapoints in the multidimensional space and, secondly, the number of parameters to be estimated in a model increases exponentially. We develop a model that overcomes such drawbacks. In particular, in this paper, we propose a parsimonious tensor regression model that retains the intrinsic multidimensional structure of the dataset. Tucker structure is employed to achieve parsimony and a shrinkage penalization is introduced to deal with over-fitting and collinearity. To estimate the model parameters, an Alternating Least Squares algorithm is developed. In order to validate the model performance and robustness, a simulation exercise is produced. Moreover, we perform an empirical analysis that highlight the forecasting power of the model with respect to benchmark models. This is achieved by implementing an autoregressive specification on the Foursquares spatio-temporal dataset together with a macroeconomic panel dataset. Overall, the proposed model is able to outperform benchmark models present in the forecasting literature.
CYApr 5, 2017
Blockchain Inefficiency in the Bitcoin Peers NetworkGiuseppe Pappalardo, T. Di Matteo, Guido Caldarelli et al.
We investigate Bitcoin network monitoring the dynamics of blocks and transactions. We unveil that 43\% of the transactions are still not included in the Blockchain after 1h from the first time they were seen in the network and 20\% of the transactions are still not included in the Blockchain after 30 days, revealing therefore great inefficiency in the Bitcoin system. However, we observe that most of these `forgotten' transactions have low values and in terms of transferred value the system is less inefficient with 93\% of the transactions value being included into the Blockchain within 3h. The fact that a sizeable fraction of transactions is not processed timely casts serious doubts on the usability of the Bitcoin Blockchain for reliable time-stamping purposes and calls for a debate about the right systems of incentives which a peer-to-peer unintermediated system should introduce to promote efficient transaction recording.
ITFeb 23, 2016
Parsimonious modeling with Information Filtering NetworksWolfram Barfuss, Guido Previde Massara, T. Di Matteo et al.
We introduce a methodology to construct parsimonious probabilistic models. This method makes use of Information Filtering Networks to produce a robust estimate of the global sparse inverse covariance from a simple sum of local inverse covariances computed on small sub-parts of the network. Being based on local and low-dimensional inversions, this method is computationally very efficient and statistically robust even for the estimation of inverse covariance of high-dimensional, noisy and short time-series. Applied to financial data our method results computationally more efficient than state-of-the-art methodologies such as Glasso producing, in a fraction of the computation time, models that can have equivalent or better performances but with a sparser inference structure. We also discuss performances with sparse factor models where we notice that relative performances decrease with the number of factors. The local nature of this approach allows us to perform computations in parallel and provides a tool for dynamical adaptation by partial updating when the properties of some variables change without the need of recomputing the whole model. This makes this approach particularly suitable to handle big datasets with large numbers of variables. Examples of practical application for forecasting, stress testing and risk allocation in financial systems are also provided.
DSMay 10, 2015
Network Filtering for Big Data: Triangulated Maximally Filtered GraphGuido Previde Massara, T. Di Matteo, Tomaso Aste
We propose a network-filtering method, the Triangulated Maximally Filtered Graph (TMFG), that provides an approximate solution to the Weighted Maximal Planar Graph problem. The underlying idea of TMFG consists in building a triangulation that maximizes a score function associated with the amount of information retained by the network. TMFG uses as weights any arbitrary similarity measure to arrange data into a meaningful network structure that can be used for clustering, community detection and modeling. The method is fast, adaptable and scalable to very large datasets, it allows online updating and learning as new data can be inserted and deleted with combinations of local and non-local moves. TMFG permits readjustments of the network in consequence of changes in the strength of the similarity measure. The method is based on local topological moves and can therefore take advantage of parallel and GPUs computing. We discuss how this network-filtering method can be used intuitively and efficiently for big data studies and its significance from an information-theoretic perspective.