Lorenzo Livi

h-index: 5

Papers: 42

Citations: 1,466

Novelty: 47%

AI Score: 48

Ranked #50,526 of 201,326 authors (top 25%); #11,498 in LG (top 27%)

42 Papers

LG Jan 4, 2023

Graph state-space models

Daniele Zambon, Andrea Cini, Lorenzo Livi et al.

State-space models constitute an effective modeling tool to describe multivariate time series and operate by maintaining an updated representation of the system state from which predictions are made. Within this framework, relational inductive biases, e.g., associated with functional dependencies existing among signals, are not explicitly exploited, leaving great opportunities for effective modeling approaches unexplored. The manuscript aims, for the first time, at filling this gap by matching state-space modeling and spatio-temporal data where the relational information, say the functional graph capturing latent dependencies, is learned directly from data and is allowed to change over time. Within a probabilistic formulation that accounts for the uncertainty in the data-generating process, an encoder-decoder architecture is proposed to learn the state-space model end-to-end on a downstream task. The proposed methodological framework generalizes several state-of-the-art methods and proves effective in extracting meaningful relational information while achieving optimal forecasting performance in controlled environments.

CHEM-PH May 6, 2022

Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning

Sajjad Heydari, Stefano Raniolo, Lorenzo Livi et al.

Predicting structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, and it has use cases in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms has impacted molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one, possessing a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy states. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing the potential energy of a conformation, and (ii) novel message passing and pooling layers for processing and making predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the very same transfer learning approach can be used to group, in an unsupervised way, various secondary structures of deca-alanine in clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed, paving the way to unexplored routes in prediction of structural and energetic properties of biologically relevant systems.

LG Mar 20

Learnability Window in Gated Recurrent Neural Networks

Lorenzo Livi

We develop a statistical theory of temporal learnability in recurrent neural networks, quantifying the maximal temporal horizon $\mathcal{H}_N$ over which gradient-based learning can recover lag-dependent structure at finite sample size $N$. The theory is built on the effective learning rate envelope $f(\ell)$, a functional that captures how gating mechanisms and adaptive optimizers jointly shape the coupling between state-space transport and parameter updates during Backpropagation Through Time. Under heavy-tailed ($α$-stable) gradient noise, where empirical averages concentrate at rate $N^{-1/κ_α}$ with $κ_α=α/(α-1)$, the interplay between envelope decay and statistical concentration yields explicit scaling laws for the growth of $\mathcal{H}_N$: logarithmic, polynomial, and exponential temporal learning regimes emerge according to the decay law of $f(\ell)$. These results identify the envelope decay geometry as the key determinant of temporal learnability: slower attenuation of $f(\ell)$ enlarges the learnability window $\mathcal{H}_N$, while heavy-tailed gradient noise compresses temporal horizons by weakening statistical concentration. Experiments across multiple gated architectures and optimizers corroborate these structural predictions.

CV Oct 20, 2014, Code

Building pattern recognition applications with the SPARE library

Lorenzo Livi, Guido Del Vescovo, Antonello Rizzi et al.

This paper presents the SPARE C++ library, an open source software tool conceived to build pattern recognition and soft computing systems. The library follows the requirement of generality: most of the implemented algorithms are able to process user-defined input data types transparently, such as labeled graphs and sequences of objects, as well as standard numeric vectors. Here we present a high-level picture of the SPARE library's characteristics, focusing on the practical possibility of constructing pattern recognition systems for different input data types. In particular, as a proof of concept, we discuss two application instances involving clustering of real-valued multidimensional sequences and classification of labeled graphs.

LG Dec 5, 2025

Learnability Window in Gated Recurrent Neural Networks

Lorenzo Livi

We develop a statistical theory of temporal learnability in recurrent neural networks, showing how gating mechanisms determine the learnability window $\mathcal{H}_N$, defined as the maximal temporal horizon over which gradient information remains recoverable at sample size $N$. While classical analyses emphasize numerical stability of Jacobian products, we show that stability alone does not guarantee recoverability. Instead, learnability is governed by the interaction between the decay geometry of the effective learning rate envelope $f(\ell)=\|μ_{t,\ell}\|_1$, derived from first-order expansions of gate-induced Jacobians in Backpropagation Through Time, and the statistical concentration properties of stochastic gradients. Under heavy-tailed ($α$-stable) gradient noise, empirical averages concentrate at rate $N^{-1/κ_α}$ with $κ_α=α/(α-1)$. We prove that this interaction yields explicit scaling laws for the growth of $\mathcal{H}_N$, distinguishing logarithmic, polynomial, and exponential temporal learning regimes according to the attenuation of $f(\ell)$. The theory reveals that gate-induced time-scale spectra are the dominant determinants of temporal learnability: broader spectra slow envelope decay and systematically expand $\mathcal{H}_N$, whereas heavy-tailed noise uniformly compresses temporal horizons by weakening statistical concentration. Empirical results across multiple gated architectures confirm these structural scaling predictions.
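As a small worked illustration of the concentration rate quoted in this abstract, the exponent $κ_α=α/(α-1)$ can be evaluated for a few tail indices. The values of $α$ and the sample size $N=10^6$ below are chosen here for illustration only and do not come from the paper; $α=2$ corresponds to the Gaussian limit of the $α$-stable family.

```latex
\begin{align*}
α = 2   &:\quad κ_α = \frac{2}{2-1} = 2,     &  N^{-1/κ_α} &= N^{-1/2} = 10^{-3} \ \text{at } N = 10^{6},\\
α = 1.5 &:\quad κ_α = \frac{1.5}{1.5-1} = 3, &  N^{-1/κ_α} &= N^{-1/3} = 10^{-2} \ \text{at } N = 10^{6},\\
α = 1.2 &:\quad κ_α = \frac{1.2}{1.2-1} = 6, &  N^{-1/κ_α} &= N^{-1/6} = 10^{-1} \ \text{at } N = 10^{6}.
\end{align*}
```

Smaller $α$ (heavier tails) gives a larger $κ_α$ and therefore a slower decay in $N$, which is consistent with the abstract's statement that heavy-tailed noise weakens statistical concentration.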

LG Aug 16, 2025

Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Lorenzo Livi

We study how gating mechanisms in recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior, even when training is carried out with a fixed, global learning rate. This effect arises from the coupling between state-space time scales--parametrized by the gates--and parameter-space dynamics during gradient descent. By deriving exact Jacobians for leaky-integrator and gated RNNs, we obtain a first-order expansion that makes explicit how constant, scalar, and multi-dimensional gates reshape gradient propagation, modulate effective step sizes, and introduce anisotropy in parameter updates. These findings reveal that gates not only control information flow, but also act as data-driven preconditioners that adapt optimization trajectories in parameter space. We further draw formal analogies with learning-rate schedules, momentum, and adaptive methods such as Adam. Empirical simulations corroborate these claims: in several sequence tasks, we show that gates induce lag-dependent effective learning rates and directional concentration of gradient flow, with multi-gate models matching or exceeding the anisotropic structure produced by Adam. These results highlight that optimizer-driven and gate-driven adaptivity are complementary but not equivalent mechanisms. Overall, this work provides a unified dynamical systems perspective on how gating couples state evolution with parameter updates, explaining why gated architectures achieve robust trainability and stability in practice.

LG Mar 31, 2022

Message Passing Neural Networks for Hypergraphs

Sajjad Heydari, Lorenzo Livi

Hypergraph representations are both more efficient and better suited to describe data characterized by relations between two or more objects. In this work, we present a new graph neural network based on message passing capable of processing hypergraph-structured data. We show that the proposed model defines a design space for neural network models for hypergraphs, thus generalizing existing models for hypergraphs. We report experiments on a benchmark dataset for node classification, highlighting the effectiveness of the proposed model with respect to other state-of-the-art methods for graphs and hypergraphs. We also discuss the benefits of using hypergraph representations and, at the same time, highlight the limitation of using equivalent graph representations when the underlying problem has relations among more than two objects.

LG Oct 27, 2021

Learning Graph Cellular Automata

Daniele Grattarola, Lorenzo Livi, Cesare Alippi

Cellular automata (CA) are a class of computational models that exhibit rich dynamics emerging from the local interaction of cells arranged in a regular lattice. In this work we focus on a generalised version of typical CA, called graph cellular automata (GCA), in which the lattice structure is replaced by an arbitrary graph. In particular, we extend previous work that used convolutional neural networks to learn the transition rule of conventional CA and we use graph neural networks to learn a variety of transition rules for GCA. First, we present a general-purpose architecture for learning GCA, and we show that it can represent any arbitrary GCA with finite and discrete state space. Then, we test our approach on three different tasks: 1) learning the transition rule of a GCA on a Voronoi tessellation; 2) imitating the behaviour of a group of flocking agents; 3) learning a rule that converges to a desired target state.

LG Oct 6, 2020

Learn to Synchronize, Synchronize to Learn

Pietro Verzelli, Cesare Alippi, Lorenzo Livi

In recent years, the machine learning community has seen a continuously growing interest in research aimed at investigating dynamical aspects of both training procedures and machine learning models. Of particular interest among recurrent neural networks is the Reservoir Computing (RC) paradigm, characterized by conceptual simplicity and a fast training scheme. Yet, the guiding principles under which RC operates are only partially understood. In this work, we analyze the role played by Generalized Synchronization (GS) when training an RC to solve a generic task. In particular, we show how GS allows the reservoir to correctly encode the system generating the input signal into its dynamics. We also discuss necessary and sufficient conditions for the learning to be feasible in this approach. Moreover, we explore the role that ergodicity plays in this process, showing how its presence allows the learning outcome to apply to multiple input trajectories. Finally, we show that satisfaction of the GS can be measured by means of the Mutual False Nearest Neighbors index, which makes the theoretical derivations usable by practitioners.

NE Mar 24, 2020

Input-to-State Representation in linear reservoirs dynamics

Pietro Verzelli, Cesare Alippi, Lorenzo Livi et al.

Reservoir computing is a popular approach to design recurrent neural networks, due to its training simplicity and approximation performance. The recurrent part of these networks is not trained (e.g., via gradient descent), making them appealing for analytical studies by a large community of researchers with backgrounds spanning from dynamical systems to neuroscience. However, even in the simple linear case, the working principle of these networks is not fully understood and their design is usually driven by heuristics. A novel analysis of the dynamics of such networks is proposed, which allows the investigator to express the state evolution using the controllability matrix. Such a matrix encodes salient characteristics of the network dynamics; in particular, its rank represents an input-independent measure of the memory capacity of the network. Using the proposed approach, it is possible to compare different reservoir architectures and explain why a cyclic topology achieves favourable results as verified by practitioners.
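The rank statement in this abstract can be illustrated with a minimal sketch. It assumes the standard linear state update x[t+1] = W x[t] + w_in u[t] and the textbook controllability matrix [w_in, W w_in, ..., W^(n-1) w_in]; the abstract does not spell out either, and the helper names (`cyclic_reservoir`, `controllability_columns`, `rank`), the reservoir size, and the spectral radius 0.9 are all hypothetical choices made here, not the paper's code or settings.

```python
def mat_vec(W, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def controllability_columns(W, w_in):
    """Columns w_in, W w_in, ..., W^(n-1) w_in of the controllability matrix."""
    cols = [list(w_in)]
    for _ in range(len(W) - 1):
        cols.append(mat_vec(W, cols[-1]))
    return cols

def rank(vectors, tol=1e-10):
    """Rank of a set of vectors via Gaussian elimination with partial pivoting."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue  # no usable pivot in this coordinate
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def cyclic_reservoir(n, rho=0.9):
    """Ring topology: unit i feeds unit (i + 1) mod n with weight rho."""
    return [[rho if i == (j + 1) % n else 0.0 for j in range(n)] for i in range(n)]

def scaled_identity(n, rho=0.9):
    """Degenerate comparison case: every unit only feeds itself."""
    return [[rho if i == j else 0.0 for j in range(n)] for i in range(n)]

n = 8
w_in = [1.0] + [0.0] * (n - 1)  # input enters through the first unit only

rank_cyclic = rank(controllability_columns(cyclic_reservoir(n), w_in))
rank_identity = rank(controllability_columns(scaled_identity(n), w_in))
print(rank_cyclic, rank_identity)
```

In this toy setting the ring passes the input to a new unit at every step, so the controllability matrix has full rank, whereas the scaled identity keeps the input in a single unit and the rank collapses to one.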

LG Oct 24, 2019

Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling

Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi et al.

In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the MAXCUT solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.

LG Sep 9, 2019

Graph Random Neural Features for Distance-Preserving Graph Representations

Daniele Zambon, Cesare Alippi, Lorenzo Livi

We present Graph Random Neural Features (GRNF), a novel embedding method from graph-structured data to real vectors based on a family of graph neural networks. The embedding naturally deals with graph isomorphism and preserves the metric structure of the graph domain, in probability. In addition to being an explicit embedding method, it also allows us to efficiently and effectively approximate graph metric distances (as well as complete kernel functions); a criterion to select the embedding dimension trading off the approximation accuracy with the computational cost is also provided. GRNF can be used within traditional processing methods or as a training-free input layer of a graph neural network. The theoretical guarantees that accompany GRNF ensure that the considered graph distance is metric, hence allowing one to distinguish any pair of non-isomorphic graphs.

NE Mar 27, 2019

Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere

Pietro Verzelli, Cesare Alippi, Lorenzo Livi

Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called edge of chaos. Finding such a region requires searching in hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory-nonlinearity trade-off, i.e., the fact that increasing the nonlinear behavior of the network degrades its ability to remember past inputs, and vice-versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behaviour in phase space characterised by a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking.

LG Mar 18, 2019

Autoregressive Models for Sequences of Graphs

Daniele Zambon, Daniele Grattarola, Lorenzo Livi et al.

This paper proposes an autoregressive (AR) model for sequences of graphs, which generalises traditional AR models. A first novelty consists in formalising the AR model for a very general family of graphs, characterised by a variable topology, and attributes associated with nodes and edges. A graph neural network (GNN) is also proposed to learn the AR function associated with the graph-generating process (GGP), and subsequently predict the next graph in a sequence. The proposed method is compared with four baselines on synthetic GGPs, showing significantly better performance on all considered problems.

ML Feb 13, 2019

Deep Divergence-Based Approach to Clustering

Michael Kampffmeyer, Sigurd Løkse, Filippo M. Bianchi et al.

A promising direction in deep learning research consists in learning representations and simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open question. Our contribution to this emerging field is a new deep clustering network that leverages the discriminative power of information-theoretic divergence measures, which have been shown to be effective in traditional clustering. We propose a novel loss function that incorporates geometric regularization constraints, thus avoiding degenerate structures of the resulting clustering partition. Experiments on synthetic benchmarks and real datasets show that the proposed network achieves competitive performance with respect to other state-of-the-art methods, scales well to large datasets, and does not require pre-training steps.

LG Jan 5, 2019

Graph Neural Networks with convolutional ARMA filters

Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi et al.

Popular graph neural networks implement convolution operations on graphs based on polynomial spectral filters. In this paper, we propose a novel graph convolutional layer inspired by the auto-regressive moving average (ARMA) filter that, compared to polynomial ones, provides a more flexible frequency response, is more robust to noise, and better captures the global graph structure. We propose a graph neural network implementation of the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, localized in the node space, and can be transferred to new graphs at test time. We perform a spectral analysis to study the filtering effect of the proposed ARMA layer and report experiments on four downstream tasks: semi-supervised node classification, graph signal classification, graph classification, and graph regression. Results show that the proposed ARMA layer brings significant improvements over graph neural networks based on polynomial filters.

LG Dec 11, 2018

Adversarial Autoencoders with Constant-Curvature Latent Manifolds

Daniele Grattarola, Lorenzo Livi, Cesare Alippi

Constant-curvature Riemannian manifolds (CCMs) have been shown to be ideal embedding spaces in many application domains, as their non-Euclidean geometry can naturally account for some relevant properties of data, like hierarchy and circularity. In this work, we introduce the CCM adversarial autoencoder (CCM-AAE), a probabilistic generative model trained to represent a data distribution on a CCM. Our method works by matching the aggregated posterior of the CCM-AAE with a probability distribution defined on a CCM, so that the encoder implicitly learns to represent data on the CCM to fool the discriminator network. The geometric constraint is also explicitly imposed by jointly training the CCM-AAE to maximise the membership degree of the embeddings to the CCM. While a few works in recent literature make use of either hyperspherical or hyperbolic manifolds for different learning tasks, ours is the first unified framework to seamlessly deal with CCMs of different curvatures. We show the effectiveness of our model on three different datasets characterised by non-trivial geometry: semi-supervised classification on MNIST, link prediction on two popular citation datasets, and graph-based molecule generation using the QM9 chemical database. Results show that our method improves upon other autoencoders based on Euclidean and non-Euclidean geometries on all tasks taken into account.

NE Oct 3, 2018

A characterization of the Edge of Criticality in Binary Echo State Networks

Pietro Verzelli, Lorenzo Livi, Cesare Alippi

Echo State Networks (ESNs) are simplified recurrent neural network models composed of a reservoir and a linear, trainable readout layer. The reservoir is tunable by some hyper-parameters that control the network behaviour. ESNs are known to be effective in solving tasks when configured on a region in (hyper-)parameter space called Edge of Criticality (EoC), where the system is maximally sensitive to perturbations hence affecting its behaviour. In this paper, we propose binary ESNs, which are architecturally equivalent to standard ESNs but consider binary activation functions and binary recurrent weights. For these networks, we derive a closed-form expression for the EoC in the autonomous case and perform simulations in order to assess their behavior in the case of noisy neurons and in the presence of a signal. We propose a theoretical explanation for the fact that the variance of the input plays a major role in characterizing the EoC.

LG Jul 27, 2018

Interpreting recurrent neural networks behaviour via excitable network attractors

Andrea Ceni, Peter Ashwin, Lorenzo Livi

Introduction: Machine learning provides fundamental tools both for scientific research and for the development of technologies with significant impact on society. It provides methods that facilitate the discovery of regularities in data and that give predictions without explicit knowledge of the rules governing a system. However, a price is paid for exploiting such flexibility: machine learning methods are typically black-boxes where it is difficult to fully understand what the machine is doing or how it is operating. This poses constraints on the applicability and explainability of such methods. Methods: Our research aims to open the black-box of recurrent neural networks, an important family of neural networks used for processing sequential data. We propose a novel methodology that provides a mechanistic interpretation of behaviour when solving a computational task. Our methodology uses mathematical constructs called excitable network attractors, which are invariant sets in phase space composed of stable attractors and excitable connections between them. Results and Discussion: As the behaviour of recurrent neural networks depends both on training and on inputs to the system, we introduce an algorithm to extract network attractors directly from the trajectory of a neural network while solving tasks. Simulations conducted on a controlled benchmark task confirm the relevance of these attractors for interpreting the behaviour of recurrent neural networks, at least for tasks that involve learning a finite number of stable states and transitions between them.

ML Jul 19, 2018

The Deep Kernelized Autoencoder

Michael Kampffmeyer, Sigurd Løkse, Filippo M. Bianchi et al.

Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments are performed for evaluating both reconstruction and kernel alignment performance in classification tasks and visualization of high-dimensional data. Additionally, we show that our method is capable of emulating kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.

ML May 18, 2018

Change Point Methods on a Sequence of Graphs

Daniele Zambon, Cesare Alippi, Lorenzo Livi

Given a finite sequence of graphs, e.g., coming from technological, biological, and social networks, the paper proposes a methodology to identify possible changes in stationarity in the stochastic process generating the graphs. In order to cover a large class of applications, we consider the general family of attributed graphs where both topology (number of vertices and edge configuration) and related attributes are allowed to change also in the stationary case. Novel Change Point Methods (CPMs) are proposed, that (i) map graphs into a vector domain; (ii) apply a suitable statistical test in the vector space; (iii) detect the change (if any) according to a confidence level and provide an estimate of its time of occurrence. Two specific multivariate CPMs have been designed: one that detects shifts in the distribution mean, the other addressing generic changes affecting the distribution. We ground our proposal with theoretical results showing how to relate the inference attained in the numerical vector space to the graph domain, and vice versa. We also show how to extend the methodology for handling multiple change points in the same sequence. Finally, the proposed CPMs have been validated on real data sets coming from epileptic-seizure detection problems and on labeled data sets for graph classification. Results show the effectiveness of the proposed methods in relevant application scenarios.

ML May 16, 2018

Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds

Daniele Grattarola, Daniele Zambon, Cesare Alippi et al.

The space of graphs is often characterised by a non-trivial geometry, which complicates learning and inference in practical applications. A common approach is to use embedding techniques to represent graphs as points in a conventional Euclidean space, but non-Euclidean spaces have often been shown to be better suited for embedding graphs. Among these, constant-curvature Riemannian manifolds (CCMs) offer embedding spaces suitable for studying the statistical properties of a graph distribution, as they provide ways to easily compute metric geodesic distances. In this paper, we focus on the problem of detecting changes in stationarity in a stream of attributed graphs. To this end, we introduce a novel change detection framework based on neural networks and CCMs, that takes into account the non-Euclidean nature of graphs. Our contribution in this work is twofold. First, via a novel approach based on adversarial learning, we compute graph embeddings by training an autoencoder to represent graphs on CCMs. Second, we introduce two novel change detection tests operating on CCMs. We perform experiments on synthetic data, as well as two real-world application scenarios: the detection of epileptic seizures using functional connectivity brain networks, and the detection of hostility between two subjects, using human skeletal graphs. Results show that the proposed methods are able to detect even small changes in a graph-generating process, consistently outperforming approaches based on Euclidean embeddings.

NE May 9, 2018

Learning representations for multivariate time series with missing data using Temporal Kernelized Autoencoders

Filippo Maria Bianchi, Lorenzo Livi, Karl Øyvind Mikalsen et al.

Learning compressed representations of multivariate time series (MTS) facilitates data analysis in the presence of noise and redundant information, and for a large number of variates and time steps. However, classical dimensionality reduction approaches are designed for vectorial data and cannot deal explicitly with missing values. In this work, we propose a novel autoencoder architecture based on recurrent neural networks to generate compressed representations of MTS. The proposed model can process inputs characterized by variable lengths and it is specifically designed to handle missing data. Our autoencoder learns fixed-length vectorial representations, whose pairwise similarities are aligned to a kernel function that operates in input space and that handles missing values. This allows learning good representations, even in the presence of a significant amount of missing data. To show the effectiveness of the proposed approach, we evaluate the quality of the learned representations in several classification tasks, including those involving medical data, and we compare to other methods for dimensionality reduction. Subsequently, we design two frameworks based on the proposed architecture: one for imputing missing data and another for one-class classification. Finally, we analyze under what circumstances an autoencoder with recurrent layers can learn better compressed representations of MTS than feed-forward architectures.

LG May 3, 2018

Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

Daniele Zambon, Lorenzo Livi, Cesare Alippi

Mapping complex input data into suitable lower dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data representation possibly characterised by convenient geometric properties. Euclidean spaces are by far the most widely used embedding spaces, thanks to their well-understood structure and large availability of consolidated inference methods. However, recent research demonstrated that many types of complex data (e.g., those represented as graphs) are actually better described by non-Euclidean geometries. Here, we investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) impacts the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and performing change detection there by means of conventional methods for numerical streams. The curvature of the space is a parameter that we learn to reproduce the geometry of the original application-dependent graph space. Preliminary experimental results show the potential capability of representing graphs by means of curved manifolds, in particular for change and anomaly detection problems.

LGJan 21, 2018

Time series kernel similarities for predicting Paroxysmal Atrial Fibrillation from ECGs

Filippo Maria Bianchi, Lorenzo Livi, Alberto Ferrante et al.

We tackle the problem of classifying Electrocardiography (ECG) signals with the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial fibrillation is the most common type of arrhythmia, but in many cases PAF episodes are asymptomatic. Therefore, to help diagnose PAF, it is important to design procedures for detecting and, more importantly, predicting PAF episodes. We propose a method for predicting PAF events whose first step consists of a feature extraction procedure that represents each ECG as a multi-variate time series. Subsequently, we design a classification framework based on kernel similarities for multi-variate time series, capable of handling missing data. We consider different approaches to perform classification in the original space of the multi-variate time series and in an embedding space, defined by the kernel similarity measure. We achieve a classification accuracy comparable with state-of-the-art methods, with the additional advantage of detecting the PAF onset up to 15 minutes in advance.

LGJun 21, 2017

Concept Drift and Anomaly Detection in Graph Streams

Daniele Zambon, Cesare Alippi, Lorenzo Livi

Graph representations offer powerful and intuitive ways to describe data in a multitude of application domains. Here, we consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology is general and considers a process generating attributed graphs with a variable number of vertices/edges, without the need to assume one-to-one correspondence between vertices at different time steps. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results. In addition, we provide a specific implementation of the methodology and evaluate its effectiveness on several detection problems involving attributed graphs representing biological molecules and drawings. Experimental results are compared against suitable baseline methods, demonstrating the effectiveness of our approach.

MLFeb 8, 2017

Deep Kernelized Autoencoders

Michael Kampffmeyer, Sigurd Løkse, Filippo Maria Bianchi et al.

In this paper, we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During training, we optimize both the reconstruction accuracy of input samples and the alignment between a kernel matrix given as prior and the inner products of the hidden representations computed by the autoencoder. Kernel alignment provides control over the hidden representation learned by the autoencoder. Experiments have been performed to evaluate both reconstruction and kernel alignment performance. Additionally, we applied our method to emulate kPCA on a denoising task, obtaining promising results.
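The two training objectives described above (reconstruction accuracy, and alignment between a prior kernel matrix and the inner products of the hidden codes) can be sketched as a single loss. The abstract does not give the exact formula, so the convex weighting `lam`, the Frobenius normalization, and all function names below are assumptions for illustration, not the authors' definition.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def gram_matrix(codes):
    """Matrix of pairwise inner products of the hidden codes."""
    return [[sum(a * b for a, b in zip(u, v)) for v in codes] for u in codes]


def alignment_cost(codes, prior_kernel):
    """Distance between the normalized code Gram matrix and the normalized prior kernel.

    Assumes neither matrix is all zeros (otherwise the normalization divides by zero).
    """
    code_gram = gram_matrix(codes)
    norm_c = frobenius_norm(code_gram)
    norm_k = frobenius_norm(prior_kernel)
    diff = [[c / norm_c - k / norm_k for c, k in zip(row_c, row_k)]
            for row_c, row_k in zip(code_gram, prior_kernel)]
    return frobenius_norm(diff)


def reconstruction_cost(inputs, outputs):
    """Mean squared error between the inputs and their reconstructions."""
    errors = [(a - b) ** 2 for x, y in zip(inputs, outputs) for a, b in zip(x, y)]
    return sum(errors) / len(errors)


def total_loss(inputs, outputs, codes, prior_kernel, lam=0.5):
    """Hypothetical trade-off: (1 - lam) * reconstruction + lam * alignment."""
    return ((1.0 - lam) * reconstruction_cost(inputs, outputs)
            + lam * alignment_cost(codes, prior_kernel))
```

Normalizing both matrices makes the alignment term insensitive to the overall scale of the codes, so a prior kernel that is a positive multiple of the code Gram matrix yields zero alignment cost.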

NESep 10, 2016

Multiplex visibility graphs to investigate recurrent neural networks dynamics

Filippo Maria Bianchi, Lorenzo Livi, Cesare Alippi et al.

A recurrent neural network (RNN) is a universal approximator of dynamical systems, whose performance often depends on sensitive hyperparameters. Tuning of such hyperparameters may be difficult and is typically based on a trial-and-error approach. In this work, we adopt a graph-based framework to interpret and characterize the internal RNN dynamics. Through this insight, we are able to design a principled unsupervised method to derive configurations with maximized performance, in terms of prediction error and memory capacity. In particular, we propose to model time series of neuron activations with the recently introduced horizontal visibility graphs, whose topological properties reflect important dynamical features of the underlying dynamical system. Subsequently, each graph becomes a layer of a larger structure, called a multiplex. We show that topological properties of such a multiplex reflect important features of RNN dynamics and are used to guide the tuning procedure. To validate the proposed method, we consider a class of RNNs called echo state networks. We perform experiments and discuss results on several benchmarks and a real-world dataset of call data records.
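As an illustration of the construction described above, the sketch below builds a horizontal visibility graph from one activation time series and stacks one graph per neuron into a multiplex. It uses the standard visibility criterion (two time steps are linked when every value strictly between them is lower than both); the function names and the representation of the multiplex as a plain list of edge sets are assumptions for illustration.

```python
def horizontal_visibility_edges(series):
    """Return the edge set {(i, j), i < j} of the horizontal visibility graph."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        # Largest value seen strictly between time steps i and j.
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            # Once a value reaches series[i], it blocks the view of all later steps.
            if highest_between >= series[i]:
                break
    return edges


def multiplex_layers(activations):
    """One visibility graph (layer) per neuron; all layers share the time-step nodes."""
    return [horizontal_visibility_edges(series) for series in activations]


def degrees(edges, n):
    """Degree of each of the n nodes, a simple topological descriptor of a layer."""
    counts = [0] * n
    for i, j in edges:
        counts[i] += 1
        counts[j] += 1
    return counts
```

For the series `[3, 1, 2, 4]`, time steps 0 and 3 are connected because the intermediate values 1 and 2 are both below 3, whereas steps 1 and 3 are not, because the intermediate value 2 exceeds the value 1 at step 1.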

LGApr 8, 2016

One-class classifiers based on entropic spanning graphs

Lorenzo Livi, Cesare Alippi

One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also process non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the $α$-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.

DATA-ANMar 11, 2016

Determination of the edge of criticality in echo state networks through Fisher information maximization

Lorenzo Livi, Filippo Maria Bianchi, Cesare Alippi

It is a widely accepted fact that the computational capability of recurrent neural networks is maximized on the so-called "edge of criticality". Once the network operates in this configuration, it performs efficiently on a specific application both in terms of (i) low prediction error and (ii) high short-term memory capacity. Since the behavior of recurrent networks is strongly influenced by the particular input signal driving the dynamics, a universal, application-independent method for determining the edge of criticality is still missing. In this paper, we aim at addressing this issue by proposing a theoretically motivated, unsupervised method based on Fisher information for determining the edge of criticality in recurrent neural networks. It is proven that Fisher information is maximized for (finite-size) systems operating in such critical regions. However, Fisher information is notoriously difficult to compute and requires either the probability density function or the conditional dependence of the system states with respect to the model parameters. The paper takes advantage of a recently developed non-parametric estimator of the Fisher information matrix and provides a method to determine the critical region of echo state networks, a particular class of recurrent networks. The considered control parameters, which indirectly affect the echo state network performance, are explored to identify those configurations lying on the edge of criticality and, as such, maximizing Fisher information and computational performance. Experimental results on benchmarks and real-world data demonstrate the effectiveness of the proposed method.

DATA-ANJan 26, 2016

Investigating echo state networks dynamics by means of recurrence analysis

Filippo Maria Bianchi, Lorenzo Livi, Cesare Alippi

In this paper, we elaborate on the well-known interpretability issue in echo state networks. The idea is to investigate the dynamics of reservoir neurons with time-series analysis techniques taken from research on complex systems. Notably, we analyze time-series of neuron activations with Recurrence Plots (RPs) and Recurrence Quantification Analysis (RQA), which make it possible to visualize and characterize high-dimensional dynamical systems. We show that this approach is useful in a number of ways. First, the two-dimensional representation offered by RPs provides a way of visualizing the high-dimensional dynamics of a reservoir. Our results suggest that, if the network is stable, reservoir and input exhibit similar line patterns in the respective RPs. Conversely, the more unstable the ESN, the more the RP of the reservoir presents instability patterns. As a second result, we show that the $\mathrm{L_{max}}$ measure is highly correlated with the well-established maximal local Lyapunov exponent. This suggests that complexity measures based on the distribution of RP diagonal lines provide a valuable tool to quantify the degree of network stability. Finally, our analysis shows that all RQA measures fluctuate in the proximity of the so-called edge of stability, where an ESN typically achieves maximum computational capability. We verify that the determination of the edge of stability provided by such RQA measures is more accurate than two well-known criteria based on the Jacobian matrix of the reservoir. Therefore, we claim that RPs and RQA-based analyses can be used as valuable tools to design an effective network given a specific problem.
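For readers unfamiliar with recurrence analysis, here is a minimal sketch of a recurrence plot and of the $\mathrm{L_{max}}$ measure (the length of the longest diagonal line outside the main diagonal). The threshold `eps`, the choice of the maximum norm, and the function names are assumptions for illustration; the abstract does not specify these details.

```python
def recurrence_matrix(states, eps):
    """R[i][j] = 1 when states i and j are closer than eps in the maximum norm.

    `states` is a sequence of equal-length vectors, e.g. reservoir states over time.
    """
    n = len(states)
    return [
        [1 if max(abs(a - b) for a, b in zip(states[i], states[j])) <= eps else 0
         for j in range(n)]
        for i in range(n)
    ]


def longest_diagonal(recurrence):
    """L_max: longest run of ones along any diagonal, excluding the main diagonal."""
    n = len(recurrence)
    best = 0
    # The matrix is symmetric, so scanning the upper diagonals is sufficient.
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            run = run + 1 if recurrence[i][i + offset] else 0
            best = max(best, run)
    return best
```

Long diagonal lines indicate that trajectory segments stay close to each other for many consecutive time steps, which is why a larger value of this measure is associated with more regular, stable dynamics.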

DATA-ANOct 24, 2015

Data-driven detrending of nonstationary fractal time series with echo state networks

Enrico Maiorino, Filippo Maria Bianchi, Lorenzo Livi et al.

In this paper, we propose a novel data-driven approach for removing trends (detrending) from nonstationary, fractal and multifractal time series. We consider real-valued time series relative to measurements of an underlying dynamical system that evolves through time. We assume that such a dynamical process is predictable to a certain degree by means of a class of recurrent networks called Echo State Networks (ESNs), which are capable of modeling a generic dynamical process. In order to isolate the superimposed (multi)fractal component of interest, we define a data-driven filter by leveraging the ESN prediction capability to identify the trend component of a given input time series. Specifically, the (estimated) trend is removed from the original time series and the residual signal is analyzed with the multifractal detrended fluctuation analysis procedure to verify the correctness of the detrending procedure. In order to demonstrate the effectiveness of the proposed technique, we consider several synthetic time series consisting of different types of trends and fractal noise components with known characteristics. We also process a real-world dataset, the sunspot time series, which is well-known for its multifractal features and has recently gained attention in the complex systems field. Results demonstrate the validity and generality of the proposed detrending method based on ESNs.

MED-PHApr 10, 2015

Discrimination and characterization of Parkinsonian rest tremors by analyzing long-term correlations and multifractal signatures

Lorenzo Livi, Alireza Sadeghian, Hamid Sadeghian

In this paper, we analyze 48 signals of rest tremor velocity related to 12 distinct subjects affected by Parkinson's disease. The subjects belong to two different groups, formed by four and eight subjects with, respectively, high- and low-amplitude rest tremors. Each subject is tested in four settings, given by combining the use of deep brain stimulation and L-DOPA medication. We develop two main feature-based representations of such signals, which are obtained by considering (i) the long-term correlations and multifractal properties, and (ii) the power spectra. The feature-based representations are initially utilized for the purpose of characterizing the subjects under different settings. In agreement with previous studies, we show that deep brain stimulation does not significantly characterize either of the two groups, regardless of the adopted representation. On the other hand, the medication effect yields statistically significant differences in both high- and low-amplitude tremor groups. We subsequently test several different instances of the two feature-based representations of the signals in the setting of supervised classification and (nonlinear) feature transformation. We consider three different classification problems, involving the recognition of (i) the presence of medication, (ii) the use of deep brain stimulation, and (iii) the membership to the high- and low-amplitude tremor groups. Classification results show that the use of medication can be discriminated with higher accuracy, considering many of the feature-based representations. Notably, we show that the best results are obtained with a parsimonious, two-dimensional representation encoding the long-term correlations and multifractal character of the signals.

CEJan 19, 2015

On the impact of topological properties of smart grids in power losses optimization problems

Francesca Possemato, Maurizio Paschero, Lorenzo Livi et al.

Power loss reduction is one of the main targets for any electrical energy distribution company. In this paper, we address the problem of joint optimization of both topology and network parameters in a real smart grid. We consider a portion of the Italian electric distribution network managed by ACEA Distribuzione S.p.A., located in Rome. We perform both the power factor correction (PFC) for tuning the generators and the distributed feeder reconfiguration (DFR) to set the state of the breakers. This joint optimization problem is addressed by considering a suitable objective function and by adopting genetic algorithms as the global optimization strategy. We analyze admissible network configurations, showing that some of these violate constraints on current and voltage at branches and nodes. Such violations depend only on pure topological properties of the configurations. We perform tests by feeding the simulation environment with real data concerning hourly samples of dissipated and generated active and reactive power values of the ACEA smart grid. Results show that removing the configurations violating the electrical constraints from the solution space leads to interesting improvements in terms of power loss reduction. To conclude, we also provide an electrical interpretation of the phenomenon using graph-based pattern analysis techniques.

LGSep 17, 2014

An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery

Filippo Maria Bianchi, Enrico Maiorino, Lorenzo Livi et al.

We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, determining at the same time the set of configurations of the adopted parametric dissimilarity measure yielding compact and separated clusters. Each agent operates independently by performing a Markovian random walk on a suitable weighted graph representation of the input dataset. Such a weighted graph representation is induced by the specific parameter configuration of the dissimilarity measure adopted by the agent, which searches and takes decisions autonomously for one cluster at a time. Results show that the algorithm is able to discover parameter configurations that yield a consistent and interpretable collection of clusters. Moreover, we demonstrate that our algorithm achieves performance comparable to that of other similar state-of-the-art algorithms when facing specific clustering problems.

CVAug 22, 2014

Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation

Lorenzo Livi

Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, is nowadays available and tested for various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is conceived on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a specific key subroutine devised to compress the input data. We prove different theorems which are fundamental to the setting of the parameters controlling such a compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of structural complexity of the synthesized classification models. The results show state-of-the-art standards in terms of test set accuracy and a considerable speed-up in computing time.

CVAug 17, 2014

Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

Lorenzo Livi, Antonello Rizzi, Alireza Sadeghian

We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and it is represented as a sequence of symbols, denoting the 20 amino acid residues. The computational results obtained here, which, we stress, have been achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.

DATA-ANJul 30, 2014

Characterization of graphs for protein structure modeling and recognition of solubility

Lorenzo Livi, Alessandro Giuliani, Alireza Sadeghian

This paper deals with the relations among structural, topological, and chemical properties of the E. coli proteome from the vantage point of the solubility/aggregation propensity of proteins. Each E. coli protein is initially represented according to its known folded 3D shape. This step consists of representing the available E. coli proteins in terms of graphs. We first analyze those graphs by considering pure topological characterizations, i.e., by analyzing the mass fractal dimension and the distribution underlying both shortest paths and vertex degrees. Results confirm the general architectural principles of proteins. Subsequently, we focus on the statistical properties of a representation of such graphs in terms of vectors composed of several numerical features, which we extracted from their structural representation. We found that protein size is the main discriminator for the solubility, although other factors help explain the solubility degree. We finally analyze such data through a novel one-class classifier, with the aim of discriminating between highly and poorly soluble proteins. Results are encouraging and consolidate the potential of pattern recognition techniques when employed to describe complex biological systems.

CEJul 28, 2014

Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem

Lorenzo Livi, Alessandro Giuliani, Antonello Rizzi

This paper builds upon the fundamental work of Niwa et al. [34], which provides the unique possibility to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The hardness of the problem comes from the superposition between the driving forces of intra- and inter-molecule interactions and it is mirrored by the evidence of shifts from folding to aggregation phenotypes caused by single-point mutations [10]. Here we apply several state-of-the-art classification methods coming from the field of structural pattern recognition, with the aim of comparing different representations of the same proteins gathered from the Niwa et al. database; such representations include sequences and labeled (contact) graphs enriched with chemico-physical attributes. By this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold around 250 residues discriminating "easily foldable" from "hardly foldable" molecules consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is proved by the statistically relevant relationships discovered among the chemico-physical description of proteins and the developed cost matrix of substitution used in the various discrimination systems.

CVJul 28, 2014

Entropic one-class classifiers

Lorenzo Livi, Alireza Sadeghian, Witold Pedrycz

The one-class classification problem is a well-known research endeavor in pattern recognition. The problem is also known under different names, such as outlier and novelty/anomaly detection. The core of the problem consists in modeling and recognizing patterns belonging only to a so-called target class. All other patterns are termed non-target, and therefore they should be recognized as such. In this paper, we propose a novel one-class classification system that is based on an interplay of different techniques. Primarily, we follow a dissimilarity representation based approach; we embed the input data into the dissimilarity space by means of an appropriate parametric dissimilarity measure. This step allows us to process virtually any type of data. The dissimilarity vectors are then represented through weighted Euclidean graphs, which we use to (i) determine the entropy of the data distribution in the dissimilarity space, and at the same time (ii) derive effective decision regions that are modeled as clusters of vertices. Since the dissimilarity measure for the input data is parametric, we optimize its parameters by means of a global optimization scheme, which considers both mesoscopic and structural characteristics of the data represented through the graphs. The proposed one-class classifier is designed to provide both hard (Boolean) and soft decisions about the recognition of test patterns, allowing an accurate description of the classification process. We evaluate the performance of the system on different benchmarking datasets, containing either feature-based or structured patterns. Experimental results demonstrate the effectiveness of the proposed technique.

AIJul 26, 2014

Data granulation by the principles of uncertainty

Lorenzo Livi, Alireza Sadeghian

Research in granular modeling has produced a variety of mathematical models, such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets, which are all suitable to characterize the so-called information granules. Modeling of the input data uncertainty is recognized as a crucial aspect in information granulation. Moreover, uncertainty is a well-studied concept in many mathematical settings, such as those of probability theory, fuzzy set theory, and possibility theory. This fact suggests that an appropriate quantification of the uncertainty expressed by the information granule model could be used to define an invariant property, to be exploited in practical situations of information granulation. In this perspective, a procedure of information granulation is effective if the uncertainty conveyed by the synthesized information granule is in a monotonically increasing relation with the uncertainty of the input data. In this paper, we present a data granulation framework that elaborates on the principles of uncertainty introduced by Klir. Since uncertainty is a mesoscopic descriptor of systems and data, it is possible to apply such principles regardless of the input data type and the specific mathematical setting adopted for the information granules. The proposed framework is conceived (i) to offer a guideline for the synthesis of information granules and (ii) to build a groundwork to compare and quantitatively judge different data granulation procedures. To provide a suitable case study, we introduce a new data granulation technique based on the minimum sum of distances, which is designed to generate type-2 fuzzy sets. We analyze the procedure by performing different experiments on two distinct data types: feature vectors and labeled graphs. Results show that the uncertainty of the input data is suitably conveyed by the generated type-2 fuzzy set models.

AIJul 25, 2014

Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification

Enrico De Santis, Lorenzo Livi, Alireza Sadeghian et al.

Detecting faults in electrical power grids is of paramount importance, from both the electricity operator and consumer viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that make it possible to gather real-time information regarding the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows from the heterogeneity of the information that actually determines a typical fault condition. The second point is that, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful. Therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measures customization and one-class classification techniques. We provide here an in-depth study related to the available data and to the models synthesized by the proposed one-class classifier. We also offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy set based reliability decision rule.