Alessandro Betti

LG
h-index: 16
37 papers
511 citations
Novelty: 50%
AI Score: 33

37 Papers

CV · Jun 30, 2022
Deep Learning to See: Towards New Foundations of Computer Vision

Alessandro Betti, Marco Gori, Stefano Melacci

The remarkable progress in computer vision over the last few years is, by and large, attributed to deep learning, fueled by the availability of huge sets of labeled data, and paired with the explosive growth of the GPU paradigm. While subscribing to this view, this book criticizes the supposed scientific progress in the field and proposes the investigation of vision within the framework of information-based laws of nature. Specifically, the present work poses fundamental questions about vision that remain far from understood, leading the reader on a journey populated by novel challenges resonating with the foundations of machine learning. The central thesis is that for a deeper understanding of visual computational processes, it is necessary to look beyond the applications of general purpose machine learning algorithms and focus instead on appropriate learning theories that take into account the spatiotemporal nature of the visual signal.

LG · Oct 17, 2022
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Enrico Meloni, Lapo Faggi, Simone Marullo et al.

In this paper, we present PARTIME, a Python software library based on PyTorch, designed specifically to speed up neural networks, for both learning and inference, whenever data are continuously streamed over time. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications based on streamed data. In contrast, PARTIME starts processing each data sample as soon as it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time, this yields scaling capabilities that are theoretically linear in the number of devices. During learning, PARTIME can leverage the non-i.i.d. nature of streamed data, whose samples evolve smoothly over time, for efficient gradient computations. Experiments empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations across up to 8 NVIDIA GPUs, and show significant speedups that are almost linear in the number of devices, mitigating the impact of the data-transfer overhead.
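
The pipeline idea can be illustrated with a minimal pure-Python simulation. It assumes that each layer sits on its own (hypothetical) device and that all stages advance in lockstep, one clock tick at a time. This is not PARTIME's API: the function name `pipeline_stream` is hypothetical, and real devices would run the stages concurrently instead of in a loop.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Callable[[float], float]


def pipeline_stream(layers: Sequence[Layer], stream: Sequence[float]) -> List[Tuple[int, float]]:
    """Simulate layer-wise pipelining: at every tick, stage i processes what
    stage i - 1 emitted at the previous tick, so all stages work on different
    samples at once. Returns (tick, output) for each sample leaving the pipeline."""
    n = len(layers)
    prev_out: List[Optional[float]] = [None] * n  # what each stage emitted last tick
    results: List[Tuple[int, float]] = []
    for tick in range(len(stream) + n - 1):  # n - 1 extra ticks drain the pipeline
        cur_out: List[Optional[float]] = [None] * n
        for i, layer in enumerate(layers):
            if i == 0:
                # the first stage consumes a new sample as soon as it arrives
                x = stream[tick] if tick < len(stream) else None
            else:
                # later stages read only last tick's snapshot, so stages are independent
                x = prev_out[i - 1]
            cur_out[i] = layer(x) if x is not None else None
        last = cur_out[-1]
        if last is not None:
            results.append((tick, last))
        prev_out = cur_out
    return results


layers = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x - 3.0]
print(pipeline_stream(layers, [0.0, 1.0, 2.0, 3.0]))
# [(2, -1.0), (3, 1.0), (4, 3.0), (5, 5.0)]
```

After a fill latency of `len(layers) - 1` ticks, one result leaves the pipeline per tick. With one stage per device, this is where an (ideally) linear speedup in the number of devices comes from.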

CV · Apr 26, 2022
Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams

Matteo Tiezzi, Simone Marullo, Lapo Faggi et al.

Devising intelligent agents able to live in an environment and learn by observing their surroundings is a longstanding goal of Artificial Intelligence. From a bare Machine Learning perspective, challenges arise when the agent cannot leverage large, fully annotated datasets and interactions with supervisory signals are instead sparsely distributed over space and time. This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream. The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations. Spatio-temporal stochastic coherence along the attention trajectory, paired with a contrastive term, leads to an unsupervised learning criterion that naturally copes with this setting. Unlike most existing works, the learned representations are used in open-set class-incremental classification of each frame pixel, relying on few supervisions. Our experiments leverage 3D virtual environments and show that the proposed agents can learn to distinguish objects just by observing the video stream. Inheriting features from state-of-the-art models turns out to be less powerful than one might expect.
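
The following minimal sketch shows how a coherence term and a contrastive term can be combined along an attention trajectory. It assumes squared Euclidean distances and a hinge with a hypothetical `margin` for the contrastive part. The function names and the choice of negatives are illustrative only, and the paper's stochastic coherence criterion is more elaborate than this.

```python
from typing import List, Sequence

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_loss(traj_feats: List[Vec], neg_feats: List[Vec], margin: float = 1.0) -> float:
    """traj_feats[t]: feature of the attended pixel at time t.
    neg_feats[t]: feature of a pixel sampled away from the attended location at time t
    (a hypothetical negative-sampling choice)."""
    coherence = 0.0
    for t in range(1, len(traj_feats)):
        # pull features of consecutive attended locations together
        coherence += sq_dist(traj_feats[t], traj_feats[t - 1])
    contrastive = 0.0
    for f, g in zip(traj_feats, neg_feats):
        # push non-attended features at least `margin` away (hinge)
        contrastive += max(0.0, margin - sq_dist(f, g))
    T = len(traj_feats)
    return coherence / max(T - 1, 1) + contrastive / max(T, 1)


print(trajectory_loss([[1.0, 0.0]] * 3, [[0.0, 2.0]] * 3))  # coherent and well separated
print(trajectory_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))  # penalized
```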

LG · Sep 10, 2024
Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Jinwei Zhao, Marco Gori, Alessandro Betti et al.

Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a large number of application domains. Understanding the dynamics of GD and improving its convergence speed therefore remain of great importance. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow. On the basis of terminal sliding mode theory and terminal attractor theory, four adaptive learning rates are designed. Their performance is investigated through a detailed theoretical analysis, and the running times of the learning procedures are evaluated and compared. The total times of their learning processes are also studied in detail. To evaluate their effectiveness, simulation results are reported on a function approximation problem and an image classification problem.
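
To make the terminal-attractor idea concrete, consider the gradient flow $\dot\theta = -\eta\nabla L$, which gives $\dot L = -\eta\|\nabla L\|^2$. Choosing $\eta = k L^p / \|\nabla L\|^2$ with $0 < p < 1$ forces $\dot L = -k L^p$, so the loss reaches zero in the finite time $T = L_0^{1-p} / (k(1-p))$. The sketch below is a minimal discrete version of this classical rate. It is not necessarily any of the four rates designed in the paper. The step cap `max_step` is an assumption added because the rate blows up as $L \to 0$.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def terminal_attractor_gd(
    loss: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    k: float = 1.0,
    p: float = 0.5,
    dt: float = 0.1,
    max_step: float = 1.0,
    tol: float = 1e-12,
    max_iters: int = 10_000,
) -> Tuple[Vector, float, int]:
    """Gradient descent whose step enforces dL/dt = -k * L**p in the
    continuous-time limit (a terminal attractor for 0 < p < 1)."""
    x = list(x0)
    for it in range(max_iters):
        L = loss(x)
        g = grad(x)
        g2 = sum(gi * gi for gi in g)
        if L <= tol or g2 == 0.0:
            return x, L, it
        # eta * ||g||^2 = k * L**p; capped, since the discrete step would overshoot near L = 0
        step = min(dt * k * L ** p / g2, max_step)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, loss(x), max_iters


quad = lambda v: 0.5 * sum(c * c for c in v)
quad_grad = lambda v: list(v)
x, final_loss, iters = terminal_attractor_gd(quad, quad_grad, [3.0])
print(x, final_loss, iters)
```

On this quadratic, $L_0 = 4.5$, so the continuous-time settling time is $T \approx 4.24$, that is, roughly 42 steps of `dt = 0.1`. By contrast, plain GD with a fixed step of 0.1 multiplies `x` by 0.9 at every iteration, so its loss decays geometrically and never reaches exactly zero.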

CV · Apr 5, 2022
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery

Alessandro Betti

Despite the breakthrough performance achieved by deep learning in automatic object detection, small target detection is still a challenging problem, especially when looking for fast and accurate solutions suitable for mobile or edge applications. In this work we present YOLO-S, a simple, fast and efficient network for small target detection. The architecture exploits a small feature extractor based on Darknet20, as well as skip connections, via both bypass and concatenation, and a reshape-passthrough layer to alleviate the vanishing gradient problem, promote feature reuse across the network, and combine low-level positional information with more meaningful high-level information. To verify the performance of YOLO-S, we build "AIRES", a novel dataset for cAr detectIon fRom hElicopter imageS acquired in Europe, and set up experiments on both the AIRES and VEDAI datasets, benchmarking this architecture against four baseline detectors. Furthermore, to efficiently handle data insufficiency and domain gap in a transfer learning strategy, we introduce a transitional learning task over a combined dataset based on DOTAv2 and VEDAI and demonstrate that it can enhance the overall accuracy with respect to more general features transferred from COCO data. YOLO-S is 25% to 50% faster than YOLOv3 and only 15-25% slower than Tiny-YOLOv3, while also outperforming YOLOv3 in terms of accuracy in a wide range of experiments. Further simulations performed on the SARD dataset also demonstrate its applicability to different scenarios, such as search and rescue operations. Besides, YOLO-S has 87% fewer parameters and almost half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
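
A reshape-passthrough layer of the kind used in YOLOv2 rearranges a fine-grained feature map so that each $s \times s$ spatial block becomes $s^2$ channels. The result can then be concatenated with a coarser map. Below is a minimal pure-Python sketch on a `[channel][row][col]` nested list. It assumes stride 2, and its channel ordering is one arbitrary choice: Darknet's reorg layer and YOLO-S may order channels differently.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]


def space_to_depth(x: Tensor3, s: int = 2) -> Tensor3:
    """Rearrange a C x H x W map into (C*s*s) x (H/s) x (W/s) without losing values."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % s or w % s:
        raise ValueError("H and W must be divisible by the stride")
    out: Tensor3 = []
    # one output channel per (row offset, column offset, input channel) triple
    for dy in range(s):
        for dx in range(s):
            for ch in range(c):
                out.append([[x[ch][i * s + dy][j * s + dx] for j in range(w // s)]
                            for i in range(h // s)])
    return out


fmap = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]  # 1 x 4 x 4
out = space_to_depth(fmap)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
print(out[0])  # [[0.0, 2.0], [8.0, 10.0]]
```

Every input value appears exactly once in the output, so positional detail is preserved while the spatial resolution matches the coarser map.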

LG · Sep 18, 2024
A Unified Framework for Neural Computation and Learning Over Time

Stefano Melacci, Alessandro Betti, Michele Casoni et al.

This paper proposes Hamiltonian Learning, a novel unified framework for learning with neural networks "over time", i.e., from a possibly infinite stream of data, in an online manner, without access to future information. Existing works focus on the simplified setting in which the stream has a known finite length or is segmented into smaller sequences, leveraging well-established learning strategies from statistical machine learning. In this paper, the problem of learning over time is rethought from scratch, leveraging tools from optimal control theory, which yield a unifying view of the temporal dynamics of neural computations and learning. Hamiltonian Learning is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open novel perspectives. The proposed framework is showcased by experimentally showing how it can recover gradient-based learning, comparing it to out-of-the-box optimizers, and describing how it is flexible enough to switch from fully local to partially/non-local computational schemes, possibly distributed over multiple devices, and to Backpropagation without storing activations. Hamiltonian Learning is easy to implement and can help researchers approach the problem of learning over time in a principled and innovative manner.

LG · Feb 12, 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

Matteo Tiezzi, Michele Casoni, Alessandro Betti et al.

A longstanding challenge for the Machine Learning community is developing models capable of processing and learning from very long sequences of data. The outstanding results of Transformer-based networks (e.g., Large Language Models) promote the idea of parallel attention as the key to success in this challenge, obscuring the role of the classic sequential processing of Recurrent Models. However, in the last few years, researchers concerned with the quadratic complexity of self-attention have been proposing a novel wave of neural models that get the best of both worlds, i.e., Transformers and Recurrent Nets. Meanwhile, Deep State-Space Models emerged as robust approaches to function approximation over time, opening a new perspective on learning from sequential data; they have been adopted by many in the field and exploited to implement a special class of (linear) Recurrent Neural Networks. This survey aims to provide an overview of these trends, framed under the unifying umbrella of Recurrence. Moreover, it emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences whose length is known in advance in favor of the more realistic setting of potentially infinite-length sequences, thus intersecting the field of lifelong online learning from streamed data.
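
One reason linear recurrences such as $h_t = a_t h_{t-1} + b_t$ can be processed in parallel over time is that each step is an affine map, and affine maps compose associatively. A prefix scan can therefore compute all $h_t$ in $O(\log T)$ parallel rounds, instead of $T$ sequential steps, at the cost of $O(T \log T)$ total work for this Hillis-Steele variant. The minimal sketch below assumes scalar states and $h_0 = 0$, and it simulates the "parallel" rounds serially. The function names are illustrative.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (a, b) represents the affine map h -> a*h + b


def combine(first: Pair, second: Pair) -> Pair:
    """Compose two affine maps: apply `first`, then `second` (associative)."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t with h_0 = 0."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out


def scan_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Hillis-Steele inclusive scan over the affine maps."""
    elems: List[Pair] = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        # every position in this round could be updated simultaneously
        elems = [combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
                 for i in range(len(elems))]
        offset *= 2
    # with h_0 = 0, each prefix map applied to 0 is just its offset term
    return [bt for _, bt in elems]


a = [0.5, 0.9, -0.3, 1.1, 0.7]
b = [1.0, -2.0, 0.5, 0.25, 3.0]
print(sequential_recurrence(a, b))
print(scan_recurrence(a, b))  # same values up to floating-point rounding
```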

OC · Dec 14, 2023
Neural Time-Reversed Generalized Riccati Equation

Alessandro Betti, Michele Casoni, Marco Gori et al.

Optimal control deals with optimization problems in which control variables steer a dynamical system whose outcome contributes to the objective function. Two classical approaches to solving these problems are Dynamic Programming and the Pontryagin Maximum Principle. In both approaches, Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates. However, Hamiltonian equations are rarely used due to their reliance on forward-backward algorithms across the entire temporal domain. This paper introduces a novel neural-based approach to optimal control, with the aim of working forward in time. Neural networks are employed not only for implementing state dynamics but also for estimating costate variables. The parameters of the latter network are determined at each time step using a newly introduced local policy referred to as the time-reversed generalized Riccati equation. This policy is inspired by a result for the Linear Quadratic (LQ) problem, which we conjecture stabilizes state dynamics. We support this conjecture by discussing experimental results from a range of optimal control case studies.
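
For intuition on the LQ result, consider a scalar example. The dynamics are $\dot x = a x + b u$ and the cost is $\int (q x^2 + r u^2)\,dt$. The Riccati equation, integrated backward from the final time, reads in reversed time $s$ as $dP/ds = q + 2aP - (b^2/r)P^2$. Its stable equilibrium is the stabilizing solution of the algebraic Riccati equation, which gives the control $u = -(b/r)Px$ and the costate $Px$. The sketch integrates this time-reversed equation forward with Euler steps and compares the result with the closed form. It shows the classical LQ fact only, not the paper's neural generalization. The parameter values are arbitrary.

```python
import math


def riccati_rhs(P: float, a: float, b: float, q: float, r: float) -> float:
    """Right-hand side of the time-reversed scalar Riccati equation."""
    return q + 2.0 * a * P - (b * b / r) * P * P


def integrate_time_reversed_riccati(a: float, b: float, q: float, r: float,
                                    P0: float = 0.0, ds: float = 1e-3,
                                    steps: int = 20_000) -> float:
    """Forward Euler in reversed time; converges to the stable equilibrium."""
    P = P0
    for _ in range(steps):
        P += ds * riccati_rhs(P, a, b, q, r)
    return P


def are_solution(a: float, b: float, q: float, r: float) -> float:
    """Positive root of (b^2/r) P^2 - 2 a P - q = 0 (the stabilizing solution)."""
    return (a + math.sqrt(a * a + b * b * q / r)) * r / (b * b)


a, b, q, r = 1.0, 1.0, 1.0, 1.0  # open-loop unstable plant (a > 0)
P = integrate_time_reversed_riccati(a, b, q, r)
print(P, are_solution(a, b, q, r))  # both close to 1 + sqrt(2)
print("closed-loop pole:", a - b * b * P / r)  # negative, i.e. stable
```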

LG · Apr 16, 2025
Generative System Dynamics in Recurrent Neural Networks

Michele Casoni, Tommaso Guidi, Alessandro Betti et al.

In this study, we investigate the continuous-time dynamics of Recurrent Neural Networks (RNNs), focusing on systems with nonlinear activation functions. The objective of this work is to identify conditions under which RNNs exhibit perpetual oscillatory behavior, without converging to static fixed points. We establish that skew-symmetric weight matrices are fundamental to enabling stable limit cycles in both linear and nonlinear configurations. We further demonstrate that hyperbolic tangent-like activation functions (odd, bounded, and continuous) preserve these oscillatory dynamics by ensuring motion invariants in state space. Numerical simulations show that nonlinear activation functions not only maintain limit cycles but also enhance the numerical stability of the integration process, mitigating the instabilities commonly associated with the forward Euler method. The experimental results highlight practical considerations for designing neural architectures capable of capturing complex temporal dependencies, i.e., strategies for enhancing memorization skills in recurrent models.
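
The linear case can be checked quickly with a 2×2 example, $W = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}$. The flow $\dot x = Wx$ preserves $\|x\|$ because $x^\top W x = 0$. Forward Euler, however, multiplies the norm by exactly $\sqrt{1 + \omega^2 \Delta t^2}$ at every step, which is the kind of numerical drift the abstract refers to. The sketch below covers only the linear system; it does not reproduce the paper's nonlinear analysis.

```python
import math
from typing import List, Tuple


def euler_norms(omega: float, dt: float, steps: int, x0: Tuple[float, float]) -> List[float]:
    """Forward Euler on x' = W x with W = [[0, -omega], [omega, 0]]; norm after each step."""
    x, y = x0
    norms: List[float] = []
    for _ in range(steps):
        x, y = x - dt * omega * y, y + dt * omega * x  # x <- (I + dt W) x
        norms.append(math.hypot(x, y))
    return norms


def exact_norms(omega: float, dt: float, steps: int, x0: Tuple[float, float]) -> List[float]:
    """Exact flow: exp(dt W) is a rotation by omega * dt, so the norm is conserved."""
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    x, y = x0
    norms: List[float] = []
    for _ in range(steps):
        x, y = c * x - s * y, s * x + c * y
        norms.append(math.hypot(x, y))
    return norms


omega, dt, steps = 2.0, 0.01, 500
print(euler_norms(omega, dt, steps, (1.0, 0.0))[-1])  # grows above 1
print(math.sqrt(1 + (omega * dt) ** 2) ** steps)      # predicted growth factor
print(exact_norms(omega, dt, steps, (1.0, 0.0))[-1])  # stays at 1 up to rounding
```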

LG · Jun 13, 2024
State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

Matteo Tiezzi, Michele Casoni, Alessandro Betti et al.

Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. From the dawn of Machine Learning, several researchers have pursued algorithms and architectures capable of processing sequences of patterns, retaining information about past inputs while still leveraging future data, without losing precious long-term dependencies and correlations. While such an ultimate goal is inspired by the human hallmark of continuous real-time processing of sensory information, several solutions have simplified the learning paradigm by artificially limiting the processed context or dealing with sequences of limited length, given in advance. These solutions were further emphasized by the ubiquity of Transformers, which initially overshadowed the role of Recurrent Neural Nets. However, recurrent networks are currently experiencing a strong revival due to the growing popularity of (deep) State-Space models and novel instances of large-context Transformers, both of which are based on recurrent computations that aim to overcome several limitations of currently ubiquitous technologies. The fast development of Large Language Models has renewed interest in efficient solutions to process data over time. This survey provides an in-depth summary of the latest approaches based on recurrent models for sequential data processing. A complete taxonomy of recent trends in architectural and algorithmic solutions is reported and discussed, guiding researchers in this appealing research field. The emerging picture suggests that there is room for exploring novel routes, constituted by learning algorithms that depart from standard Backpropagation Through Time, towards a more realistic scenario where patterns are effectively processed online, leveraging local-forward computations, and opening new directions for research on this topic.
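
The following minimal sketch illustrates why State-Space models can be run either recurrently or as a convolution. It assumes a scalar-input, scalar-output, diagonal discrete-time model $x_k = \Lambda x_{k-1} + B u_k$, $y_k = C x_k$ with a zero initial state. The names are generic and are not tied to any specific model from the survey. Unrolling the recurrence gives $y_k = \sum_{j=0}^{k} K_j u_{k-j}$ with kernel $K_j = C \Lambda^j B$. The same layer can thus be run step by step online, or as a convolution when the whole sequence is available.

```python
from typing import List


def ssm_recurrent(lam: List[float], B: List[float], C: List[float], u: List[float]) -> List[float]:
    """Step-by-step (online) evaluation: O(N) work per input sample."""
    x = [0.0] * len(lam)
    ys: List[float] = []
    for uk in u:
        x = [l * xi + b * uk for l, xi, b in zip(lam, x, B)]  # diagonal state update
        ys.append(sum(c * xi for c, xi in zip(C, x)))
    return ys


def ssm_kernel(lam: List[float], B: List[float], C: List[float], length: int) -> List[float]:
    """K_j = sum_n C_n * lam_n**j * B_n for j = 0 .. length - 1."""
    return [sum(c * (l ** j) * b for l, b, c in zip(lam, B, C)) for j in range(length)]


def ssm_convolutional(lam: List[float], B: List[float], C: List[float], u: List[float]) -> List[float]:
    """Whole-sequence evaluation as a causal convolution with the kernel."""
    K = ssm_kernel(lam, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


lam, B, C = [0.9, -0.5], [1.0, 0.5], [0.3, 2.0]
u = [1.0, 0.0, -1.0, 2.0, 0.5]
print(ssm_recurrent(lam, B, C, u))
print(ssm_convolutional(lam, B, C, u))  # same values up to rounding
```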

LG · Feb 4, 2024
Nature-Inspired Local Propagation

Alessandro Betti, Marco Gori

The spectacular results achieved in machine learning, including the recent advances in generative AI, rely on large data collections. In contrast, intelligent processes in nature arise without the need for such collections, simply through online processing of environmental information. In particular, natural learning processes rely on mechanisms where data representation and learning are intertwined so as to respect spatiotemporal locality. This paper shows that such a feature arises from a pre-algorithmic view of learning inspired by related studies in Theoretical Physics. We show that the algorithmic interpretation of the derived "laws of learning", which take the structure of Hamiltonian equations, reduces to Backpropagation when the speed of propagation goes to infinity. This opens the door to machine learning studies based on fully online information processing, in which Backpropagation is replaced by the proposed spatiotemporal local algorithm.

CV · Nov 23, 2021
A Multi-Stage model based on YOLOv3 for defect detection in PV panels based on IR and Visible Imaging by Unmanned Aerial Vehicle

Antonio Di Tommaso, Alessandro Betti, Giacomo Fontanelli et al.

As solar capacity installed worldwide continues to grow, there is an increasing awareness that advanced inspection systems are becoming of utmost importance to schedule smart interventions and minimize downtime likelihood. In this work we propose a novel automatic multi-stage model to detect panel defects in aerial images captured by an unmanned aerial vehicle, using the YOLOv3 network and Computer Vision techniques. The model combines detections of panels and defects to refine its accuracy and exhibits an average inference time per image of 0.98 s. The main novelties are its versatility in processing either thermographic or visible images and detecting a large variety of defects, its ability to prescribe recommended actions to the O&M crew for a more efficient data-driven maintenance strategy, and its portability to both rooftop and ground-mounted PV systems and different panel types. The proposed model has been validated on two large PV plants in southern Italy, with an outstanding AP@0.5 exceeding 98% for panel detection, a remarkable AP@0.4 (AP@0.5) of roughly 88.3% (66.9%) for hotspots detected by means of infrared thermography, and a mAP@0.5 of almost 70% in the visible spectrum for the detection of anomalies including panel shading induced by soiling and bird droppings, delamination, presence of puddles and raised rooftop panels. The model also predicts the severity of hotspot areas based on the estimated temperature gradients and computes the soiling coverage based on visual images. Finally, the influence of YOLOv3's different output scales on detection is analyzed.

LG · Oct 15, 2021
Knowledge-driven Active Learning

Gabriele Ciravegna, Frédéric Precioso, Alessandro Betti et al.

The deployment of Deep Learning (DL) models is still precluded in those contexts where the amount of supervised data is limited. To address this issue, active learning strategies aim at minimizing the amount of labelled data required to train a DL model. Most active learning strategies are based on uncertain sample selection, often restricted to samples lying close to the decision boundary. These techniques are theoretically sound, but understanding the selected samples based on their content is not straightforward, which further drives non-experts to consider DL a black box. For the first time, we propose to take common domain knowledge into consideration and enable non-expert users to train a model with fewer samples. In our Knowledge-driven Active Learning (KAL) framework, rule-based knowledge is converted into logic constraints, and their violation is checked as a natural guide for sample selection. We show that even simple relationships among data and output classes offer a way to spot predictions for which the model needs supervision. We empirically show that KAL (i) outperforms many active learning strategies, particularly in contexts where domain knowledge is rich, (ii) discovers data distributions lying far from the initial training data, (iii) assures domain experts that the provided knowledge is acquired by the model, (iv) is suitable for regression and object recognition tasks, unlike uncertainty-based strategies, and (v) has a low computational demand.
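
Here is a minimal sketch of constraint-violation-driven sample selection. It assumes rules of the form `premise -> conclusion`, relaxed with the product t-norm, so that the violation of a rule on a sample is $p(\text{premise}) \cdot (1 - p(\text{conclusion}))$. The rule set, class names, and functions are hypothetical illustrations. The paper's exact conversion of logic rules into constraints may differ.

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[str, str]  # (premise class, conclusion class): premise -> conclusion


def violation(pred: Dict[str, float], rules: Sequence[Rule]) -> float:
    """Sum of fuzzy violations p(A) * (1 - p(B)) over every rule A -> B."""
    return sum(pred[a] * (1.0 - pred[b]) for a, b in rules)


def select_for_labeling(preds: List[Dict[str, float]], rules: Sequence[Rule],
                        budget: int) -> List[int]:
    """Indices of the `budget` unlabeled samples whose predictions violate the rules most."""
    scores = [violation(p, rules) for p in preds]
    return sorted(range(len(preds)), key=lambda i: scores[i], reverse=True)[:budget]


rules = [("dog", "animal"), ("cat", "animal")]  # hypothetical domain knowledge
preds = [
    {"dog": 0.9, "cat": 0.05, "animal": 0.95},  # consistent with the rules
    {"dog": 0.8, "cat": 0.1, "animal": 0.2},    # "dog" but not "animal": strong violation
    {"dog": 0.1, "cat": 0.7, "animal": 0.5},    # partial violation
]
print(select_for_labeling(preds, rules, budget=2))  # [1, 2]
```

Unlike uncertainty sampling, the score does not depend on distance from a decision boundary, so the reason a sample was picked (which rule it breaks) stays readable by a domain expert.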

CV · Oct 12, 2021
Can machines learn to see without visual databases?

Alessandro Betti, Marco Gori, Stefano Melacci et al.

This paper supports the position that the time has come to think of learning machines that acquire visual skills in a truly human-like context, where a few human-like object supervisions are given only through vocal interactions and pointing aids. This likely requires new foundations for the computational processes of vision, with the ultimate purpose of involving machines in tasks of visual description while they live in their own visual environment under simple man-machine linguistic interactions. The challenge consists of developing machines that learn to see without needing to handle visual databases. This might open the door to a truly orthogonal, competitive track with respect to deep learning technologies for vision, one that does not rely on the accumulation of huge visual databases.

CV · Sep 16, 2021
Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments

Enrico Meloni, Alessandro Betti, Lapo Faggi et al.

Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment. Trying to simulate this learning process in machines is a challenging task, also due to the inherent difficulty of creating conditions for designing continuously evolving dynamics that are typical of the real world. Many existing works involve training and testing virtual agents on datasets of static images or short videos, considering sequences of distinct learning tasks. However, in order to devise continual learning algorithms that operate in more realistic conditions, it is fundamental to gain access to rich, fully customizable and controlled experimental playgrounds. Focusing on the specific case of vision, we thus propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially lifelong dynamic scenes with photo-realistic appearance. Scenes are composed of objects that move along variable routes with different and fully customizable timings, and randomness can also be included in their evolution. A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives. These general principles are concretely implemented exploiting a recently published 3D virtual environment. The user can generate scenes without strong computer graphics skills, since all the generation facilities are exposed through a simple high-level Python interface. We publicly share the proposed generator.

LG · Oct 28, 2020
An Optimal Control Approach to Learning in SIDARTHE Epidemic model

Andrea Zugarini, Enrico Meloni, Alessandro Betti et al.

The COVID-19 outbreak has stimulated interest in novel epidemiological models that predict the course of the epidemic so as to help plan effective control strategies. In particular, in order to properly interpret the available data, it has become clear that one must go beyond most classic epidemiological models and consider models that, like the recently proposed SIDARTHE, offer a richer description of the stages of infection. Learning the parameters of these models is of crucial importance, especially when they are assumed to be time-variant, which further enriches their effectiveness. In this paper we propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data. We formulate the problem in terms of a functional risk that depends on the learning variables through the solutions of a dynamic system. The resulting variational problem is then solved by using a gradient flow on a suitable, regularized functional. We forecast the epidemic evolution in Italy and France. Results indicate that the model provides reliable and challenging predictions over all available data, and highlight the fundamental role of the strategy chosen for the time-variant parameters.

LG · Sep 1, 2020
Developing Constrained Neural Units Over Time

Alessandro Betti, Marco Gori, Simone Marullo et al.

In this paper we present a foundational study on a constrained method that defines learning problems with Neural Networks in the context of the principle of least cognitive action, which closely resembles the principle of least action in mechanics. Starting from a general approach to enforcing constraints in the dynamical laws of learning, this work focuses on an alternative way of defining Neural Networks that differs from the majority of existing approaches. In particular, the structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data, leading to "architectural" and "input-related" constraints, respectively. The proposed theory is cast in the time domain, in which data are presented to the network in an ordered manner, which makes this study an important step toward alternative ways of processing continuous streams of data with Neural Networks. The connection with the classic Backpropagation-based update rule for network weights is discussed, showing that there are conditions under which our approach degenerates to Backpropagation. Moreover, the theory is experimentally evaluated on a simple problem that allows us to study several aspects of the theory in depth and to show the soundness of the model.

CV · Jun 19, 2020
Wave Propagation of Visual Stimuli in Focus of Attention

Lapo Faggi, Alessandro Betti, Dario Zanca et al.

Fast reactions to changes in the surrounding visual environment require efficient attention mechanisms to reallocate computational resources to the most relevant locations in the visual field. While current computational models keep improving their predictive ability thanks to the increasing availability of data, they still struggle to approximate the effectiveness and efficiency exhibited by foveated animals. In this paper, we present a biologically plausible computational model of focus of attention that exhibits spatiotemporal locality and is very well suited for parallel and distributed implementations. Attention emerges as a wave propagation process originated by visual stimuli corresponding to details and motion information. The resulting field obeys the principle of "inhibition of return" so as not to get stuck in potential holes. A thorough experimental evaluation shows that the model achieves top-level performance in scanpath prediction tasks. This can easily be understood in light of a theoretical result that we establish in the paper, where we prove that, as the velocity of wave propagation goes to infinity, the proposed model reduces to recently proposed state-of-the-art gravitational models of focus of attention.

LG · Jun 16, 2020
Focus of Attention Improves Information Transfer in Visual Features

Matteo Tiezzi, Stefano Melacci, Alessandro Betti et al.

Unsupervised learning from continuous visual streams is a challenging problem that cannot be naturally and efficiently managed in the classic batch-mode setting of computation. The information stream must be carefully processed according to an appropriate spatio-temporal distribution of the visual data, while most learning approaches commonly assume a uniform probability density. In this paper we focus on unsupervised learning for transferring visual information in a truly online setting by using a computational model that is inspired by the principle of least action in physics. The maximization of the mutual information is carried out by a temporal process that yields an online estimation of the entropy terms. The model, which is based on second-order differential equations, maximizes the information transfer from the input to a discrete space of symbols related to the visual features of the input, whose computation is supported by hidden neurons. In order to better structure the input probability distribution, we use a human-like focus of attention model that, coherently with the information maximization model, is also based on second-order differential equations. We provide experimental results to support the theory by showing that the spatio-temporal filtering induced by the focus of attention allows the system to globally transfer more information from the input stream over the focused areas and, in some contexts, over the whole frames with respect to the unfiltered case that yields uniform probability distributions.
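
The quantity being maximized can be made concrete with a small example. Assume that each input $x_i$ is mapped to a distribution $p(\cdot \mid x_i)$ over $K$ symbols and that the inputs are weighted by a density $w_i$ (uniform by default, or shaped by a focus-of-attention model). Then $I = H\big(\sum_i w_i\, p(\cdot \mid x_i)\big) - \sum_i w_i\, H\big(p(\cdot \mid x_i)\big)$. The minimal sketch below computes this over a batch. The paper instead estimates the entropy terms online through a temporal process, so the batch average here is only illustrative.

```python
import math
from typing import List, Optional, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def mutual_information(cond: List[Sequence[float]],
                       weights: Optional[Sequence[float]] = None) -> float:
    """cond[i][k] = p(symbol k | input i); weights[i] = (unnormalized) density of input i."""
    n, k = len(cond), len(cond[0])
    w = [1.0] * n if weights is None else list(weights)
    total = sum(w)
    w = [wi / total for wi in w]
    marginal = [sum(wi * row[j] for wi, row in zip(w, cond)) for j in range(k)]
    conditional_entropy = sum(wi * entropy(row) for wi, row in zip(w, cond))
    return entropy(marginal) - conditional_entropy


confident = [[1.0, 0.0], [0.0, 1.0]]
print(mutual_information(confident))                      # log 2 with uniform density
print(mutual_information(confident, weights=[1.0, 0.0]))  # 0 when all mass is on one input
```

The second call shows why the input density matters: the same symbol assignments carry different amounts of information depending on how the inputs are weighted.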

LG · Feb 18, 2020
Local Propagation in Constraint-based Neural Network

Giuseppe Marra, Matteo Tiezzi, Stefano Melacci et al.

In this paper we study a constraint-based representation of neural network architectures. We cast the learning problem in the Lagrangian framework and we investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints, learning from the available supervisions. The computational structure of the proposed Local Propagation (LP) algorithm is based on the search for saddle points in the adjoint space composed of weights, neural outputs, and Lagrange multipliers. All the updates of the model variables are locally performed, so that LP is fully parallelizable over the neural units, circumventing the classic problem of gradient vanishing in deep networks. The implementation of popular neural models is described in the context of LP, together with those conditions that trace a natural connection with Backpropagation. We also investigate the setting in which we tolerate bounded violations of the architectural constraints, and we provide experimental evidence that LP is a feasible approach to train shallow and deep networks, opening the road to further investigations on more complex architectures, easily describable by constraints.

CV · Feb 10, 2020
Real-Time target detection in maritime scenarios based on YOLOv3 model

Alessandro Betti, Benedetto Michelozzi, Andrea Bracci et al.

In this work, a novel ship dataset is proposed, consisting of more than 56k images of marine vessels collected by means of web scraping and including 12 ship categories. A YOLOv3 single-stage detector based on the Keras API is built on top of this dataset. Current results on four categories (cargo ship, naval ship, oil ship and tug ship) show an Average Precision of up to 96% at an Intersection over Union (IoU) threshold of 0.5 and satisfactory detection performance up to an IoU of 0.8. A Data Analytics GUI service based on the Qt framework and the Darknet-53 engine is also implemented in order to simplify the deployment process and analyse massive amounts of images, even for people without Data Science expertise.
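
A detection counts as correct at a given threshold when the IoU between the predicted box and the ground-truth box reaches that threshold. Here is a minimal sketch, assuming axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format. This convention is chosen for the example; YOLO itself predicts box centers, widths and heights.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (0.0, 0.0, 10.0, 10.0)
print(iou(gt, (5.0, 0.0, 15.0, 10.0)))  # 1/3: not a match at IoU 0.5
print(iou(gt, (1.0, 1.0, 10.0, 10.0)))  # 0.81: a match even at IoU 0.8
```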

LG · Dec 10, 2019
Backprop Diffusion is Biologically Plausible

Alessandro Betti, Marco Gori

The Backpropagation algorithm relies on the abstraction of a neural model that gets rid of the notion of time, since the input is mapped instantaneously to the output. In this paper, we claim that this abstraction of ignoring time, along with the abrupt input changes that occur when feeding the training set, are in fact the reasons why, in some papers, the biological plausibility of Backprop is regarded as questionable. We show that as soon as a deep feedforward network operates with neurons with time-delayed response, the backprop weight update turns out to be the basic equation of a biologically plausible diffusion process based on forward-backward waves. We also show that such a process approximates the gradient very well for inputs that are not too fast with respect to the depth of the network. These remarks disclose the diffusion process behind the backprop equation and lead us to interpret the corresponding algorithm as a degeneration of a more general diffusion process that also takes place in neural networks with cyclic connections.

SP · Nov 13, 2019
Condition monitoring and early diagnostics methodologies for hydropower plants

Alessandro Betti, Emanuele Crisostomi, Gianluca Paolinelli et al.

Hydropower plants are one of the most convenient options for power generation, as they generate energy by exploiting a renewable source, they have relatively low operating and maintenance costs, and they may be used to provide ancillary services by exploiting the large reservoirs of available water. Recent advances in Information and Communication Technologies (ICT) and in machine learning methodologies are seen as fundamental enablers to upgrade and modernize the current operation of most hydropower plants, in terms of condition monitoring, early diagnostics and eventually predictive maintenance. While very few works, or running technologies, have been documented so far for the hydro case, in this paper we propose a novel Key Performance Indicator (KPI) that we have recently developed and tested on operating hydropower plants. In particular, we show that after more than one year of operation it has been able to identify several faults and to support the operation and maintenance tasks of plant operators. We also show that the proposed KPI outperforms conventional multivariable process control charts, such as the Hotelling $T^2$ statistic.
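
For reference, here is the conventional baseline. Hotelling's $T^2 = (x - \mu)^\top S^{-1} (x - \mu)$ measures how far a new multivariate observation lies from the mean $\mu$ of healthy reference data, scaled by their sample covariance $S$. The minimal two-variable sketch below uses an explicit 2×2 inverse and assumes the reference window is healthy. It illustrates the baseline only, not the proposed KPI, and it omits control limits.

```python
from typing import List, Sequence, Tuple

Inv2 = Tuple[float, float, float]  # entries (a, b, d) of a symmetric 2x2 matrix [[a, b], [b, d]]


def fit_reference(data: List[Tuple[float, float]]) -> Tuple[Tuple[float, float], Inv2]:
    """Sample mean and inverse sample covariance of a healthy reference window."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("reference covariance is singular")
    return (mx, my), (syy / det, -sxy / det, sxx / det)


def hotelling_t2(x: Sequence[float], mean: Tuple[float, float], inv: Inv2) -> float:
    """T^2 = (x - mean)^T S^{-1} (x - mean) for two variables."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    a, b, d = inv
    return a * dx * dx + 2.0 * b * dx * dy + d * dy * dy


mean, inv = fit_reference([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print(hotelling_t2(mean, mean, inv))        # 0 at the reference mean
print(hotelling_t2((3.0, 1.0), mean, inv))  # about 3.0
```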

LGOct 22, 2019
A Scalable Predictive Maintenance Model for Detecting Wind Turbine Component Failures Based on SCADA Data

Lorenzo Gigoni, Alessandro Betti, Mauro Tucci et al.

In this work, a novel predictive maintenance system is presented and applied to the main components of wind turbines. The proposed model is based on machine learning and statistical process control tools applied to SCADA (Supervisory Control And Data Acquisition) data of critical components. The test campaign was divided into two stages: a two-year offline test and a one-year real-time test. The offline test used historical faults from six wind farms located in Italy and Romania, corresponding to a total of 150 wind turbines and an overall installed nominal power of 283 MW. The results show that anomalies can be predicted up to 2 months before unscheduled device downtime. Furthermore, the 12-month real-time test confirms the ability of the proposed system to detect several anomalies, allowing the operators to identify the root causes and to schedule maintenance actions before a catastrophic stage is reached.
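
The abstract does not say which statistical process control tools are used. As an illustration only, a common pattern is to monitor the residuals of a normal-behaviour model with an EWMA control chart. The residual signal, the smoothing factor `lam`, the width `L` and the reference window `n_ref` below are all hypothetical choices.

```python
import numpy as np

def ewma_chart(residuals, lam=0.2, L=3.0, n_ref=200):
    """EWMA control chart on the residuals of a normal-behaviour model.

    The first n_ref residuals are assumed fault-free and fix the centre
    line and sigma. Returns the EWMA statistic and a boolean alarm array.
    """
    r = np.asarray(residuals, dtype=float)
    centre, sigma = r[:n_ref].mean(), r[:n_ref].std(ddof=1)
    limit = L * sigma * np.sqrt(lam / (2.0 - lam))  # asymptotic EWMA control limit
    z = np.empty_like(r)
    level = centre
    for t, value in enumerate(r):
        level = lam * value + (1.0 - lam) * level  # exponential smoothing
        z[t] = level
    return z, np.abs(z - centre) > limit

# Hypothetical residuals (e.g. measured minus predicted component temperature):
# white noise, then a persistent +1.5 drift from sample 300 onward.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 400)
residuals[300:] += 1.5
z, alarms = ewma_chart(residuals)
print(np.flatnonzero(alarms)[:5])  # first alarm indices
```

Smoothing trades a short detection delay for fewer false alarms on noisy SCADA signals, which suits slowly developing component degradation.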

LGOct 8, 2019
A Machine Learning Model for Long-Term Power Generation Forecasting at Bidding Zone Level

Michela Moschella, Mauro Tucci, Emanuele Crisostomi et al.

The increasing penetration of energy generation from renewable sources demands more accurate and reliable forecasting tools to support classic power grid operations (e.g., unit commitment, electricity market clearing or maintenance planning). For this purpose, many physical models have been employed, and more recently statistical and machine learning algorithms, and data-driven methods in general, have become the subject of intense research. While the power research community generally focuses on power forecasting at the level of single plants over a short future horizon, in this work we are interested in aggregated macro-area power generation (i.e., over a territory larger than 100,000 km²) with a forecast horizon of up to 15 days ahead. Real data are used to validate the proposed forecasting methodology on a test set of several months.

CVSep 1, 2019
Learning Visual Features Under Motion Invariance

Alessandro Betti, Marco Gori, Stefano Melacci

Humans are continuously exposed to a stream of visual data with a natural temporal structure. However, most successful computer vision algorithms work at the image level, completely discarding the precious information carried by motion. In this paper, we claim that processing visual streams naturally leads to the formulation of the motion invariance principle, which enables the construction of a new theory of learning that originates from variational principles, just like in physics. Such a principled approach is well suited to discussing a number of interesting questions that arise in vision, and it offers a well-posed computational scheme for the discovery of convolutional filters over the retina. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario for the unsupervised processing of video signals, where features are extracted in a multi-layer architecture with motion invariance. While the theory enables the implementation of novel computer vision systems, it also sheds light on the role of information-based principles in driving possible biological solutions.
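
Motion invariance is usually written as a total-derivative condition. In the sketch below the notation is assumed, not copied from the paper: $\varphi_i(x,t)$ is the $i$-th feature at retina point $x$ and time $t$, and $v(x,t)$ is the optical flow. The feature must stay constant along the trajectory of a moving point. The second line checks that a pattern $g$ translating at constant velocity satisfies the condition exactly.

```latex
\frac{d}{dt}\,\varphi_i\big(x(t),t\big)
  = \partial_t \varphi_i(x,t) + v(x,t)\cdot\nabla_x \varphi_i(x,t) = 0,
  \qquad v = \dot x .

\varphi_i(x,t) = g(x - vt),\ v \text{ constant}
  \;\Rightarrow\;
  \partial_t \varphi_i = -\,v\cdot\nabla g(x - vt),\quad
  v\cdot\nabla_x \varphi_i = v\cdot\nabla g(x - vt)
  \;\Rightarrow\;
  \partial_t \varphi_i + v\cdot\nabla_x \varphi_i = 0 .
```

In a learning scheme, the squared residual of this condition, integrated over the retina and over time, can serve as a penalty term. How the paper weights such a term against its other terms is not stated in the abstract.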

LGJul 14, 2019
On the Role of Time in Learning

Alessandro Betti, Marco Gori

By and large, the process of learning concepts that are embedded in time is regarded as quite a mature research topic. Hidden Markov models and recurrent neural networks are, among others, successful approaches to learning from temporal data. In this paper, we claim that the dominant approach of minimizing appropriate risk functions defined over time by classic stochastic gradient might miss the deep interpretation of time given in other fields like physics. We show that a recent reformulation of learning according to the principle of Least Cognitive Action is better suited whenever time is involved in learning. The principle gives rise to a learning process that is driven by differential equations, which can, to some extent, describe the process within the same framework as other laws of nature.

LGJul 11, 2019
Spatiotemporal Local Propagation

Alessandro Betti, Marco Gori

This paper proposes an in-depth rethinking of neural computation that parallels apparently unrelated laws of physics formulated in the variational framework of the least action principle. The theory holds for neural networks based on any digraph, and the resulting computational scheme exhibits the intriguing property of being truly biologically plausible. The scheme, referred to as SpatioTemporal Local Propagation (STLP), is local in both space and time. Spatial locality comes from expressing the network connections through an appropriate Lagrangian term, so that the corresponding computational scheme does not need the backpropagation (BP) of the error, while temporal locality is the outcome of the variational formulation of the problem. Overall, in addition to achieving the often-invoked biological plausibility that BP lacks, the proposed theory yields locality in both space and time, which neither Backpropagation Through Time (BPTT) nor Real-Time Recurrent Learning (RTRL) exhibits.

LGJul 4, 2019
Least Action Principles and Well-Posed Learning Problems

Alessandro Betti, Marco Gori

Machine Learning algorithms are typically regarded as appropriate optimization schemes for minimizing risk functions that are constructed on the training set, which gives the corresponding learning problem a statistical flavor. When the focus is shifted to perception, which is inherently intertwined with time, recent alternative formulations of learning have been proposed that rely on the principle of Least Cognitive Action, which is strongly reminiscent of the Least Action Principle in mechanics. In this paper, we discuss different forms of the cognitive action and show the well-posedness of learning. In particular, unlike the special case of the action in mechanics, where stationarity is typically attained at saddle points, we prove the existence of the minimum of a special form of cognitive action, which yields fourth-order differential equations of learning. We also briefly discuss the dissipative behavior of these equations, which turns out to characterize the process of learning.
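
To see where fourth-order equations can come from, consider a toy action. It is an illustrative assumption, not the paper's cognitive action, and its kinetic term penalizes the second time derivative of the weights $q(t)$. The Lagrangian depends on $q$ and $\ddot q$ but not on $\dot q$. Its Euler-Poisson equation therefore contains $\frac{d^2}{dt^2}(m\,\ddot q) = m\,q^{(4)}$:

```latex
A[q] = \int_0^T \Big( \tfrac{m}{2}\,\lVert \ddot q(t) \rVert^2 + V\big(q(t),t\big) \Big)\,dt

\frac{\partial \mathcal{L}}{\partial q}
  - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q}
  + \frac{d^2}{dt^2}\frac{\partial \mathcal{L}}{\partial \ddot q} = 0
\quad\Longrightarrow\quad
\nabla_q V\big(q(t),t\big) + m\, q^{(4)}(t) = 0
```

This toy example does not capture minimality or dissipation, which the paper discusses. Those properties depend on the specific form of the action.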

LGFeb 26, 2019
Day-Ahead Hourly Forecasting of Power Generation from Photovoltaic Plants

Lorenzo Gigoni, Alessandro Betti, Emanuele Crisostomi et al.

The ability to accurately forecast power generation from renewable sources is nowadays recognised as a fundamental skill to improve the operation of power systems. Despite the general interest of the power community in this topic, it is not always simple to compare different forecasting methodologies and to infer the impact of single components in providing accurate predictions. In this paper we extensively compare simple forecasting methodologies with more sophisticated ones on 32 photovoltaic plants of different sizes and technologies over a whole year. We also evaluate the impact of weather conditions and weather forecasts on the prediction of PV power generation.
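
The abstract does not list which simple methodologies are compared. A typical reference point is day-ahead persistence, scored with a mean absolute error normalised by plant capacity. The sketch below assumes both choices, and its clear-sky profile and plant size are hypothetical.

```python
import numpy as np

def day_ahead_persistence(power, hours_per_day=24):
    """Forecast each hour with the value observed at the same hour one day earlier.

    The first day has no forecast and is filled with NaN.
    """
    p = np.asarray(power, dtype=float)
    forecast = np.full_like(p, np.nan)
    forecast[hours_per_day:] = p[:-hours_per_day]
    return forecast

def nmae(actual, forecast, capacity):
    """Mean absolute error normalised by nominal capacity, skipping NaN forecasts."""
    actual = np.asarray(actual, dtype=float)
    mask = ~np.isnan(forecast)
    return np.mean(np.abs(actual[mask] - forecast[mask])) / capacity

# Hypothetical 1 MW plant: three identical clear-sky days, then a cloudy fourth day.
capacity_kw = 1000.0
hours = np.arange(24 * 4)
clear_sky = capacity_kw * np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0.0, None)
power = clear_sky.copy()
power[72:] *= 0.5
forecast = day_ahead_persistence(power)
print(nmae(power[:72], forecast[:72], capacity_kw))  # 0.0: persistence is exact on repeated days
print(nmae(power, forecast, capacity_kw))  # the cloudy day drives the error up
```

Persistence is hard to beat on stable weather and fails on changes. That is why weather conditions matter when comparing it with more sophisticated models.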

LGJan 29, 2019
Predictive Maintenance in Photovoltaic Plants with a Big Data Approach

Alessandro Betti, Maria Luisa Lo Trovato, Fabio Salvatore Leonardi et al.

This paper presents a novel and flexible solution for fault prediction based on data collected from a SCADA system. Fault prediction is offered at two different levels based on a data-driven approach: (a) generic fault/status prediction and (b) specific fault class prediction, implemented by means of two different machine learning based modules built on an unsupervised clustering algorithm and a Pattern Recognition Neural Network, respectively. The model has been assessed on a park of six photovoltaic (PV) plants of up to 10 MW and on more than one hundred inverter modules of three different technology brands. The results indicate that the proposed method is effective in (a) predicting incipient generic faults up to 7 days in advance with sensitivity up to 95% and (b) anticipating damage of specific fault classes with times ranging from a few hours up to 7 days. The model is easily deployable for online monitoring of anomalies on new PV plants and technologies, requiring only historical SCADA and fault data, a fault taxonomy and the inverter electrical datasheet. Keywords: Data Mining, Fault Prediction, Inverter Module, Key Performance Indicator, Lost Production

CVAug 28, 2018
Cognitive Action Laws: The Case of Visual Features

Alessandro Betti, Marco Gori, Stefano Melacci

This paper proposes a theory for understanding perceptual learning processes within the general framework of laws of nature. Neural networks are regarded as systems whose connections are Lagrangian variables, namely functions depending on time. They are used to minimize the cognitive action, an appropriate functional index that measures the agent's interactions with the environment. The cognitive action contains a potential and a kinetic term that closely resemble the classic formulation of regularization in machine learning. A special choice of the functional index, which leads to fourth-order differential equations, the Cognitive Action Laws (CAL), exhibits a structure that mirrors the classic formulation of machine learning. In particular, unlike the action of mechanics, the stationarity condition corresponds to the global minimum. Moreover, it is proven that typical asymptotic learning conditions on the weights can coexist with the initialization, provided that the system dynamics is driven under a policy referred to as information overloading control. Finally, the theory is tested on the problem of feature extraction in computer vision.

AIAug 21, 2018
Backpropagation and Biological Plausibility

Alessandro Betti, Marco Gori, Giuseppe Marra

By and large, Backpropagation (BP) is regarded as one of the most important neural computation algorithms at the basis of the progress in machine learning, including the recent advances in deep learning. However, its computational structure has been the source of many debates about its biological plausibility. In this paper, it is shown that when supervised learning is framed in the Lagrangian framework, Backpropagation emerges naturally, but biologically plausible local algorithms can also be devised, based on the search for saddle points in the learning adjoint space composed of weights, neural outputs, and Lagrangian multipliers. This might open the door to a truly novel class of learning algorithms where, because of the introduction of the notion of support neurons, the optimization scheme also plays a fundamental role in the construction of the architecture.
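
A minimal sketch of the saddle-point idea follows. It is a toy, not the paper's algorithm, and it assumes a single linear neuron and plain gradient descent-ascent with a hand-picked step size. The neuron output `x` is treated as a free variable tied to `w * u` by a Lagrange multiplier `lam`. Every update uses only quantities local to the neuron, so no error signal is propagated backwards.

```python
def saddle_point_neuron(u, y, lr=0.1, steps=2000):
    """Gradient descent-ascent on the Lagrangian
        L(w, x, lam) = 0.5 * (x - y)**2 + lam * (x - w * u)
    for a single linear neuron with input u, output variable x and target y.

    Descent on the weight w and on the output x, ascent on the multiplier lam.
    """
    w, x, lam = 0.0, 0.0, 0.0
    for _ in range(steps):
        g_x = (x - y) + lam    # dL/dx
        g_w = -lam * u         # dL/dw
        g_lam = x - w * u      # dL/dlam: violation of the constraint x = w * u
        x -= lr * g_x
        w -= lr * g_w
        lam += lr * g_lam
    return w, x, lam

w, x, lam = saddle_point_neuron(u=1.0, y=2.0)
print(w, x, lam)  # close to the saddle point w = 2, x = 2, lam = 0
```

In a multi-layer network, each neuron would carry its own output variable and multiplier. The multipliers then play the role that backpropagated deltas play in BP.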

LGJul 17, 2018
Learning Neuron Non-Linearities with Kernel-Based Deep Neural Networks

Giuseppe Marra, Dario Zanca, Alessandro Betti et al.

The effectiveness of deep neural architectures has been widely supported by both experimental and foundational principles. There is also clear evidence that the activation function (e.g. the rectifier and the LSTM units) plays a crucial role in the complexity of learning. Based on this remark, this paper discusses an optimal selection of the neuron non-linearity in a functional framework inspired by classic regularization arguments. It is shown that the best activation function is represented by a kernel expansion over the training set, which can be effectively approximated over an appropriate set of points modeling 1-D clusters. The idea can be naturally extended to recurrent networks, where the expressiveness of kernel-based activation functions turns out to be a crucial ingredient to capture long-term dependencies. We give experimental evidence of this property through a set of challenging experiments, where we compare the results with neural architectures based on state-of-the-art LSTM cells.
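
A kernel-expansion activation can be sketched as a weighted sum of kernels centred on a fixed set of 1-D points. The Gaussian kernel, the uniform grid standing in for the "set of points modeling 1-D clusters", and the parameter `gamma` are assumptions for illustration. The sketch is not the paper's exact formulation.

```python
import numpy as np

class KernelActivation:
    """Activation f(s) = sum_k alpha_k * exp(-gamma * (s - d_k)**2).

    The dictionary points d_k form a fixed 1-D grid; only the coefficients
    alpha_k would be trained, jointly with the network weights.
    """

    def __init__(self, dictionary, gamma=1.0, alpha=None, seed=0):
        self.d = np.asarray(dictionary, dtype=float)
        self.gamma = gamma
        if alpha is None:
            alpha = np.random.default_rng(seed).normal(0.0, 0.1, self.d.size)
        self.alpha = np.asarray(alpha, dtype=float)

    def __call__(self, s):
        s = np.asarray(s, dtype=float)
        # Gaussian kernel between every pre-activation and every dictionary point.
        k = np.exp(-self.gamma * (s[..., None] - self.d) ** 2)
        return k @ self.alpha  # same shape as s

grid = np.linspace(-3.0, 3.0, 20)
act = KernelActivation(grid, gamma=2.0)
print(act(np.zeros((2, 3))).shape)  # (2, 3)
```

Because the output is linear in `alpha`, its shape can be learned by ordinary gradient descent. The dictionary size bounds the cost per unit.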

CVJul 14, 2018
Motion Invariance in Visual Environments

Alessandro Betti, Marco Gori, Stefano Melacci

The puzzle of computer vision might find new challenging solutions when we realize that most successful methods work at the image level, which is remarkably more difficult than directly processing visual streams, just as happens in nature. In this paper, we claim that processing such streams naturally leads to the formulation of the motion invariance principle, which enables the construction of a new theory of visual learning based on convolutional features. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. These filters are driven by the Euler-Lagrange differential equations derived from the principle of least cognitive action, which parallels the laws of mechanics. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. An experimental report of the theory is presented, showing that features extracted under motion invariance yield an improvement that can be assessed by information-based indexes.

LGJul 14, 2018
Generalization in quasi-periodic environments

Giovanni Bellettini, Alessandro Betti, Marco Gori

By and large, the behavior of stochastic gradient is regarded as a challenging problem, and it is often presented in the framework of statistical machine learning. This paper offers a novel view on the analysis of online models of learning that arise when dealing with a generalized version of stochastic gradient based on dissipative dynamics. To deal with the complex evolution of these models, a systematic treatment is proposed, based on energy balance equations derived by means of the Caldirola-Kanai (CK) Hamiltonian. According to these equations, learning can be regarded as an ordering process that corresponds to the decrease of the loss function. Finally, the main result established in this paper is that, in quasi-periodic environments, where pattern novelty is progressively limited as time goes by, the system dynamics yields an asymptotically consistent solution in the weight space, that is, a solution that maps similar patterns to the same decision.
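
The Caldirola-Kanai Hamiltonian $H = e^{-\gamma t}\lVert p\rVert^2/(2m) + e^{\gamma t} V(q)$ yields, through Hamilton's equations, the damped motion $m\ddot q + m\gamma\dot q + \nabla V(q) = 0$. The sketch below integrates that equation on a hypothetical quadratic loss to show the mechanical energy draining away. The loss, `gamma`, `dt` and the integrator (semi-implicit Euler) are assumptions, and the paper's own energy-balance analysis is not reproduced here.

```python
import numpy as np

def damped_dynamics(grad_v, v_fn, q0, m=1.0, gamma=0.5, dt=0.01, steps=5000):
    """Integrate m*q'' + m*gamma*q' + grad V(q) = 0 with semi-implicit Euler.

    Returns the final q and the mechanical energy m*|q'|^2/2 + V(q)
    recorded at every step.
    """
    q = np.array(q0, dtype=float)
    vel = np.zeros_like(q)
    energy = []
    for _ in range(steps):
        vel += dt * (-gamma * vel - grad_v(q) / m)  # update velocity first
        q += dt * vel                               # then position, with the new velocity
        energy.append(0.5 * m * vel @ vel + v_fn(q))
    return q, np.array(energy)

# Hypothetical quadratic loss V(q) = q^T A q / 2 with minimum at q = 0.
A = np.diag([1.0, 4.0])
q_final, energy = damped_dynamics(lambda q: A @ q, lambda q: 0.5 * q @ A @ q, q0=[1.0, -1.0])
print(np.linalg.norm(q_final), energy[0], energy[-1])
```

With $\gamma = 0$ the same code would oscillate forever. The damping term is what turns the mechanical analogy into a learning process that settles at a minimum of the loss.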

CVJan 16, 2018
Convolutional Networks in Visual Environments

Alessandro Betti, Marco Gori

The puzzle of computer vision might find new challenging solutions when we realize that most successful methods work at the image level, which is remarkably more difficult than directly processing visual streams. In this paper, we claim that processing such streams naturally leads to the formulation of the motion invariance principle, which enables the construction of a new theory of learning with convolutional networks. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. These filters are driven by differential equations derived from the principle of least cognitive action. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. It is pointed out that an appropriate blurring of the video, along with the interleaving of segments of null signal, makes it possible to conceive a novel learning mechanism that yields the minimum of the cognitive action. Basically, while the theory enables the implementation of novel computer vision systems, it also provides an intriguing explanation of the solution that evolution has discovered for humans, where the video blurring in newborns and the day-night rhythm seem to emerge in a general computational framework, regardless of biology.