Alessia Bertugli

CV
5 papers
133 citations
Novelty: 52%
AI Score: 25

5 Papers

LG, Jan 28, 2021
Generalising via Meta-Examples for Continual Learning in the Wild

Alessia Bertugli, Stefano Vincenzi, Simone Calderara et al.

Future deep learning systems call for techniques that can deal with the evolving nature of temporal data and the scarcity of annotations when new problems occur. As a step towards this goal, we present FUSION (Few-shot UnSupervIsed cONtinual learning), a learning strategy that enables a neural network to learn quickly and continually on streams of unlabelled data and unbalanced tasks. The objective is to maximise the knowledge extracted from the unlabelled data stream (unsupervised), favour the forward transfer of previously learnt tasks and features (continual), and exploit the supervised information as much as possible when it is available (few-shot). The core of FUSION is MEML (Meta-Example Meta-Learning), which consolidates a meta-representation through a self-attention mechanism during a single inner loop in the meta-optimisation stage. To further enhance the capability of MEML to generalise from little data, we extend it by creating various augmented surrogate tasks and optimising over the hardest one. An extensive experimental evaluation on public computer vision benchmarks shows that FUSION outperforms existing state-of-the-art solutions in both the few-shot and continual learning experimental settings.
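The idea of building several augmented surrogate tasks and optimising over the hardest one can be illustrated with a minimal sketch. This is not the FUSION implementation: the linear model, the Gaussian-noise augmentation, and the names `augment`, `task_loss`, and `hardest_task` are all assumptions made for illustration.

```python
import random


def augment(task, rng):
    """Hypothetical augmentation: add small Gaussian noise to every feature."""
    return [([x + rng.gauss(0.0, 0.1) for x in features], label)
            for features, label in task]


def task_loss(weights, task):
    """Mean squared error of a linear model on one task (a stand-in loss)."""
    total = 0.0
    for features, label in task:
        pred = sum(w * x for w, x in zip(weights, features))
        total += (pred - label) ** 2
    return total / len(task)


def hardest_task(weights, task, n_augmented=4, seed=0):
    """Create augmented copies of a task and return the one with the highest loss."""
    rng = random.Random(seed)
    candidates = [augment(task, rng) for _ in range(n_augmented)]
    losses = [task_loss(weights, c) for c in candidates]
    worst = max(range(n_augmented), key=lambda i: losses[i])
    return candidates[worst], losses


# Toy task: two (features, label) pairs and an arbitrary starting model.
task = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
weights = [0.5, -0.5]
chosen, losses = hardest_task(weights, task)
```

In a training loop, the gradient step would then be taken on `chosen` rather than on an average over all candidates, which is one common way to realise a "hardest case" objective.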

LG, Sep 17, 2020
Few-Shot Unsupervised Continual Learning through Meta-Examples

Alessia Bertugli, Stefano Vincenzi, Simone Calderara et al.

In real-world applications, data rarely resemble those commonly used to train neural networks: they are usually scarce, unlabeled, and may arrive as a stream. Hence many existing deep learning solutions suffer from a limited range of applications, in particular in the case of online streaming data that evolve over time. To narrow this gap, in this work we introduce a novel and complex setting involving unsupervised meta-continual learning with unbalanced tasks. These tasks are built through a clustering procedure applied to a fitted embedding space. We exploit a meta-learning scheme that simultaneously alleviates catastrophic forgetting and favors generalization to new tasks. Moreover, to encourage feature reuse during the meta-optimization, we exploit a single inner loop that takes advantage of an aggregated representation obtained through a self-attention mechanism. Experimental results on few-shot learning benchmarks show competitive performance even compared to the supervised case. Additionally, we empirically observe that in an unsupervised scenario, small tasks and variability in cluster pooling play a crucial role in the generalization capability of the network. Further, on complex datasets, using more clusters than the true number of classes yields better results, even compared with those obtained under full supervision, suggesting that a predefined partitioning into classes can miss relevant structural information.
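The aggregated representation mentioned above can be pictured as attention-weighted pooling: the feature vectors of one task are collapsed into a single vector that is then used for one inner-loop update. The sketch below is a simplified stand-in for the paper's self-attention module, not a reproduction of it; the fixed `query` vector (which would normally be learned) and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Collapse feature vectors into one vector with scaled dot-product attention.

    `query` is a hypothetical scoring vector; each feature vector is scored
    against it, and the scores become the pooling weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) / scale
              for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


# Three samples of one task, pooled into a single aggregated vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(features, query)
```

Because the whole task is summarised by `pooled`, a single update on that vector replaces one update per sample, which is the intuition behind using a single inner loop.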

CV, May 26, 2020
DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

Alessio Monti, Alessia Bertugli, Simone Calderara et al.

Understanding human motion behaviour is a critical task for applications such as self-driving cars or social robots, and in general for all settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways in which people could move in the future. Additionally, people's activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address these aspects by proposing a new recurrent generative model that considers both single agents' future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and to integrate it with data about agents' possible future objectives. Our proposal is general enough to be applied to different scenarios: the model achieves state-of-the-art results in both urban environments and sports applications.

CV, May 17, 2020
AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction

Alessia Bertugli, Simone Calderara, Pasquale Coscia et al.

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. A key aspect of this task is the inherently multi-modal nature of human paths, which makes multiple futures socially acceptable when human interactions are involved. To this end, we propose a generative architecture for multi-future trajectory prediction based on Conditional Variational Recurrent Neural Networks (C-VRNNs). Conditioning mainly relies on prior belief maps, which represent the most likely moving directions and force the model to consider past observed dynamics when generating future positions. Human interactions are modeled with a graph-based attention mechanism that enables an online attentive refinement of the hidden state of the recurrent estimation. To corroborate our model, we perform extensive experiments on publicly available datasets (e.g., ETH/UCY, Stanford Drone Dataset, STATS SportVU NBA, Intersection Drone Dataset and TrajNet++) and demonstrate its effectiveness in crowded scenes compared to several state-of-the-art methods.
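The graph-based attentive refinement of hidden states can be sketched as follows. This is a minimal illustration under stated assumptions, not the AC-VRNN code: the dot-product scoring, the residual update, and the name `refine_hidden_states` are choices made for this example only.

```python
import math


def refine_hidden_states(hidden, adjacency):
    """Refine each agent's hidden state with an attention-weighted sum of its neighbours.

    hidden    -- list of per-agent hidden vectors (lists of floats)
    adjacency -- square 0/1 matrix; adjacency[i][j] == 1 means agent j influences agent i
    """
    refined = []
    for i, h_i in enumerate(hidden):
        neighbours = [j for j, linked in enumerate(adjacency[i]) if linked and j != i]
        if not neighbours:
            # Isolated agents keep their hidden state unchanged.
            refined.append(list(h_i))
            continue
        # Dot-product attention scores between agent i and each neighbour.
        scores = [sum(a * b for a, b in zip(h_i, hidden[j])) for j in neighbours]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        context = [sum(w * hidden[j][d] for w, j in zip(weights, neighbours))
                   for d in range(len(h_i))]
        # Residual update: original state plus the neighbourhood context.
        refined.append([a + c for a, c in zip(h_i, context)])
    return refined


# Agents 0 and 1 are linked to each other; agent 2 is isolated.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
refined = refine_hidden_states(hidden, adjacency)
```

Applying such a step at every time step of the recurrence is what makes the refinement "online": the graph can change as agents move closer together or further apart.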

RO, Aug 8, 2019
Learning to Grasp from 2.5D images: a Deep Reinforcement Learning Approach

Alessia Bertugli, Paolo Galeone

In this paper, we propose a deep reinforcement learning (DRL) solution to the grasping problem using 2.5D images as the only source of information. In particular, we developed a simulated environment where a robot equipped with a vacuum gripper aims to reach blocks with planar surfaces. These blocks can have different dimensions, shapes, positions and orientations. Unity 3D allowed us to simulate a real-world setup, where a depth camera is placed in a fixed position and the stream of images is used by our policy network to learn how to solve the task. We explored different DRL algorithms and problem configurations. The experiments demonstrated the effectiveness of the proposed DRL algorithm applied to grasp tasks guided by visual depth camera inputs. When using the proper policy, the proposed method estimates a robot tool configuration that reaches the object surface with negligible position and orientation errors. This is, to the best of our knowledge, the first successful attempt to use only 2.5D images as the input to a DRL algorithm to solve the grasping problem by regressing 3D world coordinates.
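For readers unfamiliar with the term, a 2.5D image is a depth map: each pixel stores a distance rather than a colour. The paper's policy regresses 3D coordinates directly, so the snippet below is not its method; it only shows, under a standard pinhole camera model, how a depth pixel relates to a 3D point in the camera frame. The intrinsics `fx`, `fy`, `cx`, `cy` and the function name `backproject` are assumptions chosen for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with a depth value to a 3D point in the camera frame.

    fx, fy -- focal lengths in pixels (assumed intrinsics)
    cx, cy -- principal point in pixels (assumed intrinsics)
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# A pixel at the principal point lies on the optical axis.
centre = backproject(320, 240, 1.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)

# A pixel 600 columns to the right, at depth 2.0.
offset = backproject(920, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

The result is expressed in the camera frame; obtaining world coordinates would additionally require the pose of the fixed depth camera.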