Chiara Bartolozzi

CV
h-index31
20papers
2,822citations
Novelty40%
AI Score55

20 Papers

CVMay 26
Event-based Motion & Appearance Fusion for 6D Object Pose Tracking

Zhichao Li, Chiara Bartolozzi, Lorenzo Natale et al.

Object pose tracking is a fundamental and essential task for robotics to perform tasks in the home and industrial settings. The most commonly used sensors to do so are RGB-D cameras, which can hit limitations in highly dynamic environments due to motion blur and frame-rate constraints. Event cameras have remarkable features such as high temporal resolution and low latency, which make them a potentially ideal vision sensors for object pose tracking at high speed. Even so, there are still only few works on 6D pose tracking with event cameras. In this work, we take advantage of the high temporal resolution and propose a method that uses both a propagation step fused with a pose correction strategy. Specifically, we use 6D object velocity obtained from event-based optical flow for pose propagation, after which, a template-based local pose correction module is utilized for pose correction. Our learning-free method has comparable performance to the state-of-the-art algorithms, and in some cases out performs them for fast-moving objects. The results indicate the potential for using event cameras in highly-dynamic scenarios where the use of deep network approaches are limited by low update rates.

NCApr 19
NeuroAI and Beyond: Bridging Between Advances in Neuroscience and ArtificialIntelligence

Anthony Zador, Jean-Marc Fellous, Terrence Sejnowski et al. · uw

Neuroscience and Artificial Intelligence (AI) have made impressive progress in recent years but remain only loosely interconnected. Based on a workshop convened by the National Science Foundation in August 2025, we identify three fundamental capability gaps in current AI: the inability to interact with the physical world, inadequate learning that produces brittle systems, and unsustainable energy and data inefficiency. We describe the neuroscience principles that address each: co-design of body and controller, prediction through interaction, multi-scale learning with neuromodulatory control, hierarchical distributed architectures, and sparse event-driven computation. We present a research roadmap organized around these principles at near, mid, and long-term horizons. We argue that realizing this program requires a new generation of researchers trained across the boundary between neuroscience and engineering, and describe the institutional conditions: interdisciplinary training, hardware access, community standards, and ethics, needed to support them. We conclude that NeuroAI, neuroscience-informed artificial intelligence, has the potential to overcome limitations of current AI while deepening our understanding of biological neural computation.

AIApr 10, 2023
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Jason Yik, Korneel Van den Berghe, Douwe den Blanken et al. · eth-zurich

Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of researchers across industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we outline tasks and guidelines for benchmarks across multiple application domains, and present initial performance baselines across neuromorphic and conventional approaches for both benchmark tracks. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.

CVMay 30, 2022
Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware

Simon F Muller-Cleve, Vittorio Fra, Lyes Khacef et al.

Spatio-temporal pattern recognition is a fundamental ability of the brain which is required for numerous real-world activities. Recent deep learning approaches have reached outstanding accuracies in such tasks, but their implementation on conventional embedded solutions is still very computationally and energy expensive. Tactile sensing in robotic applications is a representative example where real-time processing and energy efficiency are required. Following a brain-inspired computing approach, we propose a new benchmark for spatio-temporal tactile pattern recognition at the edge through Braille letter reading. We recorded a new Braille letters dataset based on the capacitive tactile sensors of the iCub robot's fingertip. We then investigated the importance of spatial and temporal information as well as the impact of event-based encoding on spike-based computation. Afterward, we trained and compared feedforward and recurrent Spiking Neural Networks (SNNs) offline using Backpropagation Through Time (BPTT) with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for fast and efficient inference. We compared our approach to standard classifiers, in particular to the Long Short-Term Memory (LSTM) deployed on the embedded NVIDIA Jetson GPU, in terms of classification accuracy, power, energy consumption, and delay. Our results show that the LSTM reaches ~97% of accuracy, outperforming the recurrent SNN by ~17% when using continuous frame-based data instead of event-based inputs. However, the recurrent SNN on Loihi with event-based inputs is ~500 times more energy-efficient than the LSTM on Jetson, requiring a total power of only ~30 mW. This work proposes a new benchmark for tactile sensing and highlights the challenges and opportunities of event-based encoding, neuromorphic hardware, and spike-based computing for spatio-temporal pattern recognition at the edge.

CVFeb 27, 2023
Fast Trajectory End-Point Prediction with Event Cameras for Reactive Robot Control

Marco Monforte, Luna Gava, Massimiliano Iacono et al.

Prediction skills can be crucial for the success of tasks where robots have limited time to act or joints actuation power. In such a scenario, a vision system with a fixed, possibly too low, sampling rate could lead to the loss of informative points, slowing down prediction convergence and reducing the accuracy. In this paper, we propose to exploit the low latency, motion-driven sampling, and data compression properties of event cameras to overcome these issues. As a use-case, we use a Panda robotic arm to intercept a ball bouncing on a table. To predict the interception point, we adopt a Stateful LSTM network, a specific LSTM variant without fixed input length, which perfectly suits the event-driven paradigm and the problem at hand, where the length of the trajectory is not defined. We train the network in simulation to speed up the dataset acquisition and then fine-tune the models on real trajectories. Experimental results demonstrate how using a dense spatial sampling (i.e. event cameras) significantly increases the number of intercepted trajectories as compared to a fixed temporal sampling (i.e. frame-based cameras).

ROMay 16, 2022
PUCK: Parallel Surface and Convolution-kernel Tracking for Event-Based Cameras

Luna Gava, Marco Monforte, Massimiliano Iacono et al.

Low latency and accuracy are fundamental requirements when vision is integrated in robots for high-speed interaction with targets, since they affect system reliability and stability. In such a scenario, the choice of the sensor and algorithms is important for the entire control loop. The technology of event-cameras can guarantee fast visual sensing in dynamic environments, but requires a tracking algorithm that can keep up with the high data rate induced by the robot ego-motion while maintaining accuracy and robustness to distractors. In this paper, we introduce a novel tracking method that leverages the Exponential Reduced Ordinal Surface (EROS) data representation to decouple event-by-event processing and tracking computation. The latter is performed using convolution kernels to detect and follow a circular target moving on a plane. To benchmark state-of-the-art event-based tracking, we propose the task of tracking the air hockey puck sliding on a surface, with the future aim of controlling the iCub robot to reach the target precisely and on time. Experimental results demonstrate that our algorithm achieves the best compromise between low latency and tracking accuracy both when the robot is still and when moving.

ROJan 29
Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning

Irene Ambrosini, Ingo Blakowski, Dmitrii Zendrikov et al.

Air hockey demands split-second decisions at high puck velocities, a challenge we address with a compact network of spiking neurons running on a mixed-signal analog/digital neuromorphic processor. By co-designing hardware and learning algorithms, we train the system to achieve successful puck interactions through reinforcement learning in a remarkably small number of trials. The network leverages fixed random connectivity to capture the task's temporal structure and adopts a local e-prop learning rule in the readout layer to exploit event-driven activity for fast and efficient learning. The result is real-time learning with a setup comprising a computer and the neuromorphic chip in-the-loop, enabling practical training of spiking neural networks for robotic autonomous systems. This work bridges neuroscience-inspired hardware with real-world robotic control, showing that brain-inspired approaches can tackle fast-paced interaction tasks while supporting always-on learning in intelligent machines.

ROApr 15
Neuromorphic Spiking Ring Attractor for Proprioceptive Joint-State Estimation

Federica Ferrari, Flavia Davidhi, Bernard Maacaron et al.

Maintaining stable internal representations of continuous variables is fundamental for effective robotic control. Continuous attractor networks provide a biologically inspired mechanism for encoding such variables, yet neuromorphic realizations have rarely addressed proprioceptive estimation under resource constraints. This work introduces a spiking ring-attractor network representing a robot joint angle through self-sustaining population activity. Local excitation and broad inhibition support a stable activity bump, while velocity-modulated asymmetries drive its translation and boundary conditions confine motion within mechanical limits. The network reproduces smooth trajectory tracking and remains stable near joint limits, showing reduced drift and improved accuracy compared to unbounded models. Such compact hardware-compatible implementation preserves multi-second stability demonstrating a near-linear relationship between bump velocity and synaptic modulation.

CVOct 9, 2025Code
GraphEnet: Event-driven Human Pose Estimation with a Graph Neural Network

Gaurvi Goyal, Pham Cong Thuong, Arren Glover et al.

Human Pose Estimation is a crucial module in human-machine interaction applications and, especially since the rise in deep learning technology, robust methods are available to consumers using RGB cameras and commercial GPUs. On the other hand, event-based cameras have gained popularity in the vision research community for their low latency and low energy advantages that make them ideal for applications where those resources are constrained like portable electronics and mobile robots. In this work we propose a Graph Neural Network, GraphEnet, that leverages the sparse nature of event camera output, with an intermediate line based event representation, to estimate 2D Human Pose of a single person at a high frequency. The architecture incorporates a novel offset vector learning paradigm with confidence based pooling to estimate the human pose. This is the first work that applies Graph Neural Networks to event data for Human Pose Estimation. The code is open-source at https://github.com/event-driven-robotics/GraphEnet-NeVi-ICCV2025.

CVOct 8, 2025
Lattice-allocated Real-time Line Segment Feature Detection and Tracking Using Only an Event-based Camera

Mikihiro Ikura, Arren Glover, Masayoshi Mizuno et al.

Line segment extraction is effective for capturing geometric features of human-made environments. Event-based cameras, which asynchronously respond to contrast changes along edges, enable efficient extraction by reducing redundant data. However, recent methods often rely on additional frame cameras or struggle with high event rates. This research addresses real-time line segment detection and tracking using only a modern, high-resolution (i.e., high event rate) event-based camera. Our lattice-allocated pipeline consists of (i) velocity-invariant event representation, (ii) line segment detection based on a fitting score, (iii) and line segment tracking by perturbating endpoints. Evaluation using ad-hoc recorded dataset and public datasets demonstrates real-time performance and higher accuracy compared to state-of-the-art event-only and event-frame hybrid baselines, enabling fully stand-alone event camera operation in real-world settings.

CVFeb 10, 2025
Wandering around: A bioinspired approach to visual attention through object motion sensitivity

Giulia D Angelo, Victoria Clerico, Chiara Bartolozzi et al.

Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes enabling efficient low-latency processing. To distinguish moving objects while the event-based camera is in motion the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a Spiking Convolutional Neural Network bioinspired attention system for selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, to identify the ROI and saccade toward it. The system, characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. The detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows the system's 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications serving as a basis for more complex architectures.

CVAug 20, 2025
6-DoF Object Tracking with Event-based Optical Flow and Frames

Zhichao Li, Arren Glover, Chiara Bartolozzi et al.

Tracking the position and orientation of objects in space (i.e., in 6-DoF) in real time is a fundamental problem in robotics for environment interaction. It becomes more challenging when objects move at high-speed due to frame rate limitations in conventional cameras and motion blur. Event cameras are characterized by high temporal resolution, low latency and high dynamic range, that can potentially overcome the impacts of motion blur. Traditional RGB cameras provide rich visual information that is more suitable for the challenging task of single-shot object pose estimation. In this work, we propose using event-based optical flow combined with an RGB based global object pose estimator for 6-DoF pose tracking of objects at high-speed, exploiting the core advantages of both types of vision sensors. Specifically, we propose an event-based optical flow algorithm for object motion measurement to implement an object 6-DoF velocity tracker. By integrating the tracked object 6-DoF velocity with low frequency estimated pose from the global pose estimator, the method can track pose when objects move at high-speed. The proposed algorithm is tested and validated on both synthetic and real world data, demonstrating its effectiveness, especially in high-speed motion scenarios.

CVMay 24, 2021
LuvHarris: A Practical Corner Detector for Event-cameras

Arren Glover, Aiko Dinale, Leandro De Souza Rosa et al.

There have been a number of corner detection methods proposed for event cameras in the last years, since event-driven computer vision has become more accessible. Current state-of-the-art have either unsatisfactory accuracy or real-time performance when considered for practical use, for example when a camera is randomly moved in an unconstrained environment. In this paper, we present yet another method to perform corner detection, dubbed look-up event-Harris (luvHarris), that employs the Harris algorithm for high accuracy but manages an improved event throughput. Our method has two major contributions, 1. a novel "threshold ordinal event-surface" that removes certain tuning parameters and is well suited for Harris operations, and 2. an implementation of the Harris algorithm such that the computational load per event is minimised and computational heavy convolutions are performed only "as-fast-as-possible", i.e. only as computational resources are available. The result is a practical, real-time, and robust corner detector that runs more than 2.6x the speed of current state-of-the-art; a necessity when using high-resolution event-camera in real-time. We explain the considerations taken for the approach, compare the algorithm to current state-of-the-art in terms of computational performance and detection accuracy, and discuss the validity of the proposed approach for event cameras.

ROMay 5, 2021
iCub

Lorenzo Natale, Chiara Bartolozzi, Francesco Nori et al.

In this chapter we describe the history and evolution of the iCub humanoid platform. We start by describing the first version as it was designed during the RobotCub EU project and illustrate how it evolved to become the platform that is adopted by more than 30 laboratories world wide. We complete the chapter by illustrating some of the research activities that are currently carried out on the iCub robot, i.e. visual perception, event driven sensing and dynamic control. We conclude the Chapter with a discussion of the lessons we learned and a preview of the upcoming next release of the robot, iCub 3.0.

ETSep 1, 2020
Closed-loop spiking control on a neuromorphic processor implemented on the iCub

Jingyue Zhao, Nicoletta Risi, Marco Monforte et al.

Despite neuromorphic engineering promises the deployment of low latency, adaptive and low power systems that can lead to the design of truly autonomous artificial agents, the development of a fully neuromorphic artificial agent is still missing. While neuromorphic sensing and perception, as well as decision-making systems, are now mature, the control and actuation part is lagging behind. In this paper, we present a closed-loop motor controller implemented on mixed-signal analog-digital neuromorphic hardware using a spiking neural network. The network performs a proportional control action by encoding target, feedback, and error signals using a spiking relational network. It continuously calculates the error through a connectivity pattern, which relates the three variables by means of feed-forward connections. Recurrent connections within each population are used to speed up the convergence, decrease the effect of mismatch and improve selectivity. The neuromorphic motor controller is interfaced with the iCub robot simulator. We tested our spiking P controller in a single joint control task, specifically for the robot head yaw. The spiking controller sends the target positions, reads the motor state from its encoder, and sends back the motor commands to the joint. The performance of the spiking controller is tested in a step response experiment and in a target pursuit task. In this work, we optimize the network structure to make it more robust to noisy inputs and device mismatch, which leads to better control performances.

CVJan 5, 2020
Exploiting Event Cameras for Spatio-Temporal Prediction of Fast-Changing Trajectories

Marco Monforte, Ander Arriandiaga, Arren Glover et al.

This paper investigates trajectory prediction for robotics, to improve the interaction of robots with moving targets, such as catching a bouncing ball. Unexpected, highly-non-linear trajectories cannot easily be predicted with regression-based fitting procedures, therefore we apply state of the art machine learning, specifically based on Long-Short Term Memory (LSTM) architectures. In addition, fast moving targets are better sensed using event cameras, which produce an asynchronous output triggered by spatial change, rather than at fixed temporal intervals as with traditional cameras. We investigate how LSTM models can be adapted for event camera data, and in particular look at the benefit of using asynchronously sampled data.

ASDec 5, 2019
Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras

Ander Arriandiaga, Giovanni Morrone, Luca Pasa et al.

We propose a method to address audio-visual target speaker enhancement in multi-talker environments using event-driven cameras. State of the art audio-visual speech separation methods shows that crucial information is the movement of the facial landmarks related to speech production. However, all approaches proposed so far work offline, using frame-based video input, making it difficult to process an audio-visual signal with low latency, for online applications. In order to overcome this limitation, we propose the use of event-driven cameras and exploit compression, high temporal resolution and low latency, for low cost and low latency motion feature extraction, going towards online embedded audio-visual speech processing. We use the event-driven optical flow estimation of the facial landmarks as input to a stacked Bidirectional LSTM trained to predict an Ideal Amplitude Mask that is then used to filter the noisy audio, to obtain the audio signal of the target speaker. The presented approach performs almost on par with the frame-based approach, with very low latency and computational cost.

CVDec 3, 2019
ATIS + SpiNNaker: a Fully Event-based Visual Tracking Demonstration

Arren Glover, Alan B. Stokes, Steve Furber et al.

The Asynchronous Time-based Image Sensor (ATIS) and the Spiking Neural Network Architecture (SpiNNaker) are both neuromorphic technologies that "unconventionally" use binary spikes to represent information. The ATIS produces spikes to represent the change in light falling on the sensor, and the SpiNNaker is a massively parallel computing platform that asynchronously sends spikes between cores for processing. In this demonstration we show these two hardware used together to perform a visual tracking task. We aim to show the hardware and software architecture that integrates the ATIS and SpiNNaker together in a robot middle-ware that makes processing agnostic to the platform (CPU or SpiNNaker). We also aim to describe the algorithm, why it is suitable for the "unconventional" sensor and processing platform including the advantages as well as challenges faced.

CVApr 17, 2019
Event-based Vision: A Survey

Guillermo Gallego, Tobi Delbruck, Garrick Orchard et al.

Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.

CVJun 27, 2017
Independent Motion Detection with Event-driven Cameras

Valentina Vasco, Arren Glover, Elias Mueggler et al.

Unlike standard cameras that send intensity images at a constant frame rate, event-driven cameras asynchronously report pixel-level brightness changes, offering low latency and high temporal resolution (both in the order of micro-seconds). As such, they have great potential for fast and low power vision algorithms for robots. Visual tracking, for example, is easily achieved even for very fast stimuli, as only moving objects cause brightness changes. However, cameras mounted on a moving robot are typically non-stationary and the same tracking problem becomes confounded by background clutter events due to the robot ego-motion. In this paper, we propose a method for segmenting the motion of an independently moving object for event-driven cameras. Our method detects and tracks corners in the event stream and learns the statistics of their motion as a function of the robot's joint velocities when no independently moving objects are present. During robot operation, independently moving objects are identified by discrepancies between the predicted corner velocities from ego-motion and the measured corner velocities. We validate the algorithm on data collected from the neuromorphic iCub robot. We achieve a precision of ~ 90 % and show that the method is robust to changes in speed of both the head and the target.