Rodrigo Verschae

CV
h-index20
6papers
198citations
Novelty52%
AI Score53

6 Papers

32.5CVJun 1
Generalization Limits in Vehicle Re-Identification

Anis Yassine Ben Mabrouk, Antoine Tadros, Rafael Grompone von Gioi et al.

Vehicle re-identification focuses on retrieving images of the same vehicle from a gallery given a query image. Upon closer inspection of commonly used datasets, we observe that vehicles with few visual differences-e.g., the same make, model, and color-appear in both the training and test sets. As a result, methods that effectively memorize the training data tend to perform well on these test sets but struggle to generalize to other datasets. In this paper, we address this issue by proposing a novel evaluation approach that more effectively measures generalization capability to unseen vehicle types. To further study generalization performance, we also propose splitting the evaluation based on view, allowing us to differentiate the effect of viewpoint robustness from that of same-view re-identification. Our findings reveal that most state-of-the-art methods struggle with unseen vehicle types, and that their robustness to viewpoint changes and attention to detail are limited to vehicle types seen during training.

50.2CVApr 29
Event-based Liveness Detection using Temporal Ocular Dynamics: An Exploratory Approach

Nicolas Mastropasqua, Ignacio Bugueno-Cordova, Rodrigo Verschae et al.

Face liveness detection has been extensively studied using RGB cameras, achieving strong performance under controlled conditions but often failing to generalize across sensors and attack scenarios. In this work, we explore event cameras as an alternative sensing modality for liveness detection based on temporal ocular dynamics. Event cameras capture sparse, asynchronous changes in brightness with microsecond resolution, enabling precise analysis of fast eye movements such as saccades. Replay attacks cannot faithfully reproduce these dynamics due to temporal resampling and display artifacts, leading to distinctive spatio-temporal patterns in the event domain. We design a data collection protocol to extend RGBE-Gaze with replay-attack recordings, yielding an event-based fake counterpart for liveness detection. We analyze event-driven temporal features from eye regions and evaluate their effectiveness for ocular motion segmentation and liveness classification. Our results show that event-based representations enable reliable discrimination between genuine and replayed sequences, achieving up to 95.37% top-1 accuracy with a spiking convolutional neural network. These preliminary findings highlight the potential of event-based sensing for robust and low-latency liveness detection.

CVJun 12, 2025
Human-Robot Navigation using Event-based Cameras and Reinforcement Learning

Ignacio Bugueno-Cordova, Javier Ruiz-del-Solar, Rodrigo Verschae

This work introduces a robot navigation controller that combines event cameras and other sensors with reinforcement learning to enable real-time human-centered navigation and obstacle avoidance. Unlike conventional image-based controllers, which operate at fixed rates and suffer from motion blur and latency, this approach leverages the asynchronous nature of event cameras to process visual information over flexible time intervals, enabling adaptive inference and control. The framework integrates event-based perception, additional range sensing, and policy optimization via Deep Deterministic Policy Gradient, with an initial imitation learning phase to improve sample efficiency. Promising results are achieved in simulated environments, demonstrating robust navigation, pedestrian following, and obstacle avoidance. A demo video is available at the project website.

CVAug 16, 2025
Exploring Spatial-Temporal Dynamics in Event-based Facial Micro-Expression Analysis

Nicolas Mastropasqua, Ignacio Bugueno-Cordova, Rodrigo Verschae et al.

Micro-expression analysis has applications in domains such as Human-Robot Interaction and Driver Monitoring Systems. Accurately capturing subtle and fast facial movements remains difficult when relying solely on RGB cameras, due to limitations in temporal resolution and sensitivity to motion blur. Event cameras offer an alternative, with microsecond-level precision, high dynamic range, and low latency. However, public datasets featuring event-based recordings of Action Units are still scarce. In this work, we introduce a novel, preliminary multi-resolution and multi-modal micro-expression dataset recorded with synchronized RGB and event cameras under variable lighting conditions. Two baseline tasks are evaluated to explore the spatial-temporal dynamics of micro-expressions: Action Unit classification using Spiking Neural Networks (51.23\% accuracy with events vs. 23.12\% with RGB), and frame reconstruction using Conditional Variational Autoencoders, achieving SSIM = 0.8513 and PSNR = 26.89 dB with high-resolution event input. These promising results show that event-based data can be used for micro-expression recognition and frame reconstruction.

CVAug 5, 2025
evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition

Rodrigo Verschae, Ignacio Bugueno-Cordova

Event-based cameras are bio-inspired vision sensors that asynchronously capture per-pixel intensity changes with microsecond latency, high temporal resolution, and high dynamic range, providing valuable information about the spatio-temporal dynamics of the scene. In the present work, we propose evTransFER, a transfer learning-based framework and architecture for face expression recognition using event-based cameras. The main contribution is a feature extractor designed to encode the spatio-temporal dynamics of faces, built by training an adversarial generative method on a different problem (facial reconstruction) and then transferring the trained encoder weights to the face expression recognition system. We show that this proposed transfer learning method greatly improves the ability to recognize facial expressions compared to training a network from scratch. In addition, we propose an architecture that incorporates an LSTM to capture longer-term facial expression dynamics, and we introduce a new event-based representation, referred to as TIE, both of which further improve the results. We evaluate the proposed framework on the event-based facial expression database e-CK+ and compare it to state-of-the-art methods. The results show that the proposed framework evTransFER achieves a 93.6\% recognition rate on the e-CK+ database, significantly improving the accuracy (25.9\% points or more) when compared to state-of-the-art performance for similar problems.

CVOct 15, 2018
Deep Photovoltaic Nowcasting

Jinsong Zhang, Rodrigo Verschae, Shohei Nobuhara et al.

Predicting the short-term power output of a photovoltaic panel is an important task for the efficient management of smart grids. Short-term forecasting at the minute scale, also known as nowcasting, can benefit from sky images captured by regular cameras and installed close to the solar panel. However, estimating the weather conditions from these images---sun intensity, cloud appearance and movement, etc.---is a very challenging task that the community has yet to solve with traditional computer vision techniques. In this work, we propose to learn the relationship between sky appearance and the future photovoltaic power output using deep learning. We train several variants of convolutional neural networks which take historical photovoltaic power values and sky images as input and estimate photovoltaic power in a very short term future. In particular, we compare three different architectures based on: a multi-layer perceptron (MLP), a convolutional neural network (CNN), and a long short term memory (LSTM) module. We evaluate our approach quantitatively on a dataset of photovoltaic power values and corresponding images gathered in Kyoto, Japan. Our experiments reveal that the MLP network, already used similarly in previous work, achieves an RMSE skill score of 7% over the commonly-used persistence baseline on the 1-minute future photovoltaic power prediction task. Our CNN-based network improves upon this with a 12% skill score. In contrast, our LSTM-based model, which can learn the temporal dependencies in the data, achieves a 21% RMSE skill score, thus outperforming all other approaches.