LGMar 2, 2022Code
DCT-Former: Efficient Self-Attention with Discrete Cosine TransformCarmelo Scribano, Giorgia Franchini, Marco Prato et al.
Since their introduction the Trasformer architectures emerged as the dominating architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, which grows both in memory consumption and number of operations as $O(n^2)$ where $n$ stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed so far in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive section of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time. This makes it particularly suitable in real-time contexts on embedded platforms. Moreover, we assume that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
ROFeb 14, 2023Code
Model-Based Underwater 6D Pose Estimation from RGBDavide Sapienza, Elena Govi, Sara Aldhaheri et al.
Object pose estimation underwater allows an autonomous system to perform tracking and intervention tasks. Nonetheless, underwater target pose estimation is remarkably challenging due to, among many factors, limited visibility, light scattering, cluttered environments, and constantly varying water conditions. An approach is to employ sonar or laser sensing to acquire 3D data, however, the data is not clear and the sensors expensive. For this reason, the community has focused on extracting pose estimates from RGB input. In this work, we propose an approach that leverages 2D object detection to reliably compute 6D pose estimates in different underwater scenarios. We test our proposal with 4 objects with symmetrical shapes and poor texture spanning across 33,920 synthetic and 10 real scenes. All objects and scenes are made available in an open-source dataset that includes annotations for object detection and pose estimation. When benchmarking against similar end-to-end methodologies for 6D object pose estimation, our pipeline provides estimates that are 8% more accurate. We also demonstrate the real world usability of our pose estimation pipeline on an underwater robotic manipulator in a reaching task.
ROJul 23, 2024Code
A Simulation Benchmark for Autonomous Racing with Large-Scale Human DataAdrian Remonda, Nicklas Hansen, Ayoub Raji et al.
Despite the availability of international prize-money competitions, scaled vehicles, and simulation environments, research on autonomous racing and the control of sports cars operating close to the limit of handling has been limited by the high costs of vehicle acquisition and management, as well as the limited physics accuracy of open-source simulators. In this paper, we propose a racing simulation platform based on the simulator Assetto Corsa to test, validate, and benchmark autonomous driving algorithms, including reinforcement learning (RL) and classical Model Predictive Control (MPC), in realistic and challenging scenarios. Our contributions include the development of this simulation platform, several state-of-the-art algorithms tailored to the racing environment, and a comprehensive dataset collected from human drivers. Additionally, we evaluate algorithms in the offline RL setting. All the necessary code (including environment and benchmarks), working examples, datasets, and videos are publicly released and can be found at: https://assetto-corsa-gym.github.io
CVOct 3, 2022Code
CERBERUS: Simple and Effective All-In-One Automotive Perception Model with Multi Task LearningCarmelo Scribano, Giorgia Franchini, Ignacio Sañudo Olmedo et al.
Perceiving the surrounding environment is essential for enabling autonomous or assisted driving functionalities. Common tasks in this domain include detecting road users, as well as determining lane boundaries and classifying driving conditions. Over the last few years, a large variety of powerful Deep Learning models have been proposed to address individual tasks of camera-based automotive perception with astonishing performances. However, the limited capabilities of in-vehicle embedded computing platforms cannot cope with the computational effort required to run a heavy model for each individual task. In this work, we present CERBERUS (CEnteR Based End-to-end peRception Using a Single model), a lightweight model that leverages a multitask-learning approach to enable the execution of multiple perception tasks at the cost of a single inference. The code will be made publicly available at https://github.com/cscribano/CERBERUS
ROJul 22, 2022
Motion Planning and Control for Multi Vehicle Autonomous Racing at High SpeedsAyoub Raji, Alexander Liniger, Andrea Giove et al.
This paper presents a multi-layer motion planning and control architecture for autonomous racing, capable of avoiding static obstacles, performing active overtakes, and reaching velocities above 75 $m/s$. The used offline global trajectory generation and the online model predictive controller are highly based on optimization and dynamic models of the vehicle, where the tires and camber effects are represented in an extended version of the basic Pacejka Magic Formula. The proposed single-track model is identified and validated using multi-body motorsport libraries which allow simulating the vehicle dynamics properly, especially useful when real experimental data are missing. The fundamental regularization terms and constraints of the controller are tuned to reduce the rate of change of the inputs while assuring an acceptable velocity and path tracking. The motion planning strategy consists of a Frenét-Frame-based planner which considers a forecast of the opponent produced by a Kalman filter. The planner chooses the collision-free path and velocity profile to be tracked on a 3 seconds horizon to realize different goals such as following and overtaking. The proposed solution has been applied on a Dallara AV-21 racecar and tested at oval race tracks achieving lateral accelerations up to 25 $m/s^{2}$.
ROOct 27, 2023
er.autopilot 1.0: The Full Autonomous Stack for Oval Racing at High SpeedsAyoub Raji, Danilo Caporale, Francesco Gatti et al.
The Indy Autonomous Challenge (IAC) brought together for the first time in history nine autonomous racing teams competing at unprecedented speed and in head-to-head scenario, using independently developed software on open-wheel racecars. This paper presents the complete software architecture used by team TII EuroRacing (TII-ER), covering all the modules needed to avoid static obstacles, perform active overtakes and reach speeds above 75 m/s (270 km/h). In addition to the most common modules related to perception, planning, and control, we discuss the approaches used for vehicle dynamics modelling, simulation, telemetry, and safety. Overall results and the performance of each module are described, as well as the lessons learned during the first two events of the competition on oval tracks, where the team placed respectively second and third.
CVApr 17, 2023
Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose EstimationElena Govi, Davide Sapienza, Carmelo Scribano et al.
In recent years, there has been a growing trend of using data-driven methods in industrial settings. These kinds of methods often process video images or parts, therefore the integrity of such images is crucial. Sometimes datasets, e.g. consisting of images, can be sophisticated for various reasons. It becomes critical to understand how the manipulation of video and images can impact the effectiveness of a machine learning method. Our case study aims precisely to analyze the Linemod dataset, considered the state of the art in 6D pose estimation context. That dataset presents images accompanied by ArUco markers; it is evident that such markers will not be available in real-world contexts. We analyze how the presence of the markers affects the pose estimation accuracy, and how this bias may be mitigated through data augmentation and other methods. Our work aims to show how the presence of these markers goes to modify, in the testing phase, the effectiveness of the deep learning method used. In particular, we will demonstrate, through the tool of saliency maps, how the focus of the neural network is captured in part by these ArUco markers. Finally, a new dataset, obtained by applying geometric tools to Linemod, will be proposed in order to demonstrate our hypothesis and uncovering the bias. Our results demonstrate the potential for bias in 6DOF pose estimation networks, and suggest methods for reducing this bias when training with markers.
36.2CVMay 21
Accelerating Vision Foundation Models with Drop-in Depthwise ConvolutionCarmelo Scribano, Mohammad Mahdi, Nedyalko Prisadnikov et al.
Pretrained vision foundation models deliver strong performance across tasks with limited fine-tuning. However, their Vision Transformer (ViT) backbones impose high inference costs, limiting deployment on resource-constrained devices. In this work, we accelerate large-scale pretrained ViTs while preserving their feature extraction capabilities by exploiting the intrinsic convolution-like behavior of some attention heads. Specifically, we introduce an efficient depthwise convolution-based layer that serves as a drop-in replacement for these heads. Additionally, we propose simple strategies to identify which heads can be replaced and introduce a fine-tuning procedure that recovers downstream task performance. Across both image classification and segmentation tasks, our method achieves 17-20\% percent inference speedup with minimal performance degradation. We validate the approach through detailed derivations, extensive experiments, and efficiency benchmarks. The reference implementation is publicly available.
24.1ROApr 3
RAGE: A Tightly Coupled Radar-Aided Grip Estimator For Autonomous Race CarsDavide Malvezzi, Nicola Musiu, Eugenio Mascaro et al.
Real-time estimation of vehicle-tire-road friction is critical for allowing autonomous race cars to safely and effectively operate at their physical limits. Traditional approaches to measure tire grip often depend on costly, specialized sensors that require custom installation, limiting scalability and deployment. In this work, we introduce RAGE, a novel real-time estimator that simultaneously infers the vehicle velocity, slip angles of the tires and the lateral forces that act on them, using only standard sensors, such as IMUs and RADARs, which are commonly available on most of modern autonomous platforms. We validate our approach through both high-fidelity simulations and real-world experiments conducted on the EAV-24 autonomous race car, demonstrating the accuracy and effectiveness of our method in estimating the vehicle lateral dynamics.
RODec 30, 2025
Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing StackGiovanni Lambertini, Matteo Pini, Eugenio Mascaro et al.
In this paper, we describe the automated simulation and reporting pipeline implemented for our autonomous racing stack, ur.autopilot. The backbone of the simulation is based on a high-fidelity model of the vehicle interfaced as a Functional Mockup Unit (FMU). The pipeline can execute the software stack and the simulation up to three times faster than real-time, locally or on GitHub for Continuous Integration/- Continuous Delivery (CI/CD). As the most important input of the pipeline, there is a set of running scenarios. Each scenario allows the initialization of the ego vehicle in different initial conditions (position and speed), as well as the initialization of any other configuration of the stack. This functionality is essential to validate efficiently critical modules, like the one responsible for high-speed overtaking maneuvers or localization, which are among the most challenging aspects of autonomous racing. Moreover, we describe how we implemented a fault injection module, capable of introducing sensor delays and perturbations as well as modifying outputs of any node of the stack. Finally, we describe the design of our automated reporting process, aimed at maximizing the effectiveness of the simulation analysis.
CVJun 18, 2021Code
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal TransformersCarmelo Scribano, Davide Sapienza, Giorgia Franchini et al.
Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
ROJan 24, 2019Code
F1/10: An Open-Source Autonomous Cyber-Physical PlatformMatthew O'Kelly, Varundev Sukhil, Houssam Abbas et al.
In 2005 DARPA labeled the realization of viable autonomous vehicles (AVs) a grand challenge; a short time later the idea became a moonshot that could change the automotive industry. Today, the question of safety stands between reality and solved. Given the right platform the CPS community is poised to offer unique insights. However, testing the limits of safety and performance on real vehicles is costly and hazardous. The use of such vehicles is also outside the reach of most researchers and students. In this paper, we present F1/10: an open-source, affordable, and high-performance 1/10 scale autonomous vehicle testbed. The F1/10 testbed carries a full suite of sensors, perception, planning, control, and networking software stacks that are similar to full scale solutions. We demonstrate key examples of the research enabled by the F1/10 testbed, and how the platform can be used to augment research and education in autonomous systems, making autonomy more accessible.
14.0CVMay 4
Low-Latency Embedded Driver Monitoring System with a Multi-Task Neural NetworkCarmelo Scribano, Giovanni Cappelletti, Elia Giacobazzi et al.
Road traffic accidents remain a significant global concern, with the majority attributed to human factors such as driver distraction and fatigue. This study proposes a camera-based approach to derive useful indicators to assess driver attentiveness and alertness. The proposed pipeline jointly satisfies the stringent real-time requirements imposed by the critical application and minimizes the computational requirements to allow for deployment on a tight computational budget. To this end, we develop a lightweight multi-task neural network that predicts multiple indicators for the face region in a single forward pass. The developed model is integrated into a complete execution workflow to produce a real-time estimate of attentiveness, fatigue, and engagement in distracting activities.
SYDec 22, 2023
A Tricycle Model to Accurately Control an Autonomous Racecar with Locked DifferentialAyoub Raji, Nicola Musiu, Alessandro Toschi et al.
In this paper, we present a novel formulation to model the effects of a locked differential on the lateral dynamics of an autonomous open-wheel racecar. The model is used in a Model Predictive Controller in which we included a micro-steps discretization approach to accurately linearize the dynamics and produce a prediction suitable for real-time implementation. The stability analysis of the model is presented, as well as a brief description of the overall planning and control scheme which includes an offline trajectory generation pipeline, an online local speed profile planner, and a low-level longitudinal controller. An improvement of the lateral path tracking is demonstrated in preliminary experimental results that have been produced on a Dallara AV-21 during the first Indy Autonomous Challenge event on the Monza F1 racetrack. Final adjustments and tuning have been performed in a high-fidelity simulator demonstrating the effectiveness of the solution when performing close to the tire limits.
7.4ROApr 9
RAGE-XY: RADAR-Aided Longitudinal and Lateral Forces Estimation For Autonomous Race CarsDavide Malvezzi, Nicola Musiu, Eugenio Mascaro et al.
In this work, we present RAGE-XY, an extended version of RAGE, a real-time estimation framework that simultaneously infers vehicle velocity, tire slip angles, and the forces acting on the vehicle using only standard onboard sensors such as IMUs and RADARs. Compared to the original formulation, the proposed method incorporates an online RADAR calibration module, improving the accuracy of lateral velocity estimation in the presence of sensor misalignment. Furthermore, we extend the underlying vehicle model from a single-track approximation to a tricycle model, enabling the estimation of rear longitudinal tire forces in addition to lateral dynamics. We validate the proposed approach through both high-fidelity simulations and real-world experiments conducted on the EAV-24 autonomous race car, demonstrating improved accuracy and robustness in estimating both lateral and longitudinal vehicle dynamics.
CVJun 19, 2025
Segment Anything for Satellite Imagery: A Strong Baseline and a Regional Dataset for Automatic Field DelineationCarmelo Scribano, Elena Govi, Paolo Bertellini et al.
Accurate mapping of agricultural field boundaries is essential for the efficient operation of agriculture. Automatic extraction from high-resolution satellite imagery, supported by computer vision techniques, can avoid costly ground surveys. In this paper, we present a pipeline for field delineation based on the Segment Anything Model (SAM), introducing a fine-tuning strategy to adapt SAM to this task. In addition to using published datasets, we describe a method for acquiring a complementary regional dataset that covers areas beyond current sources. Extensive experiments assess segmentation accuracy and evaluate the generalization capabilities. Our approach provides a robust baseline for automated field delineation. The new regional dataset, known as ERAS, is now publicly available.
LGMay 24, 2019
Additive Noise Annealing and Approximation Properties of Quantized Neural NetworksMatteo Spallanzani, Lukas Cavigelli, Gian Paolo Leonardi et al.
We present a theoretical and experimental investigation of the quantization problem for artificial neural networks. We provide a mathematical definition of quantized neural networks and analyze their approximation capabilities, showing in particular that any Lipschitz-continuous map defined on a hypercube can be uniformly approximated by a quantized neural network. We then focus on the regularization effect of additive noise on the arguments of multi-step functions inherent to the quantization of continuous variables. In particular, when the expectation operator is applied to a non-differentiable multi-step random function, and if the underlying probability density is differentiable (in either classical or weak sense), then a differentiable function is retrieved, with explicit bounds on its Lipschitz constant. Based on these results, we propose a novel gradient-based training algorithm for quantized neural networks that generalizes the straight-through estimator, acting on noise applied to the network's parameters. We evaluate our algorithm on the CIFAR-10 and ImageNet image classification benchmarks, showing state-of-the-art performance on AlexNet and MobileNetV2 for ternary networks.