RO Jul 23, 2024 Code
PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles
Aws Khalil, Jaerock Kwon
This study introduces the Perception Latency Mitigation Network (PLM-Net), a novel deep learning approach for addressing perception latency in vision-based Autonomous Vehicle (AV) lateral control systems. Perception latency is the delay between capturing the environment through vision sensors (e.g., cameras) and applying an action (e.g., steering). This issue is understudied in both classical and neural-network-based control methods. Reducing this latency with powerful GPUs and FPGAs is possible but impractical for automotive platforms. PLM-Net comprises the Base Model (BM) and the Timed Action Prediction Model (TAPM). BM represents the original Lane Keeping Assist (LKA) system, while TAPM predicts future actions for different latency values. By integrating these models, PLM-Net mitigates perception latency. The final output is determined through linear interpolation of BM and TAPM outputs based on real-time latency. This design addresses both constant and varying latency, improving driving trajectories and steering control. Experimental results validate the efficacy of PLM-Net across various latency conditions. Source code: https://github.com/AwsKhalil/oscar/tree/devel-plm-net.
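The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes that the BM output is treated as the zero-latency command and that TAPM supplies one steering prediction per discrete latency value. The function name `interpolate_steering`, the dictionary layout, and the clamping behavior are hypothetical choices for illustration and are not taken from the PLM-Net source code.

```python
from bisect import bisect_right


def interpolate_steering(latency, bm_output, tapm_outputs):
    """Blend steering commands by piecewise-linear interpolation over latency.

    latency      -- measured perception latency in seconds (assumed >= 0)
    bm_output    -- base-model steering, treated here as the zero-latency command
    tapm_outputs -- dict {latency_value: steering}; hypothetical TAPM predictions
    """
    # Knots: the BM output at zero latency, then TAPM outputs sorted by latency.
    knots = [(0.0, bm_output)] + sorted(tapm_outputs.items())
    xs = [x for x, _ in knots]

    # Clamp outside the covered range instead of extrapolating (an assumption).
    if latency <= xs[0]:
        return knots[0][1]
    if latency >= xs[-1]:
        return knots[-1][1]

    # Find the two knots that bracket the measured latency.
    hi = bisect_right(xs, latency)
    (x0, y0), (x1, y1) = knots[hi - 1], knots[hi]

    # Linear weight between the bracketing knots.
    w = (latency - x0) / (x1 - x0)
    return (1.0 - w) * y0 + w * y1
```

For example, with a BM output of 0.0 and a single TAPM prediction of 0.4 at 0.2 s, a measured latency of 0.1 s lies halfway between the two knots, so the blended command is halfway between the two outputs.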
RO Mar 2, 2025 Code
CARIL: Confidence-Aware Regression in Imitation Learning for Autonomous Driving
Elahe Delavari, Aws Khalil, Jaerock Kwon
End-to-end vision-based imitation learning has demonstrated promising results in autonomous driving by learning control commands directly from expert demonstrations. However, traditional approaches rely on either regression-based models, which provide precise control but lack confidence estimation, or classification-based models, which offer confidence scores but suffer from reduced precision due to discretization. This limitation makes it challenging to quantify the reliability of predicted actions and apply corrections when necessary. In this work, we introduce a dual-head neural network architecture that integrates both regression and classification heads to improve decision reliability in imitation learning. The regression head predicts continuous driving actions, while the classification head estimates confidence, enabling a correction mechanism that adjusts actions in low-confidence scenarios, enhancing driving stability. We evaluate our approach in a closed-loop setting within the CARLA simulator, demonstrating its ability to detect uncertain actions, estimate confidence, and apply real-time corrections. Experimental results show that our method reduces lane deviation and improves trajectory accuracy by up to 50%, outperforming conventional regression-only models. These findings highlight the potential of classification-guided confidence estimation in enhancing the robustness of vision-based imitation learning for autonomous driving. The source code is available at https://github.com/ElaheDlv/Confidence_Aware_IL.
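One way to picture the dual-head idea is the sketch below. It is not the authors' correction rule, which the abstract does not specify. It assumes the classification head emits logits over discretized steering bins, takes the largest softmax probability as the confidence, and applies a hypothetical correction that blends the regression output with the probability-weighted bin center whenever confidence falls below a threshold. The names `corrected_action`, `bin_centers`, and `threshold` are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_action(reg_action, class_logits, bin_centers, threshold=0.5):
    """Return (action, confidence) for one control step.

    reg_action   -- continuous steering from the regression head
    class_logits -- classification-head logits, one per steering bin
    bin_centers  -- steering value represented by each bin (assumed layout)
    threshold    -- confidence below which the correction is applied (assumed)
    """
    probs = softmax(class_logits)
    confidence = max(probs)

    # High confidence: keep the regression output unchanged.
    if confidence >= threshold:
        return reg_action, confidence

    # Hypothetical correction: pull the action toward the expected bin center,
    # trusting the regression head in proportion to the confidence.
    expected = sum(p * c for p, c in zip(probs, bin_centers))
    blended = confidence * reg_action + (1.0 - confidence) * expected
    return blended, confidence
```

With uniform logits over bins centered at -1, 0, and 1, the confidence is one third, so a regression output of 0.6 is pulled toward the expected center of 0.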
1.8 RO May 10
Towards Generative Predictive Display for Vision-Based Teleoperation: A Zero-Shot Benchmark of Off-the-Shelf Video Models
Aws Khalil, Jaerock Kwon
Teleoperation systems are fundamentally limited by communication latency, which degrades situational awareness and control performance. Predictive display aims to mitigate this limitation by presenting an estimate of the current visual state rather than delayed observations. While recent advances in generative video models enable high-quality video synthesis, their suitability for latency-sensitive predictive display remains unclear. This paper presents a zero-shot benchmark of off-the-shelf generative video models for short-horizon predictive display, without task-specific fine-tuning. We formulate the problem as rollout-based future frame prediction and develop a unified benchmarking pipeline using simulated driving data from the CARLA simulator. Five publicly released video models spanning transformer-based and diffusion-based families are evaluated across two resolutions and two conditioning regimes (multi-frame and single-frame). Performance is assessed using prediction accuracy (mean absolute difference), per-rollout latency, peak GPU memory usage, and temporal error evolution across the prediction horizon. On this zero-shot benchmark, no tested model simultaneously achieves low rollout error, non-divergent per-step error behavior, and real-time inference at the source frame rate. Increasing model scale or resolution yields limited and, in some cases, inverted improvements. These findings highlight a gap between general-purpose generative video synthesis and the requirements of predictive display in teleoperation, suggesting that practical deployment will require either explicit short-horizon temporal supervision, in-domain adaptation, or aggressive inference optimization rather than direct application of off-the-shelf models. Code, configurations, and qualitative results are released on the project page: https://bimilab.github.io/paper-GenPD
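The rollout-based formulation and the mean absolute difference metric can be sketched as follows. This is a toy illustration, not the benchmark pipeline from the project page: frames are flat lists of pixel values, `predict_next` is any callable that maps a list of context frames to one predicted frame, and `copy_last` is a hypothetical stand-in predictor that simply repeats the most recent frame.

```python
def mean_abs_diff(pred, target):
    """Mean absolute difference between two frames given as flat lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def rollout(predict_next, context, horizon):
    """Autoregressive rollout: each prediction is fed back as new context."""
    frames = list(context)
    outputs = []
    for _ in range(horizon):
        nxt = predict_next(frames)
        outputs.append(nxt)
        # Slide the context window; with one context frame this is
        # single-frame conditioning, with several it is multi-frame.
        frames = frames[1:] + [nxt]
    return outputs


def per_step_error(predict_next, context, ground_truth):
    """Error at each step of the prediction horizon."""
    preds = rollout(predict_next, context, len(ground_truth))
    return [mean_abs_diff(p, g) for p, g in zip(preds, ground_truth)]


def copy_last(frames):
    """Hypothetical baseline predictor: repeat the most recent frame."""
    return list(frames[-1])


# Two context frames, then two future frames that keep brightening.
errors = per_step_error(copy_last, [[0, 0], [1, 1]], [[2, 2], [3, 3]])
print(errors)  # [1.0, 2.0]
```

The growing per-step error of the copy-last baseline is the kind of temporal error evolution the abstract refers to. Swapping in a real video model only requires wrapping it as a `predict_next` callable.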
RO Jul 10, 2024
NDST: Neural Driving Style Transfer for Human-Like Vision-Based Autonomous Driving
Donghyun Kim, Aws Khalil, Haewoon Nam et al.
Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) prioritize safety over comfort. The intertwining factors of safety and comfort emerge as pivotal elements in ensuring the effectiveness of Autonomous Driving (AD). Users often experience discomfort when AV or ADAS drive the vehicle on their behalf. Providing a personalized human-like AD experience, tailored to match users' unique driving styles while adhering to safety prerequisites, presents a significant opportunity to boost the acceptance of AVs. This paper proposes a novel approach, Neural Driving Style Transfer (NDST), inspired by Neural Style Transfer (NST), to address this issue. NDST integrates a Personalized Block (PB) into the conventional Baseline Driving Model (BDM), allowing for the transfer of a user's unique driving style while adhering to safety parameters. The PB serves as a self-configuring system, learning and adapting to an individual's driving behavior without requiring modifications to the BDM. This approach enables the personalization of AV models, aligning the driving style more closely with user preferences while ensuring baseline safety-critical actuation. Two contrasting driving styles (Style A and Style B) were used to validate the proposed NDST methodology, demonstrating its efficacy in transferring personal driving styles to the AV system. Our work highlights the potential of NDST to enhance user comfort in AVs by providing a personalized and familiar driving experience. The findings affirm the feasibility of integrating NDST into existing AV frameworks to bridge the gap between safety and individualized driving styles, promoting wider acceptance and improved user experiences.