CVFeb 8, 2023
Domain Adaptation of Synthetic Driving Datasets for Real-World Autonomous DrivingKoustav Mullick, Harshil Jain, Sanchit Gupta et al.
While developing perception based deep learning models, the benefit of synthetic data is enormous. However, performance of networks trained with synthetic data for certain computer vision tasks degrade significantly when tested on real world data due to the domain gap between them. One of the popular solutions in bridging this gap between synthetic and actual world data is to frame it as a domain adaptation task. In this paper, we propose and evaluate novel ways for the betterment of such approaches. In particular we build upon the method of UNIT-GAN. In normal GAN training for the task of domain translation, pairing of images from both the domains (viz, real and synthetic) is done randomly. We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps in boosting the performance of the model along with improving the visual quality of such transformed images. We illustrate our empirical findings on Cityscapes \cite{cityscapes} and challenging synthetic dataset Synscapes. Though the findings are reported on the base network of UNIT-GAN, they can be easily extended to any other similar network.
CVDec 5, 2025
BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous DrivingKarthik Mohan, Sonam Singh, Amit Arvind Kale
The rapid development of Vision-Language models (VLMs) and Multimodal Language Models (MLLMs) in autonomous driving research has significantly reshaped the landscape by enabling richer scene understanding, context-aware reasoning, and more interpretable decision-making. However, a lot of existing work often relies on either single-view encoders that fail to exploit the spatial structure of multi-camera systems or operate on aggregated multi-view features, which lack a unified spatial representation, making it more challenging to reason about ego-centric directions, object relations, and the wider context. We thus present BeLLA, an end-to-end architecture that connects unified 360° BEV representations with a large language model for question answering in autonomous driving. We primarily evaluate our work using two benchmarks - NuScenes-QA and DriveLM, where BeLLA consistently outperforms existing approaches on questions that require greater spatial reasoning, such as those involving relative object positioning and behavioral understanding of nearby objects, achieving up to +9.3% absolute improvement in certain tasks. In other categories, BeLLA performs competitively, demonstrating the capability of handling a diverse range of questions.
CVNov 16, 2024
Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic SegmentationJaisidh Singh, Sonam Singh, Amit Arvind Kale et al.
This paper presents a novel method for discovering systematic errors in segmentation models. For instance, a systematic error in the segmentation model can be a sufficiently large number of misclassifications from the model as a parking meter for a target class of pedestrians. With the rapid deployment of these models in critical applications such as autonomous driving, it is vital to detect and interpret these systematic errors. However, the key challenge is automatically discovering such failures on unlabelled data and forming interpretable semantic sub-groups for intervention. For this, we leverage multimodal foundation models to retrieve errors and use conceptual linkage along with erroneous nature to study the systematic nature of these errors. We demonstrate that such errors are present in SOTA segmentation models (UperNet ConvNeXt and UperNet Swin) trained on the Berkeley Deep Drive and benchmark the approach qualitatively and quantitatively, showing its effectiveness by discovering coherent systematic errors for these models. Our work opens up the avenue to model analysis and intervention that have so far been underexplored in semantic segmentation.