Diffusion-Based Image Augmentation for Semantic Segmentation in Outdoor Robotics
This work addresses the challenge of visual scene variability in outdoor robotics, but the contribution is incremental because it builds on existing diffusion and segmentation models.
The paper tackles the poor performance of learning-based perception algorithms in out-of-distribution environments, such as snow-filled settings for autonomous vehicles. It proposes a diffusion-based image augmentation method that makes the training data more closely represent deployment conditions, so that the model can be fine-tuned for its deployment environment.
The performance of learning-based perception algorithms suffers when they are deployed in out-of-distribution and underrepresented environments. Outdoor robots are particularly susceptible to rapid changes in visual scene appearance due to dynamic lighting, seasonality, and weather effects, which lead to scenes that are underrepresented in the training data of the learning-based perception system. In this conceptual paper, we focus on preparing our autonomous vehicle for deployment in snow-filled environments. We propose a novel method for diffusion-based image augmentation to more closely represent the deployment environment in our training data. Diffusion-based image augmentations rely on the public availability of vision foundation models trained on internet-scale datasets. They allow us to control the semantic distribution of the ground surfaces in the training data and to fine-tune our model for its deployment environment. We employ open-vocabulary semantic segmentation models to filter out augmentation candidates that contain hallucinations. We believe that diffusion-based image augmentations can be extended to many environments beyond snowy surfaces, such as sandy environments and volcanic terrains.
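The hallucination filter described in the abstract can be illustrated with a minimal sketch. The abstract does not state the acceptance criterion, so everything here is an assumption: the function names, the "snow" target label, the per-pixel agreement measure, and the 0.9 thresholds are all hypothetical. The label maps are assumed to come from an open-vocabulary segmentation model, which is not invoked here. The idea is that an augmented image is kept only if the edited ground region is recognized as snow and the rest of the scene keeps its original labels.

```python
from typing import List

Mask = List[List[bool]]      # True where the ground surface was edited
LabelMap = List[List[str]]   # per-pixel class names


def expected_labels(original: LabelMap, edit_mask: Mask, target: str) -> LabelMap:
    """Original labels, with the edited region relabeled as the target class."""
    return [
        [target if edited else label for label, edited in zip(label_row, mask_row)]
        for label_row, mask_row in zip(original, edit_mask)
    ]


def agreement(predicted: LabelMap, expected: LabelMap,
              edit_mask: Mask, inside: bool) -> float:
    """Fraction of pixels inside (or outside) the edit mask where labels match."""
    total = 0
    matches = 0
    for pred_row, exp_row, mask_row in zip(predicted, expected, edit_mask):
        for pred, exp, edited in zip(pred_row, exp_row, mask_row):
            if edited == inside:
                total += 1
                if pred == exp:
                    matches += 1
    # An empty region cannot contradict the expectation.
    return matches / total if total else 1.0


def keep_candidate(original: LabelMap, edit_mask: Mask, predicted: LabelMap,
                   target: str = "snow",
                   min_inside: float = 0.9,    # hypothetical threshold
                   min_outside: float = 0.9) -> bool:  # hypothetical threshold
    """Accept an augmented image only if both regions match the expectation."""
    expected = expected_labels(original, edit_mask, target)
    inside_ok = agreement(predicted, expected, edit_mask, True) >= min_inside
    outside_ok = agreement(predicted, expected, edit_mask, False) >= min_outside
    return inside_ok and outside_ok


# Toy 2x4 scene: the bottom row is the ground surface to be turned into snow.
original = [["sky", "sky", "tree", "sky"],
            ["road", "road", "grass", "grass"]]
edit_mask = [[False, False, False, False],
             [True, True, True, True]]

# Segmentation of a clean augmentation: the ground is consistently snow.
clean = [["sky", "sky", "tree", "sky"],
         ["snow", "snow", "snow", "snow"]]

# Segmentation of an augmentation with a hallucinated object on the ground.
hallucinated = [["sky", "sky", "tree", "sky"],
                ["snow", "car", "car", "snow"]]

clean_kept = keep_candidate(original, edit_mask, clean)
hallucinated_kept = keep_candidate(original, edit_mask, hallucinated)
```

In this toy case the clean candidate is kept and the hallucinated one is rejected, because only half of its edited region is labeled as snow. Checking the region outside the mask as well guards against edits that alter parts of the scene that were meant to stay unchanged.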