CVOct 9, 2022Code
Unsupervised RGB-to-Thermal Domain Adaptation via Multi-Domain Attention NetworkLu Gan, Connor Lee, Soon-Jo Chung
This work presents a new method for unsupervised thermal image classification and semantic segmentation by transferring knowledge from the RGB domain using a multi-domain attention network. Our method does not require any thermal annotations or co-registered RGB-thermal pairs, enabling robots to perform visual tasks at night and in adverse weather conditions without incurring additional costs of data labeling and registration. Current unsupervised domain adaptation methods look to align global images or features across domains. However, when the domain shift is significantly larger for cross-modal data, not all features can be transferred. We solve this problem by using a shared backbone network that promotes generalization, and domain-specific attention that reduces negative transfer by attending to domain-invariant and easily-transferable features. Our approach outperforms the state-of-the-art RGB-to-thermal adaptation method in classification benchmarks, and is successfully applied to thermal river scene segmentation using only synthetic RGB images. Our code is made publicly available at https://github.com/ganlumomo/thermal-uda-attention.
CVJul 18, 2023Code
Online Self-Supervised Thermal Water Segmentation for Aerial VehiclesConnor Lee, Jonathan Gustafsson Frennert, Lu Gan et al.
We present a new method to adapt an RGB-trained water segmentation network to target-domain aerial thermal imagery using online self-supervision by leveraging texture and motion cues as supervisory signals. This new thermal capability enables current autonomous aerial robots operating in near-shore environments to perform tasks such as visual navigation, bathymetry, and flow tracking at night. Our method overcomes the problem of scarce and difficult-to-obtain near-shore thermal data that prevents the application of conventional supervised and unsupervised methods. In this work, we curate the first aerial thermal near-shore dataset, show that our approach outperforms fully-supervised segmentation models trained on limited target-domain thermal data, and demonstrate real-time capabilities onboard an Nvidia Jetson embedded computing platform. Code and datasets used in this work will be available at: https://github.com/connorlee77/uav-thermal-water-segmentation.
CVOct 30, 2023Code
RGB-X Object Detection via Scene-Specific Fusion ModulesSri Aditya Deevi, Connor Lee, Lu Gan et al.
Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.
CVMar 13, 2024Code
Caltech Aerial RGB-Thermal Dataset in the WildConnor Lee, Matthew Anderson, Nikhil Raganathan et al.
We present the first publicly-available RGB-thermal dataset designed for aerial robotics operating in natural environments. Our dataset captures a variety of terrain across the United States, including rivers, lakes, coastlines, deserts, and forests, and consists of synchronized RGB, thermal, global positioning, and inertial data. We provide semantic segmentation annotations for 10 classes commonly encountered in natural settings in order to drive the development of perception algorithms robust to adverse weather and nighttime conditions. Using this dataset, we propose new and challenging benchmarks for thermal and RGB-thermal (RGB-T) semantic segmentation, RGB-T image translation, and motion tracking. We present extensive results using state-of-the-art methods and highlight the challenges posed by temporal and geographical domain shifts in our data. The dataset and accompanying code is available at https://github.com/aerorobotics/caltech-aerial-rgbt-dataset.
CVMar 21, 2024Code
Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field RobotsConnor Lee, Saraswati Soedarmadji, Matthew Anderson et al.
We present a new method to automatically generate semantic segmentation annotations for thermal imagery captured from an aerial vehicle by utilizing satellite-derived data products alongside onboard global positioning and attitude estimates. This new capability overcomes the challenge of developing thermal semantic perception algorithms for field robots due to the lack of annotated thermal field datasets and the time and costs of manual annotation, enabling precise and rapid annotation of thermal data from field collection efforts at a massively-parallelizable scale. By incorporating a thermal-conditioned refinement step with visual foundation models, our approach can produce highly-precise semantic segmentation labels using low-resolution satellite land cover data for little-to-no cost. It achieves 98.5% of the performance from using costly high-resolution options and demonstrates between 70-160% improvement over popular zero-shot semantic segmentation methods based on large vision-language models currently used for generating annotations for RGB imagery. Code will be available at: https://github.com/connorlee77/aerial-auto-segment.
CVAug 22, 2024
Vision-Based Detection of Uncooperative Targets and Components on Small SatellitesHannah Grauer, Elena-Sorina Lupu, Connor Lee et al.
Space debris and inactive satellites pose a threat to the safety and integrity of operational spacecraft and motivate the need for space situational awareness techniques. These uncooperative targets create a challenging tracking and detection problem due to a lack of prior knowledge of their features, trajectories, or even existence. Recent advancements in computer vision models can be used to improve upon existing methods for tracking such uncooperative targets to make them more robust and reliable to the wide-ranging nature of the target. This paper introduces an autonomous detection model designed to identify and monitor these objects using learning and computer vision. The autonomous detection method aims to identify and accurately track the uncooperative targets in varied circumstances, including different camera spectral sensitivities, lighting, and backgrounds. Our method adapts to the relative distance between the observing spacecraft and the target, and different detection strategies are adjusted based on distance. At larger distances, we utilize You Only Look Once (YOLOv8), a multitask Convolutional Neural Network (CNN), for zero-shot and domain-specific single-shot real time detection of the target. At shorter distances, we use knowledge distillation to combine visual foundation models with a lightweight fast segmentation CNN (Fast-SCNN) to segment the spacecraft components with low storage requirements and fast inference times, and to enable weight updates from earth and possible onboard training. Lastly, we test our method on a custom dataset simulating the unique conditions encountered in space, as well as a publicly-available dataset.
LGJan 30
Stabilizing Transformer Training Through ConsensusShyam Venkatasubramanian, Sean Moushegian, Michael Lin et al.
Standard attention-based transformers are known to exhibit instability under learning rate overspecification during training, particularly at high learning rates. While various methods have been proposed to improve resilience to such overspecification by modifying the optimization procedure, fundamental architectural innovations to this end remain underexplored. In this work, we illustrate that the consensus mechanism, a drop-in replacement for attention, stabilizes transformer training across a wider effective range of learning rates. We formulate consensus as a graphical model and provide extensive empirical analysis demonstrating improved stability across learning rate sweeps on text, DNA, and protein modalities. We further propose a hybrid consensus-attention framework that preserves performance while improving stability. We provide theoretical analysis characterizing the properties of consensus.
CVApr 21, 2025
MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware DistillationXingxing Zuo, Nikhil Ranganathan, Connor Lee et al.
Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88\% compared to the baseline without distillation.