CVMar 26
Deep Learning Aided Vision System for Planetary RoversLomash Relia, Jai G Singla, Amitabh et al.
This study presents a vision system for planetary rovers, combining real-time perception with offline terrain reconstruction. The real-time module integrates CLAHE enhanced stereo imagery, YOLOv11n based object detection, and a neural network to estimate object distances. The offline module uses the Depth Anything V2 metric monocular depth estimation model to generate depth maps from captured images, which are fused into dense point clouds using Open3D. Real world distance estimates from the real time pipeline provide reliable metric context alongside the qualitative reconstructions. Evaluation on Chandrayaan 3 NavCam stereo imagery, benchmarked against a CAHV based utility, shows that the neural network achieves a median depth error of 2.26 cm within a 1 to 10 meter range. The object detection model maintains a balanced precision recall tradeoff on grayscale lunar scenes. This architecture offers a scalable, compute-efficient vision solution for autonomous planetary exploration.
CVApr 24
Depth-Aware Rover: A Study of Edge AI and Monocular Vision for Real-World ImplementationLomash Relia, Jai G Singla, Amitabh et al.
This study analyses simulated and real-world implementations of depth-aware rover navigation, highlighting the transition from stereo vision to monocular depth estimation using edge AI. A Unity-based lunar terrain simulator with stereo cameras and OpenCV's StereoSGBM was used to generate disparity maps. A physical rover built on Raspberry Pi 4 employed UniDepthV2 for monocular metric depth estimation and YOLO12n for real-time object detection. While stereo vision yielded higher accuracy in simulation, the monocular approach proved more robust and cost-effective in real-world deployment, achieving 0.1 FPS for depth and 10 FPS for detection.
CVApr 22
LunarDepthNet: Generation of Digital Elevation Models using Deep Learning and Monocular Satellite ImagesAaranay Aadi, Jai Gopal Singla, Amitabh et al.
Recent times have seen an increase in demand of high quality Digital Elevation Models (DEMs) for the lunar surface, because they are highly important for studying the moon and planning future missions. However, there is an evident lack of detailed elevation data on the Moon. To overcome this limitation, this study proposes a novel deep learning method that estimates and generates a surface elevation map directly from monocular images of the surface. The dataset used comprises of the Chandrayaan-2 Terrain Mapping Camera (TMC) images with their corresponding Digital Terrain Models (DTMs). The study proposes LunarDepthNet, which comprises of a UNet architecture to generate DEMS. It incorporates an EfficientNet encoder and custom layers to correctly learn how the light shadows on the surface relate to the actual elevation values. A combined loss function was also utilized to keep the terrain details accurate and smooth. During validation, the model showed a stable loss convergence of 12%. It achieved a mean nRMSE of 0.437 and an MAE of 4.5m in the testing stage. These results prove the model can generate dependable elevation maps from single orbital images, which are quite useful in regions of the moon where stereo-images are not available.
CVSep 5, 2025
Comparative Evaluation of Traditional and Deep Learning Feature Matching Algorithms using Chandrayaan-2 Lunar DataR. Makharia, J. G. Singla, Amitabh et al.
Accurate image registration is critical for lunar exploration, enabling surface mapping, resource localization, and mission planning. Aligning data from diverse lunar sensors -- optical (e.g., Orbital High Resolution Camera, Narrow and Wide Angle Cameras), hyperspectral (Imaging Infrared Spectrometer), and radar (e.g., Dual-Frequency Synthetic Aperture Radar, Selene/Kaguya mission) -- is challenging due to differences in resolution, illumination, and sensor distortion. We evaluate five feature matching algorithms: SIFT, ASIFT, AKAZE, RIFT2, and SuperGlue (a deep learning-based matcher), using cross-modality image pairs from equatorial and polar regions. A preprocessing pipeline is proposed, including georeferencing, resolution alignment, intensity normalization, and enhancements like adaptive histogram equalization, principal component analysis, and shadow correction. SuperGlue consistently yields the lowest root mean square error and fastest runtimes. Classical methods such as SIFT and AKAZE perform well near the equator but degrade under polar lighting. The results highlight the importance of preprocessing and learning-based approaches for robust lunar image registration across diverse conditions.