CVAug 21, 2023
LightDepth: Single-View Depth Self-Supervision from Illumination DeclineJavier Rodríguez-Puigvert, Víctor M. Batlle, J. M. M. Montiel et al.
Single-view depth estimation can be remarkably effective if there is enough ground-truth depth data for supervised training. However, there are scenarios, especially in medicine in the case of endoscopies, where such data cannot be obtained. In such cases, multi-view self-supervision and synthetic-to-real transfer serve as alternative approaches, however, with a considerable performance reduction in comparison to supervised case. Instead, we propose a single-view self-supervised method that achieves a performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located at a small distance from the target surfaces. Thus, we can exploit that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, providing a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without depth ground-truth data.
CVMar 18, 2025
3D Densification for Multi-Map Monocular VSLAM in EndoscopyX. Anadón, Javier Rodríguez-Puigvert, J. M. M. Montiel
Multi-map Sparse Monocular visual Simultaneous Localization and Mapping applied to monocular endoscopic sequences has proven efficient to robustly recover tracking after the frequent losses in endoscopy due to motion blur, temporal occlusion, tools interaction or water jets. The sparse multi-maps are adequate for robust camera localization, however they are very poor for environment representation, they are noisy, with a high percentage of inaccurately reconstructed 3D points, including significant outliers, and more importantly with an unacceptable low density for clinical applications. We propose a method to remove outliers and densify the maps of the state of the art for sparse endoscopy multi-map CudaSIFT-SLAM. The NN LightDepth for up-to-scale depth dense predictions are aligned with the sparse CudaSIFT submaps by means of the robust to spurious LMedS. Our system mitigates the inherent scale ambiguity in monocular depth estimation while filtering outliers, leading to reliable densified 3D maps. We provide experimental evidence of accurate densified maps 4.15 mm RMS accuracy at affordable computing time in the C3VD phantom colon dataset. We report qualitative results on the real colonoscopy from the Endomapper dataset.
CVDec 16, 2021
On the Uncertain Single-View Depths in ColonoscopiesJavier Rodríguez-Puigvert, David Recasens, Javier Civera et al.
Estimating depth information from endoscopic images is a prerequisite for a wide set of AI-assisted technologies, such as accurate localization and measurement of tumors, or identification of non-inspected areas. As the domain specificity of colonoscopies -- deformable low-texture environments with fluids, poor lighting conditions and abrupt sensor motions -- pose challenges to multi-view 3D reconstructions, single-view depth learning stands out as a promising line of research. Depth learning can be extended in a Bayesian setting, which enables continual learning, improves decision making and can be used to compute confidence intervals or quantify uncertainty for in-body measurements. In this paper, we explore for the first time Bayesian deep networks for single-view depth estimation in colonoscopies. Our specific contribution is two-fold: 1) an exhaustive analysis of scalable Bayesian networks for depth learning in different datasets, highlighting challenges and conclusions regarding synthetic-to-real domain changes and supervised vs. self-supervised methods; and 2) a novel teacher-student approach to deep depth learning that takes into account the teacher uncertainty.
CVApr 29, 2021
Bayesian Deep Neural Networks for Supervised Learning of Single-View DepthJavier Rodríguez-Puigvert, Rubén Martínez-Cantín, Javier Civera
Uncertainty quantification is essential for robotic perception, as overconfident or point estimators can lead to collisions and damages to the environment and the robot. In this paper, we evaluate scalable approaches to uncertainty quantification in single-view supervised depth learning, specifically MC dropout and deep ensembles. For MC dropout, in particular, we explore the effect of the dropout at different levels in the architecture. We show that adding dropout in all layers of the encoder brings better results than other variations found in the literature. This configuration performs similarly to deep ensembles with a much lower memory footprint, which is relevant forapplications. Finally, we explore the use of depth uncertainty for pseudo-RGBD ICP and demonstrate its potential to estimate accurate two-view relative motion with the real scale.