CVMay 24, 2024Code
Fieldscale: Locality-Aware Field-based Adaptive Rescaling for Thermal Infrared ImageHyeonjae Gil, Myung-Hwan Jeon, Ayoung Kim
Thermal infrared (TIR) cameras are emerging as promising sensors in safety-related fields due to their robustness against external illumination. However, RAW TIR image has 14 bits of pixel depth and needs to be rescaled into 8 bits for general applications. Previous works utilize a global 1D look-up table to compute pixel-wise gain solely based on its intensity, which degrades image quality by failing to consider the local nature of the heat. We propose Fieldscale, a rescaling based on locality-aware 2D fields where both the intensity value and spatial context of each pixel within an image are embedded. It can adaptively determine the pixel gain for each region and produce spatially consistent 8-bit rescaled images with minimal information loss and high visibility. Consistent performance improvement on image quality assessment and two other downstream tasks support the effectiveness and usability of Fieldscale. All the codes are publicly opened to facilitate research advancements in this field. https://github.com/hyeonjaegil/fieldscale
ROMay 13
LEXI-SG: Monocular 3D Scene Graph Mapping with Room-Guided Feed-Forward ReconstructionChristina Kassab, Hyeonjae Gil, Matías Mattamala et al.
Scene graphs are becoming a standard representation for robot navigation, providing hierarchical geometric and semantic scene understanding. However, most scene graph mapping methods rely on depth cameras or LiDAR sensors. In this work, we present LEXI-SG, the first dense monocular visual mapping system for open-vocabulary 3D scene graphs using only RGB camera input. Our approach exploits the semantic priors of open-vocabulary foundation models to partition the scene into rooms, deferring feed-forward reconstruction to when each room is fully observed -- enabling scalable dense mapping without sliding-window scale inconsistencies. We propose a room-based factor graph formulation to globally align room reconstructions while preserving local map consistency and naturally imposing the semantic scene graph hierarchy. Within each room, we further support open-vocabulary object segmentation and tracking. We validate LEXI-SG on indoor scenes from the Habitat-Matterport 3D and self-collected egocentric office sequences. We evaluate its performance against existing feed-forward SLAM methods, as well as established scene graphs baselines. We demonstrate improved trajectory estimation and dense reconstruction, as well as, competitive performance in open-vocabulary segmentation. LEXI-SG shows that accurate, scalable, open-vocabulary 3D scene graphs can be achieved from monocular RGB alone. Our project page and office sequences are available here: https://ori-drs.github.io/lexisg-web/.
CVJul 24, 2025Code
Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann ManifoldJaeho Shin, Hyeonjae Gil, Junwoo Jang et al.
Affine Grassmannian has been favored for expressing proximity between lines and planes due to its theoretical exactness in measuring distances among features. Despite this advantage, the existing method can only measure the proximity without yielding the distance as an explicit function of rigid body transformation. Thus, an optimizable distance function on the manifold has remained underdeveloped, stifling its application in registration problems. This paper is the first to explicitly derive an optimizable cost function between two Grassmannian features with respect to rigid body transformation ($\mathbf{R}$ and $\mathbf{t}$). Specifically, we present a rigorous mathematical proof demonstrating that the bases of high-dimensional linear subspaces can serve as an explicit representation of the cost. Finally, we propose an optimizable cost function based on the transformed bases that can be applied to the registration problem of any affine subspace. Compared to vector parameter-based approaches, our method is able to find a globally optimal solution by directly minimizing the geodesic distance which is agnostic to representation ambiguity. The resulting cost function and its extension to the inlier-set maximizing Branch-and-Bound (BnB) solver have been demonstrated to improve the convergence of existing solutions or outperform them in various computer vision tasks. The code is available on https://github.com/joomeok/GrassmannRegistration.
CVAug 4, 2020
MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural NetworksHyungtae Lim, Hyeonjae Gil, Hyun Myung
In this study, a deep-learning-based multi-stage network architecture called Multi-Stage Depth Prediction Network (MSDPN) is proposed to predict a dense depth map using a 2D LiDAR and a monocular camera. Our proposed network consists of a multi-stage encoder-decoder architecture and Cross Stage Feature Aggregation (CSFA). The proposed multi-stage encoder-decoder architecture alleviates the partial observation problem caused by the characteristics of a 2D LiDAR, and CSFA prevents the multi-stage network from diluting the features and allows the network to learn the inter-spatial relationship between features better. Previous works use sub-sampled data from the ground truth as an input rather than actual 2D LiDAR data. In contrast, our approach trains the model and conducts experiments with a physically-collected 2D LiDAR dataset. To this end, we acquired our own dataset called KAIST RGBD-scan dataset and validated the effectiveness and the robustness of MSDPN under realistic conditions. As verified experimentally, our network yields promising performance against state-of-the-art methods. Additionally, we analyzed the performance of different input methods and confirmed that the reference depth map is robust in untrained scenarios.