Soomok Lee

h-index8

7papers

306citations

Novelty53%

AI Score41

Ranked #67,547 of 194,257 authors (top 35%)#23,047 in CV (top 39%)

7 Papers

12.5CVApr 30

Sparse-View 3D Gaussian Splatting in the Wild

Wongi Park, Jordan A. James, Myeongseok Nam et al.

We propose a 3D novel sparse-view synthesis framework for unconstrained real-world scenarios that contain distractors. Unlike existing methods that primarily perform novel-view synthesis from a sparse set of constrained images without transient elements or leverage unconstrained dense image collections to enhance 3D representation in real-world scenarios, our method not only effectively tackles sparse unconstrained image collections, but also shows high-quality 3D rendering results. To do this, we introduce reference-guided view refinement with a diffusion model using a transient mask and a reference image to enhance the 3D representation and mitigate artifacts in rendered views. Furthermore, we address sparse regions in the Gaussian field via pseudo-view generation along with a sparsity-aware Gaussian replication strategy to amplify Gaussians in the sparse regions. Extensive experiments on publicly available datasets demonstrate that our methodology consistently outperforms existing methods (e.g., PSNR - 17.2%, SSIM - 10.8%, LPIPS - 4.0%) and provides high-fidelity 3D rendering results. This advancement paves the way for realizing unconstrained real-world scenarios without labor-intensive data acquisition. Our project page is available at $\href{https://robotic-vision-lab.github.io/SaveWildGS/}{here}$

3.6CVJan 27, 2025Code

Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles

Younggun Kim, Mohamed Abdel-Aty, Beomsik Cho et al.

Point cloud representation has recently become a research hotspot in the field of computer vision and has been utilized for autonomous vehicles. However, adapting deep learning networks for point cloud data recognition is challenging due to the variability in datasets and sensor technologies. This variability underscores the necessity for adaptive techniques to maintain accuracy under different conditions. In this paper, we present the Multi-View Structural Convolution Network (MSCN) designed for domain-invariant point cloud recognition. MSCN comprises Structural Convolution Layers (SCL) that extract local context geometric features from point clouds and Structural Aggregation Layers (SAL) that extract and aggregate both local and overall context features from point clouds. Furthermore, MSCN enhances feature robustness by training with unseen domain point clouds generated from the source domain, enabling the model to acquire domain-invariant representations. Extensive cross-domain experiments demonstrate that MSCN achieves an average accuracy of 82.0%, surpassing the strong baseline PointTransformer by 15.8%, confirming its effectiveness under real-world domain shifts. Our code is available at https://github.com/MLMLab/MSCN.

2.0CVJul 5, 2024

3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition

Younggun Kim, Beomsik Cho, Seonghoon Ryoo et al.

Adapting deep learning networks for point cloud data recognition in self-driving vehicles faces challenges due to the variability in datasets and sensor technologies, emphasizing the need for adaptive techniques to maintain accuracy across different conditions. In this paper, we introduce the 3D Adaptive Structural Convolution Network (3D-ASCN), a cutting-edge framework for 3D point cloud recognition. It combines 3D convolution kernels, a structural tree structure, and adaptive neighborhood sampling for effective geometric feature extraction. This method obtains domain-invariant features and demonstrates robust, adaptable performance on a variety of point cloud datasets, ensuring compatibility across diverse sensor configurations without the need for parameter adjustments. This highlights its potential to significantly enhance the reliability and efficiency of self-driving vehicle technology.

20.3CVJun 28

Rectifying Mask via Entropy for Distractor-Free 3DGS in Ambiguous Scenarios

Wongi Park, Jiyeon Lim, Minjae Lee et al.

We present RefineSplat, a systematic framework that effectively constructs transient masks to identify diverse ambiguous distractors. To do this, we qualitatively and quantitatively analyze issues and propose a novel entropy-aware adaptive masking method. Unlike existing approaches that struggle to distinguish transient elements from static scenes due to color or semantic ambiguity, RefineSplat captures ambiguous distractors leveraging entropy and instance masks. Furthermore, we propose a simple yet effective entropy-aware density control to align Gaussians in ambiguous scenarios considering Entropy-aware positional gradients. Additionally, to rigorously validate our method, we first create and release the Ambiguous wild dataset, including 18 scenes where distractors and static scenes are hard to distinguish due to color or semantic resemblances. Experimental results on various datasets demonstrate that RefineSplat shows state-of-the-art performance, showing distractor-free novel view synthesis.

5.1IVMar 23, 2025

Efficient Deep Learning Approaches for Processing Ultra-Widefield Retinal Imaging

Siwon Kim, Wooyung Yun, Jeongbin Oh et al.

Deep learning has emerged as the predominant solution for classifying medical images. We intend to apply these developments to the ultra-widefield (UWF) retinal imaging dataset. Since UWF images can accurately diagnose various retina diseases, it is very important to clas sify them accurately and prevent them with early treatment. However, processing images manually is time-consuming and labor-intensive, and there are two challenges to automating this process. First, high perfor mance usually requires high computational resources. Artificial intelli gence medical technology is better suited for places with limited medical resources, but using high-performance processing units in such environ ments is challenging. Second, the problem of the accuracy of colour fun dus photography (CFP) methods. In general, the UWF method provides more information for retinal diagnosis than the CFP method, but most of the research has been conducted based on the CFP method. Thus, we demonstrate that these problems can be efficiently addressed in low performance units using methods such as strategic data augmentation and model ensembles, which balance performance and computational re sources while utilizing UWF images.

3.6CVAug 4, 2025

mmWave Radar-Based Non-Line-of-Sight Pedestrian Localization at T-Junctions Utilizing Road Layout Extraction via Camera

Byeonggyu Park, Hee-Yeun Kim, Byonghyok Choi et al.

Pedestrians Localization in Non-Line-of-Sight (NLoS) regions within urban environments poses a significant challenge for autonomous driving systems. While mmWave radar has demonstrated potential for detecting objects in such scenarios, the 2D radar point cloud (PCD) data is susceptible to distortions caused by multipath reflections, making accurate spatial inference difficult. Additionally, although camera images provide high-resolution visual information, they lack depth perception and cannot directly observe objects in NLoS regions. In this paper, we propose a novel framework that interprets radar PCD through road layout inferred from camera for localization of NLoS pedestrians. The proposed method leverages visual information from the camera to interpret 2D radar PCD, enabling spatial scene reconstruction. The effectiveness of the proposed approach is validated through experiments conducted using a radar-camera system mounted on a real vehicle. The localization performance is evaluated using a dataset collected in outdoor NLoS driving environments, demonstrating the practical applicability of the method.

6.2CVMay 25, 2025

Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis

Myeongseok Nam, Wongi Park, Minsol Kim et al.

Recently, 3D Gaussian Splatting (3D-GS) based on Thermal Infrared (TIR) imaging has gained attention in novel-view synthesis, showing real-time rendering. However, novel-view synthesis with thermal infrared images suffers from transmission effects, emissivity, and low resolution, leading to floaters and blur effects in rendered images. To address these problems, we introduce Veta-GS, which leverages a view-dependent deformation field and a Thermal Feature Extractor (TFE) to precisely capture subtle thermal variations and maintain robustness. Specifically, we design view-dependent deformation field that leverages camera position and viewing direction, which capture thermal variations. Furthermore, we introduce the Thermal Feature Extractor (TFE) and MonoSSIM loss, which consider appearance, edge, and frequency to maintain robustness. Extensive experiments on the TI-NSD benchmark show that our method achieves better performance over existing methods.