Kevin Tirta Wijaya

CV
h-index11
8papers
305citations
Novelty49%
AI Score42

8 Papers

CVJun 16, 2022Code
K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions

Dong-Hee Paek, Seung-Hyun Kong, Kevin Tirta Wijaya

Unlike RGB cameras that use visible light bands (384$\sim$769 THz) and Lidars that use infrared bands (361$\sim$331 THz), Radars use relatively longer wavelength radio bands (77$\sim$81 GHz), resulting in robust measurements in adverse weathers. Unfortunately, existing Radar datasets only contain a relatively small number of samples compared to the existing camera and Lidar datasets. This may hinder the development of sophisticated data-driven deep learning techniques for Radar-based perception. Moreover, most of the existing Radar datasets only provide 3D Radar tensor (3DRT) data that contain power measurements along the Doppler, range, and azimuth dimensions. As there is no elevation information, it is challenging to estimate the 3D bounding box of an object from 3DRT. In this work, we introduce KAIST-Radar (K-Radar), a novel large-scale object detection dataset and benchmark that contains 35K frames of 4D Radar tensor (4DRT) data with power measurements along the Doppler, range, azimuth, and elevation dimensions, together with carefully annotated 3D bounding box labels of objects on the roads. K-Radar includes challenging driving conditions such as adverse weathers (fog, rain, and snow) on various road structures (urban, suburban roads, alleyways, and highways). In addition to the 4DRT, we provide auxiliary measurements from carefully calibrated high-resolution Lidars, surround stereo cameras, and RTK-GPS. We also provide 4DRT-based object detection baseline neural networks (baseline NNs) and show that the height information is crucial for 3D object detection. And by comparing the baseline NN with a similarly-structured Lidar-based neural network, we demonstrate that 4D Radar is a more robust sensor for adverse weather conditions. All codes are available at https://github.com/kaist-avelab/k-radar.

CVMay 20, 2022Code
Advanced Feature Learning on Point Clouds using Multi-resolution Features and Learnable Pooling

Kevin Tirta Wijaya, Dong-Hee Paek, Seung-Hyun Kong

Existing point cloud feature learning networks often incorporate sequences of sampling, neighborhood grouping, neighborhood-wise feature learning, and feature aggregation to learn high-semantic point features that represent the global context of a point cloud. Unfortunately, the compounded loss of information concerning granularity and non-maximum point features due to sampling and max pooling could adversely affect the high-semantic point features from existing networks such that they are insufficient to represent the local context of a point cloud, which in turn may hinder the network in distinguishing fine shapes. To cope with this problem, we propose a novel point cloud feature learning network, PointStack, using multi-resolution feature learning and learnable pooling (LP). The multi-resolution feature learning is realized by aggregating point features of various resolutions in the multiple layers, so that the final point features contain both high-semantic and high-resolution information. On the other hand, the LP is used as a generalized pooling function that calculates the weighted sum of multi-resolution point features through the attention mechanism with learnable queries, in order to extract all possible information from all available point features. Consequently, PointStack is capable of extracting high-semantic point features with minimal loss of information concerning granularity and non-maximum point features. Therefore, the final aggregated point features can effectively represent both global and local contexts of a point cloud. In addition, both the global structure and the local shape details of a point cloud can be well comprehended by the network head, which enables PointStack to advance the state-of-the-art of feature learning on point clouds. The codes are available at https://github.com/kaist-avelab/PointStack.

CVMar 11, 2023Code
Enhanced K-Radar: Optimal Density Reduction to Improve Detection Performance and Accessibility of 4D Radar Tensor-based Object Detection

Dong-Hee Paek, Seung-Hyun Kong, Kevin Tirta Wijaya

Recent works have shown the superior robustness of four-dimensional (4D) Radar-based three-dimensional (3D) object detection in adverse weather conditions. However, processing 4D Radar data remains a challenge due to the large data size, which require substantial amount of memory for computing and storage. In previous work, an online density reduction is performed on the 4D Radar Tensor (4DRT) to reduce the data size, in which the density reduction level is chosen arbitrarily. However, the impact of density reduction on the detection performance and memory consumption remains largely unknown. In this paper, we aim to address this issue by conducting extensive hyperparamter tuning on the density reduction level. Experimental results show that increasing the density level from 0.01% to 50% of the original 4DRT density level proportionally improves the detection performance, at a cost of memory consumption. However, when the density level is increased beyond 5%, only the memory consumption increases, while the detection performance oscillates below the peak point. In addition to the optimized density hyperparameter, we also introduce 4D Sparse Radar Tensor (4DSRT), a new representation for 4D Radar data with offline density reduction, leading to a significantly reduced raw data size. An optimized development kit for training the neural networks is also provided, which along with the utilization of 4DSRT, improves training speed by a factor of 17.1 compared to the state-of-the-art 4DRT-based neural networks. All codes are available at: https://github.com/kaist-avelab/K-Radar.

CVOct 17, 2022
Row-wise LiDAR Lane Detection Network with Lane Correlation Refinement

Dong-Hee Paek, Kevin Tirta Wijaya, Seung-Hyun Kong

Lane detection is one of the most important functions for autonomous driving. In recent years, deep learning-based lane detection networks with RGB camera images have shown promising performance. However, camera-based methods are inherently vulnerable to adverse lighting conditions such as poor or dazzling lighting. Unlike camera, LiDAR sensor is robust to the lighting conditions. In this work, we propose a novel two-stage LiDAR lane detection network with row-wise detection approach. The first-stage network produces lane proposals through a global feature correlator backbone and a row-wise detection head. Meanwhile, the second-stage network refines the feature map of the first-stage network via attention-based mechanism between the local features around the lane proposals, and outputs a set of new lane proposals. Experimental results on the K-Lane dataset show that the proposed network advances the state-of-the-art in terms of F1-score with 30% less GFLOPs. In addition, the second-stage network is found to be especially robust to lane occlusions, thus, demonstrating the robustness of the proposed network for driving in crowded environments.

CVOct 21, 2021Code
K-Lane: Lidar Lane Dataset and Benchmark for Urban Roads and Highways

Donghee Paek, Seung-Hyun Kong, Kevin Tirta Wijaya

Lane detection is a critical function for autonomous driving. With the recent development of deep learning and the publication of camera lane datasets and benchmarks, camera lane detection networks (CLDNs) have been remarkably developed. Unfortunately, CLDNs rely on camera images which are often distorted near the vanishing line and prone to poor lighting condition. This is in contrast with Lidar lane detection networks (LLDNs), which can directly extract the lane lines on the bird's eye view (BEV) for motion planning and operate robustly under various lighting conditions. However, LLDNs have not been actively studied, mostly due to the absence of large public lidar lane datasets. In this paper, we introduce KAIST-Lane (K-Lane), the world's first and the largest public urban road and highway lane dataset for Lidar. K-Lane has more than 15K frames and contains annotations of up to six lanes under various road and traffic conditions, e.g., occluded roads of multiple occlusion levels, roads at day and night times, merging (converging and diverging) and curved lanes. We also provide baseline networks we term Lidar lane detection networks utilizing global feature correlator (LLDN-GFC). LLDN-GFC exploits the spatial characteristics of lane lines on the point cloud, which are sparse, thin, and stretched along the entire ground plane of the point cloud. From experimental results, LLDN-GFC achieves the state-of-the-art performance with an F1- score of 82.1%, on the K-Lane. Moreover, LLDN-GFC shows strong performance under various lighting conditions, which is unlike CLDNs, and also robust even in the case of severe occlusions, unlike LLDNs using the conventional CNN. The K-Lane, LLDN-GFC training code, pre-trained models, and complete development kits including evaluation, visualization and annotation tools are available at https://github.com/kaist-avelab/k-lane.

LGAug 22, 2025
Post Hoc Regression Refinement via Pairwise Rankings

Kevin Tirta Wijaya, Michael Sun, Minghao Guo et al.

Accurate prediction of continuous properties is essential to many scientific and engineering tasks. Although deep-learning regressors excel with abundant labels, their accuracy deteriorates in data-scarce regimes. We introduce RankRefine, a model-agnostic, plug-and-play post hoc method that refines regression with expert knowledge coming from pairwise rankings. Given a query item and a small reference set with known properties, RankRefine combines the base regressor's output with a rank-based estimate via inverse variance weighting, requiring no retraining. In molecular property prediction task, RankRefine achieves up to 10% relative reduction in mean absolute error using only 20 pairwise comparisons obtained through a general-purpose large language model (LLM) with no finetuning. As rankings provided by human experts or general-purpose LLMs are sufficient for improving regression across diverse domains, RankRefine offers practicality and broad applicability, especially in low-data settings.

LGNov 5, 2024
Two-Stage Pretraining for Molecular Property Prediction in the Wild

Kevin Tirta Wijaya, Minghao Guo, Michael Sun et al.

Molecular deep learning models have achieved remarkable success in property prediction, but they often require large amounts of labeled data. The challenge is that, in real-world applications, labels are extremely scarce, as obtaining them through laboratory experimentation is both expensive and time-consuming. In this work, we introduce MoleVers, a versatile pretrained molecular model designed for various types of molecular property prediction in the wild, i.e., where experimentally-validated labels are scarce. MoleVers employs a two-stage pretraining strategy. In the first stage, it learns molecular representations from unlabeled data through masked atom prediction and extreme denoising, a novel task enabled by our newly introduced branching encoder architecture and dynamic noise scale sampling. In the second stage, the model refines these representations through predictions of auxiliary properties derived from computational methods, such as the density functional theory or large language models. Evaluation on 22 small, experimentally-validated datasets demonstrates that MoleVers achieves state-of-the-art performance, highlighting the effectiveness of its two-stage framework in producing generalizable molecular representations for diverse downstream properties.

CHEM-PHFeb 26, 2024
TrustMol: Trustworthy Inverse Molecular Design via Alignment with Molecular Dynamics

Kevin Tirta Wijaya, Navid Ansari, Hans-Peter Seidel et al.

Data-driven generation of molecules with desired properties, also known as inverse molecular design (IMD), has attracted significant attention in recent years. Despite the significant progress in the accuracy and diversity of solutions, existing IMD methods lag behind in terms of trustworthiness. The root issue is that the design process of these methods is increasingly more implicit and indirect, and this process is also isolated from the native forward process (NFP), the ground-truth function that models the molecular dynamics. Following this insight, we propose TrustMol, an IMD method built to be trustworthy. For this purpose, TrustMol relies on a set of technical novelties including a new variational autoencoder network. Moreover, we propose a latent-property pairs acquisition method to effectively navigate the complexities of molecular latent optimization, a process that seems intuitive yet challenging due to the high-frequency and discontinuous nature of molecule space. TrustMol also integrates uncertainty-awareness into molecular latent optimization. These lead to improvements in both explainability and reliability of the IMD process. We validate the trustworthiness of TrustMol through a wide range of experiments.