Changhyun Choi

h-index: 1

5 papers

1 citation

Novelty: 50%

AI Score: 39

Ranked #78,838 of 194,257 authors (top 41%); #2,361 in RO (top 35%)

5 Papers

4.7 · CV · Jul 8

Self-Supervised Pretraining Improves Cross-Site and Cross-Scale Robustness of Point Cloud Leaf-Wood Segmentation

Heeju Mun, Tackang Yang, Yunsoo Nam et al.

The accuracy of existing leaf-wood segmentation methods for tree point clouds varies across forest types and sites. Self-supervised learning (SSL) on point clouds has improved the generalization of deep learning models for forestry point cloud tasks, including biomass regression and individual tree segmentation, but its applicability to leaf-wood segmentation remains untested. In this study, we pretrained Point-M2AE, a widely used SSL architecture for point clouds, on ShapeNet-55 augmented with 2,400 individual tree point clouds. For fine-tuning and inference, we used recursive voxel subdivision to handle the wide variation in point density across inputs, allowing the same model to operate at both individual-tree and plot scales without architectural changes. Compared to the model without pretraining, the pretrained model improved wood IoU from 60.5% to 70.0% for needleleaf and from 69.7% to 76.3% for broadleaf trees. On a benchmark spanning four countries across three climatic zones, the pretrained model achieved the smallest cross-site variation and highest overall performance among compared methods (LeWos, CWLS, and PointTransformer). Plot-level segmentation maintained accuracy comparable to individual-tree performance, with mIoU of 84.7% for broadleaf and 77.7% for needleleaf plots, showing that the model generalizes across scales without additional fine-tuning. As a downstream test in tropical forests, where dense canopies make segmentation challenging, we applied our model and a quantitative structure model to estimate wood volume for 28 trees from Guyana, Indonesia, and Peru to assess whether the segmentation improvements from SSL pretraining translate into improved downstream performance. The resulting volume estimates achieved the lowest error among all methods tested (MAE = 2.40 m$^3$), less than half that of algorithmic baselines (LeWos: 5.94 m$^3$; CWLS: 5.27 m$^3$).
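
The abstract does not spell out how recursive voxel subdivision works, so the following is only a minimal sketch of one plausible reading: a cubic cell is split into octants until every cell holds at most a fixed number of points, so dense and sparse regions end up as similarly sized inputs. The function name `subdivide` and the parameters `max_points` and `min_size` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def subdivide(points: List[Point], max_points: int, origin: Point,
              size: float, min_size: float = 1e-3) -> List[List[Point]]:
    """Split a cubic cell into octants until each cell holds <= max_points.

    Illustrative assumption only: the paper's actual stopping rule and
    cell shape are not given in the abstract.
    """
    if not points:
        return []
    # Stop when the cell is small enough in point count or in edge length.
    if len(points) <= max_points or size <= min_size:
        return [points]

    half = size / 2.0
    octants: Dict[Tuple[int, int, int], List[Point]] = {}
    for p in points:
        # One bit per axis: 1 if the point lies in the upper half of the cell.
        key = tuple(int(p[i] >= origin[i] + half) for i in range(3))
        octants.setdefault(key, []).append(p)

    cells: List[List[Point]] = []
    for key, subset in octants.items():
        child_origin = tuple(origin[i] + key[i] * half for i in range(3))
        cells.extend(subdivide(subset, max_points, child_origin, half, min_size))
    return cells
```

Under this reading, a plot-scale cloud and a single-tree cloud both reduce to a list of cells with a bounded point count, which is what would let one model consume either without changes.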

5.6 · RO · Jun 25

LAMP: Lane-Aligned Motion Primitives for Feasible Trajectory Prediction

Sangjin Han, Hoseong Jung, Jeongtae Her et al.

Motion forecasting is essential for autonomous driving systems to enable safe decision-making and planning in complex driving scenarios. While existing predictors excel at minimizing standard displacement errors, they often overlook the adherence to lane topology of multimodal predictions, particularly for lower-probability modes. Consequently, predicted trajectories may violate physical and logical constraints, making the prediction set unreliable for safety-critical planning. In this paper, we propose LAMP (Lane-Aligned Motion Primitives), a topology-aware forecasting framework that anchors multimodal prediction to structured motion primitives aligned with lane topology. Specifically, we use a VQ-VAE to learn shape-aware motion primitives as discrete intention queries, capturing spatiotemporal patterns beyond endpoint-based intentions. We further introduce a feasibility-aware intention selector trained with a lane-topology prior for filtering unreachable intention queries, guiding the decoder to prioritize topology-consistent intentions while preserving behavioral diversity. Extensive experiments on the Argoverse 2 dataset demonstrate that LAMP achieves prediction accuracy comparable to state-of-the-art baselines while outperforming them in feasibility and diversity metrics.
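
The discretization step of a VQ-VAE can be illustrated with a nearest-neighbor codebook lookup. This is a generic sketch of vector quantization, not LAMP's implementation: the function `quantize` and the toy two-dimensional codebook are hypothetical, and a real model would learn the codebook jointly with an encoder and decoder.

```python
from typing import List, Sequence


def quantize(vector: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Return the index of the codebook entry nearest to `vector`.

    Distance is squared Euclidean, the usual choice in VQ-VAE lookups.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


# Hypothetical toy codebook: each entry stands in for one motion primitive.
toy_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
primitive_index = quantize([0.9, 0.1], toy_codebook)
```

The returned index plays the role of a discrete intention: downstream components can condition on it rather than on a continuous endpoint.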

12.3 · RO · Jan 4, 2025

Attribute-Based Robotic Grasping with Data-Efficient Adaptation

Yang Yang, Houjian Yu, Xibai Lou et al.

Robotic grasping is one of the most fundamental robotic manipulation tasks and has been the subject of extensive research. However, swiftly teaching a robot to grasp a novel target object in clutter remains challenging. This paper attempts to address the challenge by leveraging object attributes that facilitate recognition, grasping, and rapid adaptation to new domains. In this work, we present an end-to-end encoder-decoder network to learn attribute-based robotic grasping with data-efficient adaptation capability. We first pre-train the end-to-end model with a variety of basic objects to learn generic attribute representation for recognition and grasping. Our approach fuses the embeddings of a workspace image and a query text using a gated-attention mechanism and learns to predict instance grasping affordances. To train the joint embedding space of visual and textual attributes, the robot utilizes object persistence before and after grasping. Our model is self-supervised in a simulation that only uses basic objects of various colors and shapes but generalizes to novel objects in new environments. To further facilitate generalization, we propose two adaptation methods, adversarial adaptation and one-grasp adaptation. Adversarial adaptation regulates the image encoder using augmented data of unlabeled images, whereas one-grasp adaptation updates the overall end-to-end model using augmented data from one grasp trial. Both adaptation methods are data-efficient and considerably improve instance grasping performance. Experimental results in both simulation and the real world demonstrate that our approach achieves over 81% instance grasping success rate on unknown objects, which outperforms several baselines by large margins.
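
One common form of gated attention maps the text embedding to a sigmoid gate per image channel and scales each channel by its gate. The sketch below assumes that form; the abstract does not say which variant the paper uses, and the function `gated_attention` and its `weights` matrix are hypothetical stand-ins for a learned linear layer.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def gated_attention(feature_map: FeatureMap, text_embedding: List[float],
                    weights: List[List[float]]) -> FeatureMap:
    """Scale each image channel by a sigmoid gate computed from the text.

    `weights` has one row per channel; each row projects the text
    embedding to a scalar before the sigmoid. Assumed form, not the
    paper's confirmed architecture.
    """
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * t for w, t in zip(row, text_embedding))))
        for row in weights
    ]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
```

With this form, a query such as a color or shape word can suppress channels that respond to other attributes, leaving the grasp-affordance decoder to focus on matching regions.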

2.2 · RO · Feb 21

Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation

Hoseong Jung, Sungil Son, Daesol Cho et al.

Autonomous robotic systems should reason about resource control and its impact on subsequent maneuvers, especially when operating with limited energy budgets or restricted sensing. Learning-based control is effective in handling complex dynamics and represents the problem as a hybrid action space unifying discrete resource usage and continuous maneuvers. However, prior works on hybrid action space have not sufficiently captured the causal dependencies between resource usage and maneuvers. They have also overlooked the multi-modal nature of tactical decisions, both of which are critical in fast-evolving scenarios. In this paper, we propose TART, a Temporal Action Representation learning framework for Tactical resource control and subsequent maneuver generation. TART leverages contrastive learning based on a mutual information objective, designed to capture inherent temporal dependencies in resource-maneuver interactions. These learned representations are quantized into discrete codebook entries that condition the policy, capturing recurring tactical patterns and enabling multi-modal and temporally coherent behaviors. We evaluate TART in two domains where resource deployment is critical: (i) a maze navigation task where a limited budget of discrete actions provides enhanced mobility, and (ii) a high-fidelity air combat simulator in which an F-16 agent operates weapons and defensive systems in coordination with flight maneuvers. Across both domains, TART consistently outperforms hybrid-action baselines, demonstrating its effectiveness in leveraging limited resources and producing context-aware subsequent maneuvers.
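
The abstract names a contrastive objective based on mutual information without giving its form. InfoNCE is one widely used loss of that kind, so the sketch below uses it purely as an assumed example; the function `info_nce` and the `temperature` default are hypothetical and may differ from what TART optimizes.

```python
import math
from typing import List, Sequence


def info_nce(query: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss for one query: -log softmax score of the positive key.

    Similarity is a plain dot product. Assumed objective, chosen only
    because it is a standard mutual-information-based contrastive loss.
    """
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]
```

In a temporal setting, the query could be an embedding of a resource action and the positive an embedding of the maneuver that followed it, with maneuvers from other time steps as negatives.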

3.6 · CV · Jun 14, 2025

Performance Plateaus in Inference-Time Scaling for Text-to-Image Diffusion Without External Models

Changhyun Choi, Sungha Kim, H. Jin Kim

Recently, it has been shown that investing computing resources in searching for good initial noise for a text-to-image diffusion model helps improve performance. However, previous studies required external models to evaluate the resulting images, which is impossible on GPUs with small VRAM. For these reasons, we apply Best-of-N inference-time scaling to algorithms that optimize the initial noise of a diffusion model without external models across multiple datasets and backbones. We demonstrate that inference-time scaling for text-to-image diffusion models in this setting quickly reaches a performance plateau, and a relatively small number of optimization steps suffices to achieve the maximum achievable performance with each algorithm.
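
Best-of-N selection over initial noise can be sketched in a few lines: draw N candidate noise vectors, score each, and keep the best. Everything here is illustrative. The function `best_of_n` is hypothetical, and `toy_score` is a placeholder for whatever model-internal signal an algorithm would use in place of an external evaluator; it is not a metric from the paper.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(score: Callable[[List[float]], float], n: int, dim: int,
              seed: int = 0) -> Tuple[List[float], float]:
    """Sample n Gaussian noise vectors and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best_noise: List[float] = []
    best_score = float("-inf")
    for _ in range(n):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        candidate_score = score(noise)
        if candidate_score > best_score:
            best_noise, best_score = noise, candidate_score
    return best_noise, best_score


def toy_score(noise: List[float]) -> float:
    """Placeholder score (higher is better): prefers small-norm noise."""
    return -sum(v * v for v in noise)
```

Because the best score can only stay the same or improve as N grows, plotting it against N is a direct way to see where additional compute stops paying off.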