CV · Jul 19, 2023
ClickSeg: 3D Instance Segmentation with Click-Level Weak Annotations
Leyao Liu, Tao Kong, Minzhao Zhu et al. · bytedance
3D instance segmentation methods often require fully annotated dense labels for training, which are costly to obtain. In this paper, we present ClickSeg, a novel click-level weakly supervised 3D instance segmentation method that requires only one annotated point per instance. This problem is very challenging because the labels are extremely limited, and it has rarely been addressed before. We first develop a baseline weakly supervised training method in which the model itself generates pseudo labels for unlabeled data. To exploit the properties of the click-level annotation setting, we further propose a new training framework. Instead of generating the pseudo labels with the procedure used at inference, i.e., mean-shift clustering, we propose to use k-means with fixed initial seeds: the annotated points. New similarity metrics are further designed for clustering. Experiments on the ScanNetV2 and S3DIS datasets show that ClickSeg surpasses the previous best weakly supervised instance segmentation result by a large margin (e.g., +9.4% mAP on ScanNetV2). Using only 0.02% of the supervision signals, ClickSeg achieves about 90% of the accuracy of its fully supervised counterpart. It also achieves state-of-the-art semantic segmentation results among weakly supervised methods that use the same annotation settings.
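The idea of k-means with fixed initial seeds can be illustrated with a minimal sketch. It is not the ClickSeg implementation: the function name `seeded_kmeans`, the per-point feature tuples, and the plain Euclidean distance are all assumptions made for illustration, whereas the abstract states that the method uses newly designed similarity metrics.

```python
from math import dist

def seeded_kmeans(points, seed_indices, iters=10):
    """Cluster points with k-means whose initial centroids are the clicked points.

    points: list of equal-length feature tuples (hypothetical per-point embeddings).
    seed_indices: index of the single annotated point for each instance.
    Returns one cluster id (pseudo instance label) per point.
    """
    centroids = [tuple(points[i]) for i in seed_indices]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid under Euclidean distance (an assumption).
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Keep every annotated point attached to its own instance.
        for k, i in enumerate(seed_indices):
            labels[i] = k
        # Update step: move each centroid to the mean of its members.
        # A cluster is never empty because it always contains its seed.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels
```

Because the number of clusters and their starting positions come from the clicks, the sketch needs no bandwidth parameter, unlike mean-shift, and each pseudo label is tied to a known annotated instance.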
RO · Apr 3
ARM: Advantage Reward Modeling for Long-Horizon Manipulation
Yiming Mao, Zixi Yu, Weixin Mao et al.
Long-horizon robotic manipulation remains challenging for reinforcement learning (RL) because sparse rewards provide limited guidance for credit assignment. Practical policy improvement thus relies on richer intermediate supervision, such as dense progress rewards, which are costly to obtain and ill-suited to non-monotonic behaviors such as backtracking and recovery. To address this, we propose Advantage Reward Modeling (ARM), a framework that shifts from hard-to-quantify absolute progress to estimating relative advantage. We introduce a cost-effective tri-state labeling strategy -- Progressive, Regressive, and Stagnant -- that reduces human cognitive overhead while ensuring high cross-annotator consistency. By training on these intuitive signals, ARM enables automated progress annotation for both complete demonstrations and fragmented DAgger-style data. Integrating ARM into an offline RL pipeline allows for adaptive action-reward reweighting, effectively filtering suboptimal samples. Our approach achieves a 99.4% success rate on a challenging long-horizon towel-folding task, demonstrating improved stability and data efficiency over current VLA baselines with near-zero human intervention during policy training.
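As a rough illustration of why relative labels suit non-monotonic behavior, the sketch below turns a sequence of tri-state labels into a running progress signal and into per-segment sample weights. The numeric encoding (+1, 0, -1), the function names, and the clipping rule are assumptions made for illustration; the abstract does not specify how ARM computes rewards or weights.

```python
from itertools import accumulate

# Hypothetical numeric encoding of the three labels named in the abstract.
LABEL_TO_ADVANTAGE = {"Progressive": 1, "Stagnant": 0, "Regressive": -1}

def relative_progress(labels):
    """Running sum of per-segment advantages; it can fall as well as rise."""
    return list(accumulate(LABEL_TO_ADVANTAGE[label] for label in labels))

def sample_weights(labels, floor=0.0):
    """Weight each segment by its advantage, clipped at `floor` (an assumed rule)."""
    return [max(floor, float(LABEL_TO_ADVANTAGE[label])) for label in labels]
```

In this toy encoding a backtracking segment lowers the running progress instead of breaking a monotonic progress assumption, and clipping at zero removes regressive and stagnant segments from a weighted training objective.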
RO · Feb 8, 2022
Navigating to Objects in Unseen Environments by Distance Prediction
Minzhao Zhu, Binglei Zhao, Tao Kong
The Object Goal Navigation (ObjectNav) task is to navigate an agent to an object of a target category in unseen environments without a pre-built map. In this paper, we solve this task by predicting the distance to the target using semantically related objects as cues. Based on the estimated distance to the target object, our method directly chooses optimal mid-term goals that are more likely to have a shorter path to the target. Specifically, using learned knowledge, our model takes a bird's-eye-view semantic map as input and estimates the path length from the frontier map cells to the target object. With the estimated distance map, the agent can simultaneously explore the environment and navigate to the target object using a simple human-designed strategy. Empirical results in visually realistic simulation environments show that the proposed method outperforms a wide range of baselines on success rate and efficiency. A real-robot experiment also demonstrates that our method generalizes well to the real world. Video at https://www.youtube.com/watch?v=R79pWVGFKS4
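The goal-selection step can be sketched as picking the frontier cell with the smallest predicted path length. This is one simple reading of the "human-designed strategy", not the authors' code: the function name `choose_midterm_goal`, the nested-list distance map, and the `(row, col)` cell format are assumptions made for illustration.

```python
def choose_midterm_goal(distance_map, frontier_cells):
    """Return the frontier cell (row, col) with the smallest predicted distance.

    distance_map: 2D list of predicted path lengths to the target (hypothetical
        model output over the bird's-eye-view map).
    frontier_cells: list of (row, col) cells on the explored/unexplored boundary.
    Returns None when there is no frontier left to explore.
    """
    if not frontier_cells:
        return None
    return min(frontier_cells, key=lambda rc: distance_map[rc[0]][rc[1]])
```

For example, with `distance_map = [[9, 4, 7], [3, 8, 2], [6, 5, 1]]` and frontier cells `[(0, 1), (1, 0), (2, 1)]`, the function returns `(1, 0)`, the frontier cell whose predicted distance, 3, is the lowest.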