99.4 · RO · Mar 15
World In Your Hands: A Large-Scale and Open-Source Ecosystem for Learning Human-Centric Manipulation in the Wild
Yupeng Zheng, Jichao Peng, Weize Li et al. · CMU, Tsinghua
We introduce World In Your Hands (WIYH), a large-scale open-source ecosystem comprising over 1,000 hours of human manipulation data collected in the wild with millimeter-scale motion accuracy. Specifically, WIYH includes (1) the Oracle Suite, a wearable data collection kit with an auto-labeling pipeline for accurate motion capture; (2) the WIYH Dataset, featuring over 1,000 hours of multimodal manipulation data across hundreds of skills in diverse real-world scenarios; and (3) extensive annotations and benchmarks supporting tasks from perception to action. Furthermore, experiments based on the WIYH ecosystem show that integrating WIYH's human-centric data improves robotic manipulation success rates from 8% to 60% in cluttered scenes. World In Your Hands provides a foundation for advancing human-centric data collection and cross-embodiment policy learning. All data and hardware designs will be open-sourced.
RO · Oct 29, 2019
Autonomous UAV Landing System Based on Visual Navigation
Zhixin Wu, Peng Han, Ruiwen Yao et al.
In this paper, we present an autonomous unmanned aerial vehicle (UAV) landing system based on visual navigation. We design the landmark as a topological pattern so that the UAV can easily distinguish it from the environment. In addition, a dynamic thresholding method is developed for image binarization to improve detection efficiency. The relative distance in the horizontal plane is calculated according to effective image information, and the relative height is obtained using a linear interpolation method. The landing experiments are performed on both a static and a moving platform. The experimental results illustrate that our proposed landing system performs robustly and accurately.
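The abstract does not say which quantities are interpolated to obtain the relative height. As a minimal sketch, assume a calibration table that maps the landmark's apparent width in pixels to a known height, and interpolate linearly between the two nearest entries. The table values and the name `estimate_height` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from bisect import bisect_left

# Hypothetical calibration table (an assumption, not from the paper):
# (apparent landmark width in pixels, height in metres), sorted by width.
CALIBRATION = [(40.0, 5.0), (80.0, 2.5), (160.0, 1.25), (320.0, 0.625)]


def estimate_height(pixel_width, table=CALIBRATION):
    """Linearly interpolate height from the landmark's apparent width."""
    widths = [w for w, _ in table]
    # Clamp to the table's range instead of extrapolating.
    if pixel_width <= widths[0]:
        return table[0][1]
    if pixel_width >= widths[-1]:
        return table[-1][1]
    # Find the two calibration entries that bracket the measurement.
    i = bisect_left(widths, pixel_width)
    (w0, h0), (w1, h1) = table[i - 1], table[i]
    t = (pixel_width - w0) / (w1 - w0)
    return h0 + t * (h1 - h0)
```

With this table, a measured width halfway between two calibration entries yields a height halfway between their heights. Clamping at the ends avoids extrapolating beyond the calibrated range.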
CV · Oct 29, 2019
PT-ResNet: Perspective Transformation-Based Residual Network for Semantic Road Image Segmentation
Rui Fan, Yuan Wang, Lei Qiao et al.
Semantic road region segmentation is a high-level task, which paves the way towards road scene understanding. This paper presents a residual network trained for semantic road segmentation. Firstly, we represent the projections of road disparities in the v-disparity map as a linear model, which can be estimated by optimizing the v-disparity map using dynamic programming. This linear model is then utilized to reduce the redundant information in the left and right road images. The right image is also transformed into the left perspective view, which greatly enhances the road surface similarity between the two images. Finally, the processed stereo images and their disparity maps are concatenated to create a set of 3D images, which are then utilized to train our neural network. The experimental results illustrate that our network achieves a maximum F1-measure of approximately 91.19% when analyzing the images from the KITTI road dataset.
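The linear road model in the v-disparity map can be illustrated with a small sketch. It builds a v-disparity map (a per-row histogram of disparities) and fits d = a0 + a1 * v through the most frequent disparity of each row by ordinary least squares. This is a simplified stand-in: the abstract describes estimating the model with dynamic programming, which is not reproduced here. The function names and the integer-disparity input format are assumptions made for illustration.

```python
def v_disparity(disparity, max_d):
    """Row-wise histogram: hist[v][d] counts pixels in row v with disparity d."""
    hist = []
    for row in disparity:
        counts = [0] * (max_d + 1)
        for d in row:
            if 0 <= d <= max_d:
                counts[d] += 1
        hist.append(counts)
    return hist


def fit_road_line(hist):
    """Least-squares fit of d = a0 + a1 * v through each row's modal disparity."""
    # One (v, d) point per non-empty row: the disparity with the highest count.
    points = [
        (v, max(range(len(c)), key=c.__getitem__))
        for v, c in enumerate(hist)
        if any(c)
    ]
    n = len(points)
    sv = sum(v for v, _ in points)
    sd = sum(d for _, d in points)
    svv = sum(v * v for v, _ in points)
    svd = sum(v * d for v, d in points)
    denom = n * svv - sv * sv
    if denom == 0:
        raise ValueError("need at least two distinct rows to fit a line")
    a1 = (n * svd - sv * sd) / denom
    a0 = (sd - a1 * sv) / n
    return a0, a1


# Toy disparity image: row v is mostly disparity 2 + v, plus one outlier pixel.
toy = [[2 + v] * 4 + [0] for v in range(4)]
a0, a1 = fit_road_line(v_disparity(toy, max_d=8))
```

Taking the per-row mode before fitting keeps isolated outlier pixels from pulling the line, which is enough for this toy input. Real disparity maps are noisier, which is presumably why the paper uses a more robust optimization.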