CV | Dec 7, 2025
Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
Yu Zhu, Naoya Chiba, Koichi Hashimoto
Reliable 3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects, as commonly seen in industrial environments. In such scenarios, heavy occlusion weakens geometric boundaries between objects, and large differences in object scale cause end-to-end models to fail to capture both coarse and fine details accurately. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. To address these challenges, we propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance level to part level. Instance segmentation involves rendering a top-view image and projecting SAM-generated masks, prompted by YOLO-World, back onto the 3D point cloud. Part-level segmentation is then performed by rendering multi-view images of each instance obtained from the previous stage and applying the same 2D segmentation and back-projection process at each view, followed by Bayesian updating fusion to ensure semantic consistency across views. Experiments on real-world factory data demonstrate that our method effectively handles occlusion and structural complexity, achieving consistently high per-class mIoU scores. Additional evaluations on a public dataset confirm the generalization ability of our framework, highlighting its robustness, annotation efficiency, and adaptability to diverse 3D environments.
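The abstract does not spell out how the Bayesian updating fusion is formulated, so the following is only a minimal sketch of the general idea: each point keeps a class posterior that is updated once per view in which the point is visible. The function name `bayesian_fuse`, the uniform prior, and the symmetric observation model controlled by `p_correct` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def bayesian_fuse(view_labels, num_classes, p_correct=0.8):
    """Fuse per-view point labels into a per-point class posterior.

    view_labels: (V, N) integer array; entry [v, n] is the class that view v
                 back-projected onto point n, or -1 if the point is not
                 visible in that view.
    p_correct:   ASSUMED probability that a view reports the true class;
                 the remaining mass is spread evenly over the other classes.
    Requires num_classes >= 2.
    """
    view_labels = np.asarray(view_labels)
    num_points = view_labels.shape[1]

    # Work in log space, starting from a uniform prior (an assumption).
    log_post = np.full((num_points, num_classes), -np.log(num_classes))
    log_hit = np.log(p_correct)
    log_miss = np.log((1.0 - p_correct) / (num_classes - 1))

    for labels in view_labels:
        visible = labels >= 0
        log_lik = np.full((num_points, num_classes), log_miss)
        log_lik[np.arange(num_points)[visible], labels[visible]] = log_hit
        log_lik[~visible] = 0.0  # unseen points carry no evidence from this view
        log_post += log_lik      # Bayes update: posterior is prior times likelihood

    # Normalize each row back to a probability distribution.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return post

# Three views, three points; point 2 is visible only in the last view.
labels = [[0, 1, -1],
          [0, 0, -1],
          [0, 1, 2]]
posterior = bayesian_fuse(labels, num_classes=3)
fused = posterior.argmax(axis=1)
```

In this toy input, the single view that disagrees on point 1 is outvoted by the other two, which illustrates how accumulating evidence across views can smooth out per-view inconsistencies.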
GR | Aug 30, 2019
Animated Stickies: Fast Video Projection Mapping onto a Markerless Plane through a Direct Closed-Loop Alignment
Shingo Kagami, Koichi Hashimoto
This paper presents a fast projection mapping method for moving image content projected onto a markerless planar surface using a low-latency Digital Micromirror Device (DMD) projector. By adopting a closed-loop alignment approach, in which the camera tracks not only the surface texture but also the projected image, the proposed method requires no calibration or position adjustment between the camera and projector. We designed fiducial patterns to be inserted into a fast flapping sequence of binary frames of the DMD projector, which allows the surface texture and the fiducial geometry to be tracked simultaneously and separately from a single image captured by the camera. The proposed method, implemented on a CPU, runs at 400 fps and enables arbitrary video content to be "stuck" onto a variety of textured surfaces.
CV | Apr 24, 2018
Spatiotemporal Learning of Dynamic Gestures from 3D Point Cloud Data
Joshua Owoyemi, Koichi Hashimoto
In this paper, we demonstrate an end-to-end spatiotemporal gesture learning approach for 3D point cloud data using a new gesture dataset of point clouds acquired from a 3D sensor. Nine classes of gestures were learned from gesture sample data. We mapped point cloud data into dense occupancy grids; time steps of the occupancy grids were then used as inputs to a 3D convolutional neural network, which learns the spatiotemporal features in the data without explicit modeling of gesture dynamics. We also introduced a 3D region-of-interest jittering approach for point cloud data augmentation. This resulted in an increase in classification accuracy of up to 10% when the augmented data was added to the original training data. The developed model is able to classify gestures from the dataset with 84.44% accuracy. We propose that point cloud data will become a more viable data type for scene understanding and motion recognition as 3D sensors become ubiquitous in the years to come.
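The mapping from point clouds to dense occupancy grids described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function names `voxelize` and `frames_to_clip`, the grid resolution, the binary occupancy values, and the handling of the bounding box are all hypothetical choices.

```python
import numpy as np

def voxelize(points, grid_size=32, bounds_min=None, bounds_max=None):
    """Map an (N, 3) point cloud to a dense binary occupancy grid.

    If no bounds are given, the cloud's own axis-aligned bounding box is
    used (an assumption; a fixed region of interest would also work).
    """
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0) if bounds_min is None else np.asarray(bounds_min, dtype=np.float64)
    hi = points.max(axis=0) if bounds_max is None else np.asarray(bounds_max, dtype=np.float64)
    extent = np.maximum(hi - lo, 1e-9)  # avoid division by zero on flat clouds

    # Discard points outside the region of interest.
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    idx = np.floor((points[inside] - lo) / extent * grid_size).astype(int)
    idx = np.minimum(idx, grid_size - 1)  # points on the upper face go to the last cell

    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def frames_to_clip(frames, grid_size, bounds_min, bounds_max):
    """Stack per-frame grids into a (T, G, G, G) array for a 3D CNN."""
    return np.stack([voxelize(f, grid_size, bounds_min, bounds_max) for f in frames])

# Two corner points in a unit cube, voxelized at resolution 4.
pts = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
grid = voxelize(pts, grid_size=4)
clip = frames_to_clip([pts, pts], grid_size=4,
                      bounds_min=(0, 0, 0), bounds_max=(1, 1, 1))
```

Using shared bounds across all frames of a clip keeps the grids spatially aligned over time. Under this sketch, a region-of-interest jitter could plausibly be emulated by shifting `bounds_min` and `bounds_max` by a small random offset before voxelizing, though the abstract does not describe the exact procedure.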