ROSep 25, 2024Code
WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial NetworksAlberto Bacchin, Leonardo Barcellona, Matteo Terreran et al.
Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that should be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the necessity for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method allows to increase the performance of semantic segmentation models, starting from a very limited bunch of labeled examples, such as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Overall, such innovations helps the network to learn from limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted from models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognizing contaminants and separating waste in a real-world scenario. Through comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8\% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git
CVJul 19, 2024
Component Selection for Craft Assembly TasksVitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe et al.
Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
ROMar 27
Generalizable task-oriented object grasping through LLM-guided ontology and similarity-based planningHao Chen, Takuya Kiyokawa, Weiwei Wan et al.
Task-oriented grasping (TOG) is more challenging than simple object grasping because it requires precise identification of object parts and careful selection of grasping areas to ensure effective and robust manipulation. While recent approaches have trained large-scale vision-language models to integrate part-level object segmentation with task-aware grasp planning, their instability in part recognition and grasp inference limits their ability to generalize across diverse objects and tasks. To address this issue, we introduce a novel, geometry-centric strategy for more generalizable TOG that does not rely on semantic features from visual recognition, effectively overcoming the viewpoint sensitivity of model-based approaches. Our main proposals include: 1) an object-part-task ontology for functional part selection based on intuitive human commands, constructed using a Large Language Model (LLM); 2) a sampling-based geometric analysis method for identifying the selected object part from observed point clouds, incorporating multiple point distribution and distance metrics; and 3) a similarity matching framework for imitative grasp planning, utilizing similar known objects with pre-existing segmentation and grasping knowledge as references to guide the planning for unknown targets. We validate the high accuracy of our approach in functional part selection, identification, and grasp generation through real-world experiments. Additionally, we demonstrate the method's generalization capabilities to novel-category objects by extending existing ontological knowledge, showcasing its adaptability to a broad range of objects and tasks.
CVDec 4, 2025
Prompt2Craft: Generating Functional Craft Assemblies with LLMsVitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe et al.
Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labeled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
CVApr 11, 2023
Efficiently Collecting Training Dataset for 2D Object Detection by Online Visual FeedbackTakuya Kiyokawa, Naoki Shirakura, Hiroki Katayama et al.
Training deep-learning-based vision systems require the manual annotation of a significant number of images. Such manual annotation is highly time-consuming and labor-intensive. Although previous studies have attempted to eliminate the effort required for annotation, the effort required for image collection was retained. To address this, we propose a human-in-the-loop dataset collection method that uses a web application. To counterbalance the workload and performance by encouraging the collection of multi-view object image datasets in an enjoyable manner, thereby amplifying motivation, we propose three types of online visual feedback features to track the progress of the collection status. Our experiments thoroughly investigated the impact of each feature on collection performance and quality of operation. The results suggested the feasibility of annotation and object detection.
ROMay 20
Component Influence-Driven Fastener Reduction for Robotic Disassemblability-Aware Design SimplificationTakuya Kiyokawa, Tomoki Ishikura, Shingo Hamada et al.
To accelerate automated remanufacturing, robotic disassembly must be considered during the product design phase. However, designers currently lack quantitative feedback to identify which structural elements hinder robotic operations. To address this, this study proposes an analytical framework that provides actionable redesign guidance focused on fastener reduction, as fasteners are numerous and ubiquitous components found in almost all manufactured products. Using a Computer-Aided Design (CAD) model and its automatically generated Contact-Connection-Constraint (CCC) graph, the framework translates robotic disassembly sequence planning outcomes into component influence scores. These scores reflect how often a component causes structural constraint violations or evaluation objective deteriorations in the robotic disassembly sequence. To visually highlight structural hindrances, the framework projects these scores onto the CAD geometry as 3D heatmaps. The system then analytically simulates the removal of highly influential fasteners. It reports the expected reductions in structural constraints, tool changes, and robot travel distances, while preventing structurally unsafe modifications by evaluating geometric stability metrics. Experiments on seven household appliances demonstrate that the framework successfully targets redundant fasteners. Removing the recommended fasteners simplified the structural dependencies by eliminating between 8 and 132 structural constraints on the graph depending on each product's structural configuration. Furthermore, it improved robotic operational efficiency by eliminating unnecessary tool change operations and shortening travel distances by 165 to 1675 millimeters wherever structurally permissible.
ROMar 31
Industrial-Grade Robust Robot Vision for Screw Detection and Removal under Uneven ConditionsTomoki Ishikura, Genichiro Matsuda, Takuya Kiyokawa et al.
As the amount of used home appliances is expected to increase despite the decreasing labor force in Japan, there is a need to automate disassembling processes at recycling plants. The automation of disassembling air conditioner outdoor units, however, remains a challenge due to unit size variations and exposure to dirt and rust. To address these challenges, this study proposes an automated system that integrates a task-specific two-stage detection method and a lattice-based local calibration strategy. This approach achieved a screw detection recall of 99.8% despite severe degradation and ensured a manipulation accuracy of +/-0.75 mm without pre-programmed coordinates. In real-world validation with 120 units, the system attained a disassembly success rate of 78.3% and an average cycle time of 193 seconds, confirming its feasibility for industrial application.
ROMar 22
Affordance-Guided Enveloping Grasp Demonstration Toward Non-destructive Disassembly of Pinch-Infeasible Mating PartsMasaki Tsutsumi, Takuya Kiyokawa, Gen Sako et al.
Robotic disassembly of complex mating components often renders pinch grasping infeasible, necessitating multi-fingered enveloping grasps. However, visual occlusions and geometric constraints complicate teaching appropriate grasp motions when relying solely on 2D camera feeds. To address this, we propose an affordance-guided teleoperation method that pre-generates enveloping grasp candidates via physics simulation. These Affordance Templates (ATs) are visualized with a color gradient reflecting grasp quality to augment operator perception. Simulations demonstrate the method's generality across various components. Real-robot experiments validate that AT-based visual augmentation enables operators to effectively select and teach enveloping grasp strategies for real-world disassembly, even under severe visual and geometric constraints.
ROMar 19, 2025
Robotic Paper Wrapping by Learning Force ControlHiroki Hanai, Takuya Kiyokawa, Weiwei Wan et al.
Robotic packaging using wrapping paper poses significant challenges due to the material's complex deformation properties. The packaging process itself involves multiple steps, primarily categorized as folding the paper or creating creases. Small deviations in the robot's arm trajectory or force vector can lead to tearing or wrinkling of the paper, exacerbated by the variability in material properties. This study introduces a novel framework that combines imitation learning and reinforcement learning to enable a robot to perform each step of the packaging process efficiently. The framework allows the robot to follow approximate trajectories of the tool-center point (TCP) based on human demonstrations while optimizing force control parameters to prevent tearing or wrinkling, even with variable wrapping paper materials. The proposed method was validated through ablation studies, which demonstrated successful task completion with a significant reduction in tear and wrinkle rates. Furthermore, the force control strategy proved to be adaptable across different wrapping paper materials and robust against variations in the size of the target object.
ROJul 16, 2025
A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-TuningHao Chen, Takuya Kiyokawa, Zhengtao Hu et al.
Grasping unknown objects from a single view has remained a challenging topic in robotics due to the uncertainty of partial observation. Recent advances in large-scale models have led to benchmark solutions such as GraspNet-1Billion. However, such learning-based approaches still face a critical limitation in performance robustness for their sensitivity to sensing noise and environmental changes. To address this bottleneck in achieving highly generalized grasping, we abandon the traditional learning framework and introduce a new perspective: similarity matching, where similar known objects are utilized to guide the grasping of unknown target objects. We newly propose a method that robustly achieves unknown-object grasping from a single viewpoint through three key steps: 1) Leverage the visual features of the observed object to perform similarity matching with an existing database containing various object models, identifying potential candidates with high similarity; 2) Use the candidate models with pre-existing grasping knowledge to plan imitative grasps for the unknown target object; 3) Optimize the grasp quality through a local fine-tuning process. To address the uncertainty caused by partial and noisy observation, we propose a multi-level similarity matching framework that integrates semantic, geometric, and dimensional features for comprehensive evaluation. Especially, we introduce a novel point cloud geometric descriptor, the C-FPFH descriptor, which facilitates accurate similarity assessment between partial point clouds of observed objects and complete point clouds of database models. In addition, we incorporate the use of large language models, introduce the semi-oriented bounding box, and develop a novel point cloud registration approach based on plane detection to enhance matching accuracy under single-view conditions. Videos are available at https://youtu.be/qQDIELMhQmk.
RONov 15, 2024
Self-Supervised Learning of Grasping Arbitrary Objects On-the-MoveTakuya Kiyokawa, Eiki Nagata, Yoshihisa Tsurumine et al.
Mobile grasping enhances manipulation efficiency by utilizing robots' mobility. This study aims to enable a commercial off-the-shelf robot for mobile grasping, requiring precise timing and pose adjustments. Self-supervised learning can develop a generalizable policy to adjust the robot's velocity and determine grasp position and orientation based on the target object's shape and pose. Due to mobile grasping's complexity, action primitivization and step-by-step learning are crucial to avoid data sparsity in learning from trial and error. This study simplifies mobile grasping into two grasp action primitives and a moving action primitive, which can be operated with limited degrees of freedom for the manipulator. This study introduces three fully convolutional neural network (FCN) models to predict static grasp primitive, dynamic grasp primitive, and residual moving velocity error from visual inputs. A two-stage grasp learning approach facilitates seamless FCN model learning. The ablation study demonstrated that the proposed method achieved the highest grasping accuracy and pick-and-place efficiency. Furthermore, randomizing object shapes and environments in the simulation effectively achieved generalizable mobile grasping.
ROJan 20, 2022
Category-Association Based Similarity Matching for Novel Object Pick-and-Place TaskHao Chen, Takuya Kiyokawa, Weiwei Wan et al.
Robotic pick-and-place has been researched for a long time to cope with uncertainty of novel objects and changeable environments. Past works mainly focus on learning-based methods to achieve high precision. However, they have difficulty being generalized for the limitation of specified training models. To break through this drawback of learning-based approaches, we introduce a new perspective of similarity matching between novel objects and a known database based on category-association to achieve pick-and-place tasks with high accuracy and stabilization. We calculate the category name similarity using word embedding to quantify the semantic similarity between the categories of known models and the target real-world objects. With a similar model identified by a similarity prediction function, we preplan a series of robust grasps and imitate them to plan new grasps on the real-world target object. We also propose a distance-based method to infer the in-hand posture of objects and adjust small rotations to achieve stable placements under uncertainty. Through a real-world robotic pick-and-place experiment with a dozen of in-category and out-of-category novel objects, our method achieved an average success rate of 90.6% and 75.9% respectively, validating the capacity of generalization to diverse objects.
RONov 16, 2021
Active Vapor-Based Robotic WiperTakuya Kiyokawa, Hiroki Katayama, Jun Takamatsu et al.
This paper presents a method for estimating normals of mirrors and transparent objects challenging for cameras to recognize. We propose spraying water vapor onto mirror or transparent surfaces to create a diffuse reflective surface. Using an ultrasonic humidifier on a robotic arm, we apply water vapor to the target object's surface, forming a cross-shaped misted area. This creates partially diffuse reflective surfaces, enabling the camera to detect the target object's surface. Adjusting the gripper-mounted camera viewpoint maximizes the extracted misted area's appearance in the image, allowing normal estimation of the target surface. Experiments show the method's effectiveness, with RMSEs of azimuth estimation for mirrors and transparent glass at approximately 4.2 and 5.8 degrees, respectively. Our robot experiments demonstrated that our robotic wiper can perform contact-force-regulated wiping motions to clean a transparent window, akin to human performance.
ROSep 15, 2021
Soft-Jig: A Flexible Sensing Jig for Simultaneously Fixing and Estimating Orientation of Assembly PartsTatsuya Sakuma, Takuya Kiyokawa, Jun Takamatsu et al.
For assembly tasks, it is essential to firmly fix target parts and to accurately estimate their poses. Several rigid jigs for individual parts are frequently used in assembly factories to achieve precise and time-efficient product assembly. However, providing customized jigs is time-consuming. In this study, to address the lack of versatility in the shapes the jigs can be used for, we developed a flexible jig with a soft membrane including transparent beads and oil with a tuned refractive index. The bead-based jamming transition was accomplished by discharging only oil enabling a part to be firmly fixed. Because the two cameras under the jig are able to capture membrane shape changes, we proposed a sensing method to estimate the orientation of the part based on the behaviors of markers created on the jig's inner surface. Through estimation experiments, the proposed system could estimate the orientation of a cylindrical object with a diameter larger than 50 mm and an RMSE of less than 3 degrees.
ROApr 2, 2021
Robotic Waste Sorter with Agile Manipulation and Quickly Trainable DetectorTakuya Kiyokawa, Hiroki Katayama, Yuya Tatsuta et al.
Owing to human labor shortages, the automation of labor-intensive manual waste-sorting is needed. The goal of automating waste-sorting is to replace the human role of robust detection and agile manipulation of waste items with robots. To achieve this, we propose three methods. First, we provide a combined manipulation method using graspless push-and-drop and pick-and-release manipulation. Second, we provide a robotic system that can automatically collect object images to quickly train a deep neural-network model. Third, we provide a method to mitigate the differences in the appearance of target objects from two scenes: one for dataset collection and the other for waste sorting in a recycling factory. If differences exist, the performance of a trained waste detector may decrease. We address differences in illumination and background by applying object scaling, histogram matching with histogram equalization, and background synthesis to the source target-object images. Via experiments in an indoor experimental workplace for waste-sorting, we confirm that the proposed methods enable quick collection of the training image sets for three classes of waste items (i.e., aluminum can, glass bottle, and plastic bottle) and detection with higher performance than the methods that do not consider the differences. We also confirm that the proposed method enables the robot quickly manipulate the objects.
ROOct 21, 2020
Assembly Sequences Based on Multiple Criteria Against Products with Deformable PartsTakuya Kiyokawa, Jun Takamatsu, Tsukasa Ogasawara
Aiming to generate easy-to-handle assembly sequences for robotic assembly, this study tackles assembly sequence generation by considering two tradeoff objectives: (1) insertion conditions and (2) degrees of constraints among assembled parts. We propose a multiobjective genetic algorithm to balance these two objectives for generating assembly sequences. Furthermore, the method of extracting part relation matrices including interference-free, insertion, and degree of constraint matrices is extended for application to 3D computer-aided design (CAD) models, including deformable parts. The interference of deformable parts with other parts can be easily investigated by scaling parts. A simulation experiment was conducted using the proposed method, and the results show the possibility of obtaining Pareto-optimal solutions of assembly sequences for a 3D CAD model with 33 parts including a deformable part. This approach can potentially be extended to handle various types of deformable parts and to explore graspable sequences during assembly operations.
ROOct 21, 2020
Soft-Jig-Driven Assembly OperationsTakuya Kiyokawa, Tatsuya Sakuma, Jun Takamatsu et al.
To design a general-purpose assembly robot system that can handle objects of various shapes, we propose a soft jig that fits to the shapes of assembly parts. The functionality of the soft jig is based on a jamming gripper developed in the field of soft robotics. The soft jig has a bag covered with a malleable silicone membrane, which has high friction, elongation, and contraction rates for keeping parts fixed. The bag is filled with glass beads to achieve a jamming transition. We propose a method to configure parts-fixing on the soft jig based on contact relations, reachable directions, and the center of gravity of the parts that are fixed on the jig. The usability of the soft jig was evaluated in terms of the fixing performance and versatility for various shapes and postures of parts.
ROAug 4, 2020
A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household OperationsNaoki Wake, Riku Arakawa, Iori Yanokura et al.
A household robot is expected to perform various manipulative operations with an understanding of the purpose of the task. To this end, a desirable robotic application should provide an on-site robot teaching framework for non-experts. Here we propose a Learning-from-Observation (LfO) framework for grasp-manipulation-release class household operations (GMR-operations). The framework maps human demonstrations to predefined task models through one-shot teaching. Each task model contains both high-level knowledge regarding the geometric constraints and low-level knowledge related to human postures. The key idea is to design a task model that 1) covers various GMR-operations and 2) includes human postures to achieve tasks. We verify the applicability of our framework by testing an operational LfO system with a real robot. In addition, we quantify the coverage of the task model by analyzing online videos of household operations. In the context of one-shot robot teaching, the contribution of this study is a framework that 1) covers various GMR-operations and 2) mimics human postures during the operations.