CVMar 3, 2022
Sim2Real Instance-Level Style Transfer for 6D Pose EstimationTakuya Ikeda, Suomi Tanishige, Ayako Amma et al.
In recent years, synthetic data has been widely used in the training of 6D pose estimation networks, in part because it automatically provides perfect annotation at low cost. However, there are still non-trivial domain gaps, such as differences in textures/materials, between synthetic and real data. These gaps have a measurable impact on performance. To solve this problem, we introduce a simulation to reality (sim2real) instance-level style transfer for 6D pose estimation network training. Our approach transfers the style of target objects individually, from synthetic to real, without human intervention. This improves the quality of synthetic data for training pose estimation networks. We also propose a complete pipeline from data collection to the training of a pose estimation network and conduct extensive evaluation on a real-world robotic platform. Our evaluation shows significant improvement achieved by our method in both pose estimation performance and the realism of images adapted by the style transfer.
CVMar 9, 2022
Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose EstimationHiroya Sato, Takuya Ikeda, Koichi Nishiwaki
In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation of 6D pose, it cannot represent an uncertainty of the observation. In order to handle the uncertainty, Bingham distribution is one promising solution because this has suitable features, such as a smooth representation over SO(3), in addition to the ambiguity representation. However, it requires the complex computation of the normalizing constants. This is the bottleneck of loss computation in training neural networks based on Bingham representation. As such, we propose a fast-computable and easy-to-implement loss function for Bingham distribution. We also show not only to examine the parametrization of Bingham distribution but also an application based on our loss function.
CVNov 23, 2023
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic CorrespondencePengyuan Wang, Takuya Ikeda, Robert Lee et al.
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometry inputs such as depth images to reduce the domain-gap but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then performs matching against this for new single view observations of unseen object instances with a trained matching network. This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.
CVFeb 20, 2024
DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose EstimationTakuya Ikeda, Sergey Zakharov, Tianyi Ko et al. · gatech
This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes as well as establishing correspondences essential for pose estimation. Furthermore, we introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations. We demonstrate the effectiveness of our method by testing it on a range of real datasets. Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities, outperforming baselines, even those specifically trained on the target domain.
ROApr 15, 2025
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic GraspingShun Iwase, Zubair Irshad, Katherine Liu et al. · gatech
Robotic grasping is a cornerstone capability of embodied systems. Many methods directly output grasps from partial information without modeling the geometry of the scene, leading to suboptimal motion and even collisions. To address these issues, we introduce ZeroGrasp, a novel framework that simultaneously performs 3D reconstruction and grasp pose prediction in near real-time. A key insight of our method is that occlusion reasoning and modeling the spatial relationships between objects is beneficial for both accurate reconstruction and grasping. We couple our method with a novel large-scale synthetic dataset, which comprises 1M photo-realistic images, high-resolution 3D reconstructions and 11.3B physically-valid grasp pose annotations for 12K objects from the Objaverse-LVIS dataset. We evaluate ZeroGrasp on the GraspNet-1B benchmark as well as through real-world robot experiments. ZeroGrasp achieves state-of-the-art performance and generalizes to novel real-world objects by leveraging synthetic data.
CVMay 17, 2025
GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric ComplexityTakuya Ikeda, Sergey Zakharov, Muhammad Zubair Irshad et al. · gatech
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Existing methods, while achieving impressive results, often struggle with complex objects, particularly those exhibiting symmetry, intricate geometry or complex appearance. To bridge these gaps, we introduce an adaptive method that combines 3D Gaussian Splatting, hybrid geometry/appearance tracking, and key frame selection to achieve robust tracking and accurate reconstructions across a diverse range of objects. Additionally, we present a benchmark covering these challenging object classes, providing high-quality annotations for evaluating both tracking and reconstruction performance. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
CVApr 15, 2024
ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part FusionTianhan Xu, Takuya Ikeda, Koichi Nishiwaki
In this paper, we propose a method to segment and recover a static, clean background and multiple 360$^\circ$ objects from observations of scenes at different timestamps. Recent works have used neural radiance fields to model 3D scenes and improved the quality of novel view synthesis, while few studies have focused on modeling the invisible or occluded parts of the training images. These under-reconstruction parts constrain both scene editing and rendering view selection, thereby limiting their utility for synthetic data generation for downstream tasks. Our basic idea is that, by observing the same set of objects in various arrangement, so that parts that are invisible in one scene may become visible in others. By fusing the visible parts from each scene, occlusion-free rendering of both background and foreground objects can be achieved. We decompose the multi-scene fusion task into two main components: (1) objects/background segmentation and alignment, where we leverage point cloud-based methods tailored to our novel problem formulation; (2) radiance fields fusion, where we introduce visibility field to quantify the visible information of radiance fields, and propose visibility-aware rendering for the fusion of series of scenes, ultimately obtaining clean background and 360$^\circ$ object rendering. Comprehensive experiments were conducted on synthetic and real datasets, and the results demonstrate the effectiveness of our method.
CVMay 30, 2023
A Probabilistic Rotation Representation for Symmetric Shapes With an Efficiently Computable Bingham Loss FunctionHiroya Sato, Takuya Ikeda, Koichi Nishiwaki
In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation, it cannot represent the ambiguity of the observation. In order to handle the ambiguity, the Bingham distribution is one promising solution. However, it requires complicated calculation when yielding the negative log-likelihood (NLL) loss. An alternative easy-to-implement loss function has been proposed to avoid complex computations but has difficulty expressing symmetric distribution. In this paper, we introduce a fast-computable and easy-to-implement NLL loss function for Bingham distribution. We also create the inference network and show that our loss function can capture the symmetric property of target objects from their point clouds.