Skand

10.3ROJun 30

3D Point World Models: Point Completion Enables More Accurate Dynamics Learning

Skand Peri, Hung Nguyen, Chanho Kim et al.

Learning predictive models of the world enables robotic control through planning, potentially allowing robots to improvise solutions on new tasks. However, large video-based dynamics models lack explicit 3D spatial structure and suffer from geometrically inconsistent long-term rollouts with compounding errors. Emerging 3D dynamics models based on partial point clouds improve geometric consistency but remain sensitive to occlusions and accumulated prediction drift. To address these challenges, we present 3D Point World Models (3DPWM) - a task-agnostic world model that operates entirely in 3D space by first completing partial point clouds and then learning action-conditioned dynamics in this completed 3D scene. By operating on completed geometry, 3DPWM enables reliable long-horizon rollouts and more accurate cost evaluation for model-based planning while supporting adaptation to new tasks. Experiments across different robotic embodiments and tabletop manipulation benchmarks demonstrate that 3DPWM achieves significantly more reliable long-horizon rollouts (100-300+ steps), supports both open-loop and closed-loop planning, and enables successful sim-to-real transfer.

16.4IVAug 4, 2021

MRI to PET Cross-Modality Translation using Globally and Locally Aware GAN (GLA-GAN) for Multi-Modal Diagnosis of Alzheimer's Disease

Apoorva Sikka, Skand Peri, Jitender Singh Virk et al.

Medical imaging datasets are inherently high dimensional with large variability and low sample sizes that limit the effectiveness of deep learning algorithms. Recently, generative adversarial networks (GANs) with the ability to synthesize realist images have shown great potential as an alternative to standard data augmentation techniques. Our work focuses on cross-modality synthesis of fluorodeoxyglucose~(FDG) Positron Emission Tomography~(PET) scans from structural Magnetic Resonance~(MR) images using generative models to facilitate multi-modal diagnosis of Alzheimer's disease (AD). Specifically, we propose a novel end-to-end, globally and locally aware image-to-image translation GAN (GLA-GAN) with a multi-path architecture that enforces both global structural integrity and fidelity to local details. We further supplement the standard adversarial loss with voxel-level intensity, multi-scale structural similarity (MS-SSIM) and region-of-interest (ROI) based loss components that reduce reconstruction error, enforce structural consistency at different scales and perceive variation in regional sensitivity to AD respectively. Experimental results demonstrate that our GLA-GAN not only generates synthesized FDG-PET scans with enhanced image quality but also superior clinical utility in improving AD diagnosis compared to state-of-the-art models. Finally, we attempt to interpret some of the internal units of the GAN that are closely related to this specific cross-modality generation task.

Skand

2 Papers