Hendrik Chiche

4.4 · CV · Mar 15

Comparative Analysis of 3D Convolutional and 2.5D Slice-Conditioned U-Net Architectures for MRI Super-Resolution via Elucidated Diffusion Models

Hendrik Chiche, Ludovic Corcos, Logan Rouge

Magnetic resonance imaging (MRI) super-resolution (SR) methods that computationally enhance low-resolution acquisitions to approximate high-resolution quality offer a compelling alternative to expensive high-field scanners. In this work we investigate an elucidated diffusion model (EDM) framework for brain MRI SR and compare two U-Net backbone architectures: (i) a full 3D convolutional U-Net that processes volumetric patches with 3D convolutions and multi-head self-attention, and (ii) a 2.5D slice-conditioned U-Net that super-resolves each slice independently while conditioning on an adjacent slice for inter-slice context. Both models employ continuous-sigma noise conditioning following Karras et al. and are trained on the NKI cohort of the FOMO60K dataset. On a held-out test set of 5 subjects (6 volumes, 993 slices), the 3D model achieves 37.75 dB PSNR, 0.997 SSIM, and 0.020 LPIPS, improving on the off-the-shelf pretrained EDSR baseline (35.57 dB / 0.024 LPIPS) and the 2.5D variant (35.82 dB) across all three metrics under the same test data and degradation pipeline.
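The continuous-sigma noise conditioning mentioned above can be illustrated with the preconditioning coefficients from the EDM formulation of Karras et al. This is a minimal sketch, not the authors' code: `SIGMA_DATA = 0.5` and the log-normal parameters `p_mean = -1.2`, `p_std = 1.2` are the defaults from the EDM paper and are only assumed here, since the abstract does not state the values used for the MRI models. The function names are hypothetical.

```python
import math
import random

# Assumed value (EDM paper default); the abstract does not state the setting used.
SIGMA_DATA = 0.5


def edm_preconditioning(sigma, sigma_data=SIGMA_DATA):
    """Return (c_skip, c_out, c_in, c_noise) for a continuous noise level sigma.

    In the EDM formulation the denoiser is
        D(x; sigma) = c_skip * x + c_out * F(c_in * x, c_noise),
    where F is the raw network (for example a U-Net backbone).
    """
    denom = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    c_noise = math.log(sigma) / 4.0
    return c_skip, c_out, c_in, c_noise


def sample_training_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Draw a training noise level with ln(sigma) ~ N(p_mean, p_std^2).

    p_mean and p_std are assumed EDM defaults, not values from the abstract.
    """
    return math.exp(rng.gauss(p_mean, p_std))


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = sample_training_sigma(rng)
    print(sigma, edm_preconditioning(sigma))
```

Because `c_noise` is a smooth function of `sigma`, the network receives a real-valued noise level instead of a discrete timestep index, which is what "continuous-sigma" conditioning refers to in this sketch.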

5.4 · RO · Mar 11

Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics

Hendrik Chiche, Antoine Jamme, Trevor Rigoberto Martinez

Teleoperation of low-cost robotic manipulators remains challenging due to the complexity of mapping human hand articulations to robot joint commands. We present an offline hand-shadowing and retargeting pipeline from a single egocentric RGB-D camera mounted on 3D-printed glasses. The pipeline detects 21 hand landmarks per hand using MediaPipe Hands, deprojects them into 3D via depth sensing, transforms them into the robot coordinate frame, and solves a damped-least-squares inverse kinematics problem in PyBullet to produce joint commands for the 6-DOF SO-ARM101 robot. A gripper controller maps thumb-index finger geometry to grasp aperture with a four-level fallback hierarchy. Actions are first previewed in a physics simulation before replay on the physical robot through the LeRobot framework. We evaluate the IK retargeting pipeline on a structured pick-and-place benchmark (5-tile grid, 10 grasps per tile), achieving a 90% success rate, and compare it against four vision-language-action policies (ACT, SmolVLA, pi0.5, GR00T N1.5) trained on leader-follower teleoperation data. We also test the IK pipeline in unstructured real-world environments (grocery store, pharmacy), where hand occlusion by surrounding objects reduces success to 9.3% (N=75), highlighting both the promise and current limitations of marker-free analytical retargeting.
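The damped-least-squares inverse kinematics step named above can be sketched on a toy problem. The example below is an illustration only: it uses a hypothetical planar 2-link arm with unit link lengths rather than the 6-DOF SO-ARM101, and a hand-written update rather than PyBullet's solver. The damping value, iteration count, and function names are assumptions. Each iteration applies the update Δq = Jᵀ (J Jᵀ + λ² I)⁻¹ e, where e is the Cartesian position error.

```python
import math


def forward(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm (toy stand-in for a real robot)."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def jacobian(q, l1=1.0, l2=1.0):
    """2x2 Jacobian of the end-effector position with respect to the joint angles."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def dls_step(q, target, damping=0.1):
    """One damped-least-squares update: dq = J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward(q)
    ex, ey = target[0] - x, target[1] - y
    J = jacobian(q)
    # A = J J^T + damping^2 I (symmetric 2x2, invertible for any damping > 0)
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    # v = A^-1 e, using the closed-form 2x2 inverse
    vx = (d * ex - b * ey) / det
    vy = (-b * ex + a * ey) / det
    # dq = J^T v
    dq0 = J[0][0] * vx + J[1][0] * vy
    dq1 = J[0][1] * vx + J[1][1] * vy
    return [q[0] + dq0, q[1] + dq1]


def solve_ik(target, q0, iters=200, damping=0.1):
    """Iterate DLS steps from the initial guess q0 toward a reachable target."""
    q = list(q0)
    for _ in range(iters):
        q = dls_step(q, target, damping)
    return q


if __name__ == "__main__":
    q = solve_ik(target=(1.0, 1.0), q0=(0.2, 1.2))
    print(q, forward(q))
```

The damping term keeps the matrix invertible near singular configurations, such as a fully extended arm, at the cost of slower convergence. That trade-off is the usual reason for choosing damped least squares over a plain pseudoinverse.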
