RO · Mar 10

ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly

Minchi Ruan, LiangQing Zhou, Hongtong Li, Zongtao Wang, ZhaoMing Lu, Jianwei Zhang, Bin Fang

arXiv:2603.09565v1 · 8.8 · h-index: 4

Predicted impact: top 31% in RO · last 90 days
Originality: Highly original

AI Analysis

The paper addresses sub-millimeter corrections in robotic assembly for industrial applications, offering a novel vision-tactile fusion approach with strong performance gains.

The paper tackles precision assembly in occluded, contact-rich regions by proposing ReTac-ACT, a vision-tactile imitation learning policy that achieves 90% peg-in-hole success on the NIST Assembly Task Board M1 benchmark and 80% success at 0.1 mm clearance.

Precision assembly requires sub-millimeter corrections in contact-rich "last-millimeter" regions where visual feedback fails due to occlusion from the end-effector and workpiece. We present ReTac-ACT (Reconstruction-enhanced Tactile ACT), a vision-tactile imitation learning policy that addresses this challenge through three synergistic mechanisms: (i) bidirectional cross-attention enabling reciprocal visuo-tactile feature enhancement before fusion, (ii) a proprioception-conditioned gating network that dynamically elevates tactile reliance when visual occlusion occurs, and (iii) a tactile reconstruction objective enforcing learning of manipulation-relevant contact information rather than generic visual textures. Evaluated on the standardized NIST Assembly Task Board M1 benchmark, ReTac-ACT achieves 90% peg-in-hole success, substantially outperforming vision-only and generalist baseline methods, and maintains 80% success at industrial-grade 0.1 mm clearance. Ablation studies validate that each architectural component is indispensable. The ReTac-ACT codebase and a vision-tactile demonstration dataset covering various clearance levels with both visual and tactile features will be released to support reproducible research.
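
To make mechanism (ii) concrete, the following is a minimal sketch of a proprioception-conditioned gate, not the authors' implementation. It assumes a single linear layer followed by a sigmoid, a scalar gate, and a convex blend of the two feature vectors; the function names, the weights, and the choice of insertion depth as the only proprioceptive feature are all hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_from_proprioception(proprio, weights, bias):
    """Scalar gate in (0, 1) computed from a proprioceptive state vector.

    Assumption: one linear layer plus sigmoid. The paper's gating network
    is likely richer; this only illustrates the idea.
    """
    score = sum(w * p for w, p in zip(weights, proprio)) + bias
    return sigmoid(score)


def gated_fusion(visual, tactile, gate):
    """Convex blend: gate near 1 relies on touch, gate near 0 relies on vision."""
    return [gate * t + (1.0 - gate) * v for v, t in zip(visual, tactile)]


# Toy setup (hypothetical): the only proprioceptive feature is insertion
# depth in mm, and the hand-picked weights raise the gate as depth grows.
weights, bias = [2.0], -4.0
visual = [1.0, 0.0]   # stand-in visual feature vector
tactile = [0.0, 1.0]  # stand-in tactile feature vector

g_far = gate_from_proprioception([0.0], weights, bias)   # before contact
g_deep = gate_from_proprioception([4.0], weights, bias)  # peg inside the hole

fused_far = gated_fusion(visual, tactile, g_far)
fused_deep = gated_fusion(visual, tactile, g_deep)
```

In this toy setup the gate stays close to 0 before contact, so the fused vector is dominated by the visual features, and moves close to 1 once the peg is inserted, so the tactile features dominate. In a learned policy the gate parameters would be trained end to end rather than set by hand.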
