RO · Mar 10

ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly

arXiv:2603.09565v1 · 63.2 · h-index: 6
Predicted impact: top 31% in RO · last 90 days · Originality: Highly original
AI Analysis

The paper addresses sub-millimeter corrections in industrial robotic assembly with a novel vision-tactile fusion approach, and reports substantial gains over vision-only and generalist baselines.

The paper tackles precision assembly in occluded, contact-rich regions by proposing ReTac-ACT, a vision-tactile imitation learning policy. It achieves 90% peg-in-hole success on the NIST Assembly Task Board M1 benchmark and maintains 80% success at 0.1 mm clearance.

Precision assembly requires sub-millimeter corrections in contact-rich "last-millimeter" regions where visual feedback fails due to occlusion from the end-effector and workpiece. We present ReTac-ACT (Reconstruction-enhanced Tactile ACT), a vision-tactile imitation learning policy that addresses this challenge through three synergistic mechanisms: (i) bidirectional cross-attention enabling reciprocal visuo-tactile feature enhancement before fusion, (ii) a proprioception-conditioned gating network that dynamically elevates tactile reliance when visual occlusion occurs, and (iii) a tactile reconstruction objective enforcing learning of manipulation-relevant contact information rather than generic visual textures. Evaluated on the standardized NIST Assembly Task Board M1 benchmark, ReTac-ACT achieves 90% peg-in-hole success, substantially outperforming vision-only and generalist baseline methods, and maintains 80% success at industrial-grade 0.1mm clearance. Ablation studies validate that each architectural component is indispensable. The ReTac-ACT codebase and a vision-tactile demonstration dataset covering various clearance levels with both visual and tactile features will be released to support reproducible research.
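Mechanisms (i) and (ii) from the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the architecture details, so everything concrete here is an assumption made for illustration: single-head attention without learned projections, mean pooling, a scalar sigmoid gate, a convex combination of the two modalities, and the names `cross_attend`, `tactile_gate`, and `fuse`. The tactile reconstruction objective (iii) is omitted because its form is not stated.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    # Each query token attends over the other modality's tokens.
    # Assumption: single head, no learned projections, residual connection.
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended = [sum(w * kv[i] for w, kv in zip(weights, keys_values))
                    for i in range(dim)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out

def mean_pool(tokens):
    # Average a list of token vectors into one feature vector.
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def tactile_gate(proprio, weights, bias):
    # Assumption: the gate is a scalar in (0, 1) computed from
    # proprioceptive features by one linear layer plus a sigmoid.
    z = sum(w * p for w, p in zip(weights, proprio)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fuse(vision_tokens, tactile_tokens, proprio, gate_weights, gate_bias):
    # (i) Bidirectional cross-attention: each modality is enhanced by the
    # other, both directions computed from the original tokens.
    vision_enh = cross_attend(vision_tokens, tactile_tokens)
    tactile_enh = cross_attend(tactile_tokens, vision_tokens)
    # (ii) Proprioception-conditioned gate: a higher gate value puts more
    # weight on the tactile features.
    g = tactile_gate(proprio, gate_weights, gate_bias)
    v = mean_pool(vision_enh)
    t = mean_pool(tactile_enh)
    fused = [(1.0 - g) * vi + g * ti for vi, ti in zip(v, t)]
    return fused, g

# Hypothetical usage: the single proprioceptive feature stands in for a
# normalized insertion depth; the gate weight and bias are made up.
vision = [[1.0, 0.0], [0.0, 1.0]]
tactile = [[0.5, 0.5]]
_, gate_far = fuse(vision, tactile, [0.0], [4.0], -2.0)
_, gate_near = fuse(vision, tactile, [1.0], [4.0], -2.0)
print(gate_far < gate_near)
```

With these made-up gate parameters, the tactile weight rises as the hypothetical depth feature increases, which mirrors the abstract's description of elevating tactile reliance when vision becomes occluded. In a trained policy the gate parameters would be learned rather than set by hand.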

