LG · Nov 24, 2025

TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception

arXiv:2511.19509v1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of robust material perception for robots in safety-critical applications like emergency response and industrial automation, representing an incremental advance over prior multimodal fusion methods.

The paper tackles multimodal material perception under visually impaired conditions by proposing TouchFormer, a robust transformer-based framework that adaptively integrates cross-modal features. Compared to existing non-visual methods, it improves classification accuracy by 2.48% and 6.83% on the SSMC and USMC tasks, respectively.

Traditional vision-based material perception methods often experience substantial performance degradation under visually impaired conditions, thereby motivating the shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance Embedding Regularization (CER) strategy, which significantly improves classification accuracy in fine-grained subcategory material recognition tasks. Experimental results demonstrate that, compared to existing non-visual methods, the proposed TouchFormer framework achieves classification accuracy improvements of 2.48% and 6.83% on SSMC and USMC tasks, respectively. Furthermore, real-world robotic experiments validate TouchFormer's effectiveness in enabling robots to better perceive and interpret their environment, paving the way for its deployment in safety-critical applications such as emergency response and industrial automation. The code and datasets will be open-sourced, and the videos are available in the supplementary materials.
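The abstract does not spell out how the Modality-Adaptive Gating mechanism is computed. As a rough illustration of the general idea only (input-dependent weights per modality, with missing modalities excluded), the following pure-Python sketch may help. It is not the authors' implementation: the function names `softmax_masked` and `gated_fusion`, the linear gate scores, and the `present` mask are all assumptions made for this example.

```python
import math


def softmax_masked(scores, present):
    """Softmax over scores, restricted to modalities flagged as present.

    Missing modalities receive a gate of exactly 0.0.
    """
    kept = [s for s, p in zip(scores, present) if p]
    if not kept:
        raise ValueError("at least one modality must be present")
    m = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if p else 0.0 for s, p in zip(scores, present)]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_weights, present):
    """Fuse per-modality feature vectors with input-dependent gates.

    features:     list of equal-length feature vectors, one per modality
    gate_weights: list of weight vectors (hypothetical learned parameters)
    present:      list of booleans; False marks a missing modality
    """
    # One scalar score per modality: dot product of its features and weights.
    scores = [
        sum(w * x for w, x in zip(wv, fv))
        for wv, fv in zip(gate_weights, features)
    ]
    gates = softmax_masked(scores, present)
    dim = len(features[0])
    # Weighted sum of the modality features, coordinate by coordinate.
    fused = [sum(g * fv[i] for g, fv in zip(gates, features)) for i in range(dim)]
    return fused, gates


# Three hypothetical modalities; the third one is missing for this sample.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, gates = gated_fusion(features, gate_weights, [True, True, False])
```

In this sketch the gates of the available modalities always sum to one, so dropping a modality redistributes its weight to the remaining ones instead of injecting a zero vector into the fused feature. Whether TouchFormer handles missing modalities this way is not stated in the abstract.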

