CV · Apr 2, 2025

Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes

Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai

arXiv:2504.01466v2 · 8.43 citations · h-index: 49 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the need for adaptable 3D vision systems that account for both geometric structure and texture in mesh saliency prediction. It is a domain-specific advance.

The authors tackled the problem of predicting visual saliency on 3D meshes by creating the first comprehensive dataset that captures differences between textured and non-textured conditions, and introduced Mesh Mamba, a unified state space model that improves performance across various mesh types through global context modeling.

Mesh saliency enhances the adaptability of 3D vision by identifying and emphasizing regions that naturally attract visual attention. To investigate the interaction between geometric structure and texture in shaping visual attention, we establish a comprehensive mesh saliency dataset, which is the first to systematically capture the differences in saliency distribution under both textured and non-textured visual conditions. Furthermore, we introduce Mesh Mamba, a unified saliency prediction model based on a state space model (SSM), designed to adapt across various mesh types. Mesh Mamba effectively analyzes the geometric structure of the mesh while seamlessly incorporating texture features into the topological framework, ensuring coherence throughout appearance-enhanced modeling. More importantly, by subgraph embedding and a bidirectional SSM, the model enables global context modeling for both local geometry and texture, preserving the topological structure and improving the understanding of visual details and structural complexity. Through extensive theoretical and empirical validation, our model not only improves performance across various mesh types but also demonstrates high scalability and versatility, particularly through cross validations of various visual features.
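To make the "bidirectional SSM" idea in the abstract concrete, here is a minimal sketch of a scalar linear state space scan run in both directions over a token sequence. It is not the authors' implementation: the function names `ssm_scan` and `bidirectional_ssm`, the fixed parameters `a`, `b`, `c`, and the reduction of each subgraph embedding to a single float are all assumptions made for illustration. Mamba-style models use vector states and input-dependent parameters, which this sketch omits.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear SSM recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t  # state carries a decayed summary of earlier tokens
        y.append(c * h)
    return y


def bidirectional_ssm(x: List[float], a: float = 0.5,
                      b: float = 1.0, c: float = 1.0) -> List[float]:
    """Sum a forward scan and a backward scan (hypothetical fusion rule)."""
    fwd = ssm_scan(x, a, b, c)
    # Scan the reversed sequence, then flip the result back into original order.
    bwd = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(fwd, bwd)]


# Each float stands in for one subgraph (patch) embedding in a serialized mesh.
tokens = [0.0, 0.0, 1.0]
forward_only = ssm_scan(tokens, 0.5, 1.0, 1.0)
both_ways = bidirectional_ssm(tokens)
```

With an impulse on the last token, the forward-only scan leaves the first two outputs at zero, while the bidirectional version lets every position see it. That is the sense in which scanning in both directions gives each token context from the whole sequence.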
