CV · Apr 2

Semantic Segmentation of Textured Non-manifold 3D Meshes using Transformers

arXiv:2604.01836 · 3.5 · h-index: 6
AI Analysis

This paper addresses semantic segmentation of complex 3D meshes for applications such as urban modeling and cultural heritage. It reports strong performance gains, although the methodological improvements are incremental.

The paper tackles semantic segmentation of textured 3D meshes by introducing a texture-aware transformer that incorporates both texture and geometry, achieving 81.9% mF1 and 94.3% OA on the SUM benchmark and 49.7% mF1 and 72.8% OA on a new cultural-heritage dataset.
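The two reported metrics, mean F1 (mF1) and overall accuracy (OA), can both be derived from a per-class confusion matrix. The following is a minimal sketch, assuming every mesh face counts equally; whether the benchmark weights faces (for example by surface area) is not stated here, and the function names are illustrative rather than taken from the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: fraction of all faces whose predicted label is correct."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_f1(cm):
    """mF1: unweighted mean of the per-class F1 scores."""
    n = len(cm)
    f1_scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy example: six faces, three classes (assumption: one label per face).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
oa = overall_accuracy(cm)
mf1 = mean_f1(cm)
```

Because mF1 averages over classes without regard to class frequency, a rare class that is segmented poorly lowers mF1 far more than it lowers OA, which is one reason the two numbers can diverge as much as they do on the cultural-heritage dataset.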

Textured 3D meshes jointly represent geometry, topology, and appearance, yet their irregular structure poses significant challenges for deep-learning-based semantic segmentation. While a few recent methods operate directly on meshes without imposing geometric constraints, they typically overlook the rich textural information also provided by such meshes. We introduce a texture-aware transformer that learns directly from raw pixels associated with each mesh face, coupled with a new hierarchical learning scheme for multi-scale feature aggregation. A texture branch summarizes all face-level pixels into a learnable token, which is fused with geometrical descriptors and processed by a stack of Two-Stage Transformer Blocks (TSTB), which allow for both a local and a global information flow. We evaluate our model on the Semantic Urban Meshes (SUM) benchmark and a newly curated cultural-heritage dataset comprising textured roof tiles with triangle-level annotations for damage types. Our method achieves 81.9% mF1 and 94.3% OA on SUM and 49.7% mF1 and 72.8% OA on the new dataset, substantially outperforming existing approaches.
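The abstract does not spell out how the texture branch summarizes a face's pixels or how the result is fused with geometry. One plausible reading is attention pooling, where a single query vector attends over a variable number of pixels, followed by concatenation with the geometric descriptor. The sketch below illustrates that idea only. The pooling mechanism, the concatenation, and the names `pool_face_pixels` and `fuse` are all assumptions, and the query is a fixed vector here, whereas a real model would learn it and would first embed the pixels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_face_pixels(pixels, query):
    """Summarize any number of pixel vectors into one token.

    Assumption: scaled dot-product attention with a single query.
    `pixels` is a non-empty list of equal-length vectors (e.g. RGB).
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, px)) / math.sqrt(dim)
        for px in pixels
    ]
    weights = softmax(scores)
    return [
        sum(w * px[d] for w, px in zip(weights, pixels))
        for d in range(dim)
    ]


def fuse(texture_token, geometry_descriptor):
    """Assumption: fusion by simple concatenation of the two vectors."""
    return list(texture_token) + list(geometry_descriptor)


# Toy face with two RGB pixels; a zero query yields uniform attention,
# so the pooled token is just the mean pixel.
face_pixels = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
token = pool_face_pixels(face_pixels, query=[0.0, 0.0, 0.0])
face_feature = fuse(token, geometry_descriptor=[0.1, 0.2])
```

The appeal of a pooled token is that faces covering very different numbers of texture pixels all yield a fixed-size feature, which a transformer stack operating on faces can then consume uniformly.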
