CV · Mar 26

TopoMesh: High-Fidelity Mesh Autoencoding via Topological Unification

arXiv:2603.24278 · 89.81 citations · h-index: 20
Predicted impact: top 16% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses a fundamental bottleneck in 3D generation pipelines for computer graphics and vision applications, though it is an incremental improvement over existing VAE approaches.

The paper tackles the problem of high-fidelity 3D mesh reconstruction in VAEs, where representation mismatch between ground-truth meshes and network predictions limits quality. The result is TopoMesh, a sparse voxel-based VAE that unifies meshes under a shared topological framework, significantly outperforming existing VAEs in reconstruction fidelity with superior preservation of sharp features.

The dominant paradigm for high-fidelity 3D generation relies on a VAE-Diffusion pipeline, where the VAE's reconstruction capability sets a firm upper bound on generation quality. A fundamental challenge limiting existing VAEs is the representation mismatch between ground-truth meshes and network predictions: GT meshes have arbitrary, variable topology, while VAEs typically predict fixed-structure implicit fields (e.g., SDF on regular grids). This inherent misalignment prevents establishing explicit mesh-level correspondences, forcing prior work to rely on indirect supervision signals such as SDF or rendering losses. Consequently, fine geometric details, particularly sharp features, are poorly preserved during reconstruction. To address this, we introduce TopoMesh, a sparse voxel-based VAE that unifies both GT and predicted meshes under a shared Dual Marching Cubes (DMC) topological framework. Specifically, we convert arbitrary input meshes into DMC-compliant representations via a remeshing algorithm that preserves sharp edges using an $L_\infty$ distance metric. Our decoder outputs meshes in the same DMC format, ensuring that both predicted and target meshes share identical topological structures. This establishes explicit correspondences at the vertex and face level, allowing us to derive explicit mesh-level supervision signals for topology, vertex positions, and face orientations with clear gradients. Our sparse VAE architecture employs this unified framework and is trained with Teacher Forcing and progressive resolution training for stable and efficient convergence. Extensive experiments demonstrate that TopoMesh significantly outperforms existing VAEs in reconstruction fidelity, achieving superior preservation of sharp features and geometric details.
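
The abstract does not spell out how the $L_\infty$ metric enters the remeshing step, so the following is only a minimal sketch of one geometric intuition, not the paper's algorithm. It compares the exterior distance from a point to an axis-aligned box under the $L_\infty$ and $L_2$ metrics. The function name `box_distance` and its parameters are hypothetical and chosen for illustration.

```python
import math


def box_distance(point, half_extent, metric="linf"):
    """Unsigned distance from `point` to an origin-centred axis-aligned cube.

    Hypothetical helper for illustration only; it is not taken from the paper.
    `half_extent` is half the cube's side length. Points inside the cube
    have distance 0.
    """
    # Per-axis amount by which the point lies outside the cube's slab.
    excess = [max(abs(c) - half_extent, 0.0) for c in point]
    if metric == "linf":
        # Chebyshev distance: the largest per-axis excess.
        return max(excess)
    if metric == "l2":
        # Euclidean distance: the norm of the per-axis excesses.
        return math.sqrt(sum(e * e for e in excess))
    raise ValueError(f"unknown metric: {metric!r}")


# A point diagonally off an edge of the unit-half-extent cube.
near_edge = (1.5, 1.5, 0.0)
# A point directly off a face of the same cube.
near_face = (1.5, 0.0, 0.0)

d_edge_linf = box_distance(near_edge, 1.0, "linf")
d_edge_l2 = box_distance(near_edge, 1.0, "l2")
d_face_linf = box_distance(near_face, 1.0, "linf")
d_face_l2 = box_distance(near_face, 1.0, "l2")
```

Under $L_\infty$ the point off the edge and the point off the face are equally far from the cube, so the level sets of the distance are larger cubes with sharp edges. Under $L_2$ the point off the edge is farther away, so the level sets are rounded near edges and corners. This is one plausible reason an $L_\infty$ metric suits sharp-edge preservation.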

