CV · Mar 20, 2025

iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation

arXiv:2503.16653v2 · 11 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in mesh generation for 3D modeling, offering a practical solution for applications requiring efficient high-resolution output, though it is incremental in optimizing existing attention mechanisms.

The paper tackles the trade-off between computational efficiency and performance in transformer-based mesh generation by proposing iFlame, an interleaving framework that combines full and linear attention. It reduces training time to 2 days on 4 GPUs for 39k data samples and almost doubles inference speed while maintaining comparable quality.

This paper proposes iFlame, a novel transformer-based network architecture for mesh generation. While attention-based models have demonstrated remarkable performance in mesh generation, their quadratic computational complexity limits scalability, particularly for high-resolution 3D data. Conversely, linear attention mechanisms offer lower computational costs but often struggle to capture long-range dependencies, resulting in suboptimal outcomes. To address this trade-off, we propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention mechanisms. To further enhance efficiency and leverage the inherent structure of mesh representations, we integrate this interleaving approach into an hourglass architecture. Our approach reduces training time while achieving performance comparable to pure attention-based models. To improve inference efficiency, we implement a caching algorithm that almost doubles the speed and reduces the KV cache size by seven-eighths compared to the original Transformer. We evaluate our framework on ShapeNet and Objaverse, demonstrating its ability to generate high-quality 3D meshes efficiently. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance, making it a practical solution for mesh generation. Training takes only 2 days on 4 GPUs with 39k Objaverse meshes of at most 4k faces each.
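To make the trade-off in the abstract concrete, the following is a minimal pure-Python sketch of the two attention types and of alternating them across layers. It is not the iFlame implementation: the `elu(x) + 1` feature map, the strict one-to-one alternation, the absence of learned projections, and all function names are assumptions chosen for illustration, and the hourglass structure is not modeled. The point it shows is that causal linear attention can be computed with a fixed-size running state, whereas full attention must revisit (and at inference time cache) every earlier key and value.

```python
import math
import random


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumption borrowed from the
    # linear-attention literature, not taken from the paper).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def full_causal_attention(q, k, v):
    """Softmax attention over all earlier tokens: O(n^2) time in sequence length."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) / z
                    for c in range(len(v[0]))])
    return out


def linear_causal_attention(q, k, v):
    """Recurrent linear attention: the state S (d x d_v) and normalizer z (d)
    have a fixed size, independent of how many tokens have been seen."""
    d, dv = len(q[0]), len(v[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    out = []
    for qi, ki, vi in zip(q, k, v):
        fq, fk = feature_map(qi), feature_map(ki)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * vi[c]
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


def linear_causal_attention_quadratic(q, k, v):
    """Reference version of the same linear attention with explicit pairwise
    weights, used only to check the recurrent form."""
    out = []
    for i in range(len(q)):
        fq = feature_map(q[i])
        w = [sum(a * b for a, b in zip(fq, feature_map(k[j])))
             for j in range(i + 1)]
        total = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) / total
                    for c in range(len(v[0]))])
    return out


def interleaved_stack(x, num_layers=4):
    """Alternate linear and full attention layers with residual connections.
    The even/odd pattern is a hypothetical choice for this sketch."""
    pattern = []
    for layer in range(num_layers):
        attn = full_causal_attention if layer % 2 == 1 else linear_causal_attention
        pattern.append(attn.__name__)
        y = attn(x, x, x)  # self-attention without learned projections
        x = [[a + b for a, b in zip(row, upd)] for row, upd in zip(x, y)]
    return x, pattern


if __name__ == "__main__":
    random.seed(0)
    tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    _, layer_pattern = interleaved_stack(tokens)
    print(layer_pattern)
```

In this sketch only the full-attention layers would need a key/value cache that grows with sequence length during autoregressive decoding; the linear layers carry a constant-size state. How that translates into the seven-eighths cache reduction reported above depends on the paper's actual layer layout and hourglass design, which this example does not reproduce.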

