CV | Feb 24, 2025

Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model

Kang Fu, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, Xiongkuo Min, Jia Wang, Guangtao Zhai

arXiv:2502.16915v1 | 10.25 citations | h-index: 49 | Has Code | IEEE Transactions on Multimedia

Originality: Synthesis-oriented

AI Analysis

This paper addresses quality assessment for text-to-3D assets, a problem that matters to researchers and developers working on generative AI. The contribution is incremental, since it builds on existing text-to-image evaluation concepts.

The paper tackles the lack of evaluation methods for text-to-3D asset generation by creating the AIGC-T23DAQA database, which contains 969 3D assets generated from 170 prompts by 6 models, and by developing a quality assessment model that scores assets on quality, authenticity, and text-asset correspondence.

Recent advances in text-to-image (T2I) generation have spurred the development of text-to-3D asset (T23DA) generation, which leverages pretrained 2D text-to-image diffusion models for 3D asset synthesis. Despite the growing popularity of text-to-3D asset generation, its evaluation has not been well studied. Given the significant quality discrepancies among text-to-3D assets, there is a pressing need for quality assessment models aligned with human subjective judgments. To tackle this challenge, we conduct a comprehensive study of the T23DA quality assessment (T23DAQA) problem from both subjective and objective perspectives. Given the absence of corresponding databases, we first establish the largest text-to-3D asset quality assessment database to date, termed the AIGC-T23DAQA database. This database comprises 969 validated 3D assets generated from 170 prompts by 6 popular text-to-3D asset generation models, together with subjective ratings of these assets from three perspectives: quality, authenticity, and text-asset correspondence. We then establish a comprehensive benchmark based on the AIGC-T23DAQA database and devise an effective T23DAQA model that evaluates generated 3D assets from each of these three perspectives.
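To make "aligned with human subjective judgments" concrete, the following is a minimal sketch of how a model's predictions might be checked against subjective ratings on each of the three perspectives. The abstract does not say which agreement metric the benchmark uses; Spearman rank correlation is assumed here only because it is a common choice in quality assessment work. The names `DIMENSIONS`, `average_ranks`, `spearman`, and `evaluate`, and the per-asset dictionary layout, are hypothetical and are not taken from the paper or its code.

```python
from statistics import mean

# Assumed labels for the three rating perspectives named in the abstract.
DIMENSIONS = ("quality", "authenticity", "correspondence")


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(human, predicted):
    """Per-dimension rank agreement between human ratings and predictions.

    Both arguments are lists of dicts, one per asset, in the same order,
    keyed by the names in DIMENSIONS.
    """
    return {
        d: spearman([r[d] for r in human], [r[d] for r in predicted])
        for d in DIMENSIONS
    }
```

Reporting one coefficient per perspective, rather than a single pooled number, mirrors the abstract's point that the three perspectives are rated and evaluated separately.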
