CV | Apr 10, 2025

Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects

arXiv:2504.08125v1 | 6 citations | h-index: 41 | CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the need for scalable, human-aligned evaluation in text-to-3D generation, a problem relevant to researchers and developers in that area. The contribution is incremental because it builds on existing vLLM technology.

The paper tackles the lack of robust evaluation metrics for text-to-3D generation by introducing Gen3DEval, a framework that uses fine-tuned vision large language models to assess text fidelity, appearance, and surface quality without ground-truth data. In user-aligned evaluations, it outperforms state-of-the-art task-agnostic models.

Rapid advancements in text-to-3D generation require robust and scalable evaluation metrics that align closely with human judgment, a need unmet by current metrics such as PSNR and CLIP, which require ground-truth data or focus only on prompt fidelity. To address this, we introduce Gen3DEval, a novel evaluation framework that leverages vision large language models (vLLMs) specifically fine-tuned for 3D object quality assessment. Gen3DEval evaluates text fidelity, appearance, and surface quality by analyzing 3D surface normals, without requiring ground-truth comparisons, bridging the gap between automated metrics and user preferences. Compared to state-of-the-art task-agnostic models, Gen3DEval demonstrates superior performance in user-aligned evaluations, placing it as a comprehensive and accessible benchmark for future research on text-to-3D generation. The project page can be found here: https://shalini-maiti.github.io/gen3deval.github.io/.
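
To make the abstract's point about ground-truth dependence concrete, here is a minimal sketch of PSNR in plain Python. It is not taken from the paper: the function name `psnr`, the flat pixel lists, and the sample values are all hypothetical and chosen only for illustration. The sketch shows that the metric cannot be computed without a reference image, which a text-to-3D generation typically does not have.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio between two equally sized pixel lists.

    `reference` is the ground-truth image; without it the metric is undefined.
    """
    if not reference or len(reference) != len(rendered):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel intensities in [0, 1]; not data from the paper.
reference = [0.0, 0.5, 1.0, 0.5]  # ground-truth view
rendered = [0.1, 0.5, 0.9, 0.5]   # view rendered from a generated object
score = psnr(reference, rendered)
print(f"PSNR: {score:.2f} dB")
```

Because every term in the mean squared error pairs a rendered pixel with a reference pixel, a metric of this kind only applies when a ground-truth view exists. That is the limitation the abstract cites as motivation for a reference-free evaluator.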

