SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation
This work addresses the need for realistic 3D shapes in content creation areas such as computer games and film, but its contribution is incremental because it builds on existing mesh encoding and LLM techniques.
The paper tackles the problem of evaluating 3D shape realism without ground truth references by proposing SRAM, a metric that uses a large language model to align mesh information with human perception, and introduces a new dataset, RealismGrading, with human-annotated scores. Experimental results show that SRAM correlates well with human perceptions and outperforms existing methods, demonstrating good generalizability.
3D generation and reconstruction techniques have been widely used in computer games, film, and other content creation areas. As these applications grow, so does the demand for 3D shapes that look truly realistic. Traditional evaluation methods rely on a ground truth to measure mesh fidelity. However, in many practical cases, a shape's realism does not depend on having a ground truth reference. In this work, we propose a Shape-Realism Alignment Metric that leverages a large language model (LLM) as a bridge between mesh shape information and realism evaluation. To achieve this, we adopt a mesh encoding approach that converts 3D shapes into the language token space. A dedicated realism decoder is designed to align the language model's output with human perception of realism. Additionally, we introduce a new dataset, RealismGrading, which provides human-annotated realism scores without the need for ground truth shapes. Our dataset includes shapes generated by 16 different algorithms on over a dozen objects, making it more representative of practical 3D shape distributions. We validate our metric's performance and generalizability through k-fold cross-validation across different objects. Experimental results show that our metric correlates well with human perception, outperforms existing methods, and generalizes well.
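The evaluation protocol described above (k-fold cross-validation across objects, scored by agreement with human ratings) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say which correlation coefficient is used, so Spearman rank correlation stands in as one common choice, and the helper names (`object_folds`, `spearman`) and the toy scores are hypothetical, not taken from the paper or from RealismGrading. In an actual experiment the predicted scores for each fold would come from a model trained on the remaining folds.

```python
from collections import defaultdict
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def object_folds(object_ids, k):
    """Split sample indices into k folds so that no object spans two folds."""
    by_object = defaultdict(list)
    for idx, obj in enumerate(object_ids):
        by_object[obj].append(idx)
    folds = [[] for _ in range(k)]
    for n, obj in enumerate(sorted(by_object)):
        folds[n % k].extend(by_object[obj])
    return folds


# Hypothetical toy data: two generated shapes per object.
object_ids = ["chair", "chair", "lamp", "lamp", "mug", "mug"]
human_scores = [4.0, 2.0, 5.0, 1.0, 3.0, 3.5]   # annotator realism ratings (made up)
metric_scores = [0.8, 0.3, 0.9, 0.2, 0.5, 0.6]  # placeholder metric outputs (made up)

folds = object_folds(object_ids, k=3)
for held_out in folds:
    rho = spearman([human_scores[i] for i in held_out],
                   [metric_scores[i] for i in held_out])
    print(sorted({object_ids[i] for i in held_out}), round(rho, 3))
```

Grouping the split by object, rather than by individual shape, is what lets the held-out correlation say something about generalization to unseen objects: a shape-level split would leak the same object into both training and test folds.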