PIT-QMM: A Large Multimodal Model For No-Reference Point Cloud Quality Assessment
This work addresses the need for efficient and accurate quality assessment in 3D graphics and virtual reality applications, representing a novel application of multimodal models to a domain-specific challenge.
The paper tackles the problem of automatically evaluating the perceptual quality of 3D point clouds without a reference. It proposes PIT-QMM, a large multimodal model that uses text, images, and point clouds to predict quality scores, and reports state-of-the-art performance on benchmarks with fewer training iterations.
Large Multimodal Models (LMMs) have recently enabled considerable advances in image and video quality assessment, but this progress has yet to be fully explored in the domain of 3D assets. We are interested in using these models to conduct No-Reference Point Cloud Quality Assessment (NR-PCQA), where the aim is to automatically evaluate the perceptual quality of a point cloud in the absence of a reference. We begin with the observation that different modalities of data (text descriptions, 2D projections, and 3D point cloud views) provide complementary information about point cloud quality. We then construct PIT-QMM, a novel LMM for NR-PCQA that is capable of consuming text, images, and point clouds end-to-end to predict quality scores. Extensive experimentation shows that our proposed method outperforms the state of the art by significant margins on popular benchmarks with fewer training iterations. We also demonstrate that our framework enables distortion localization and identification, which opens a new path toward model explainability and interactivity. Code and datasets are available at https://www.github.com/shngt/pit-qmm.
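To make the "2D projection" modality concrete, the following is a minimal sketch of how a colored point cloud could be rendered into an image by orthographic projection. It is an illustration only and is not taken from the PIT-QMM code: the function name `project_orthographic`, the image size, the choice of dropped axis, and the convention that a larger coordinate along that axis is nearer to the viewer are all assumptions.

```python
import numpy as np


def project_orthographic(points, colors, size=64, axis=2):
    """Render a colored point cloud to a (size, size, 3) RGB image.

    Hypothetical helper: drops one coordinate axis (orthographic view) and
    keeps, for each pixel, the color of the point with the largest value
    along that axis (assumed here to be the point nearest the viewer).
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.uint8)

    # The two axes that remain after dropping the viewing axis.
    keep = [i for i in range(3) if i != axis]
    uv = points[:, keep]
    depth = points[:, axis]

    # Scale the remaining coordinates to integer pixel indices in [0, size - 1].
    lo = uv.min(axis=0)
    span = uv.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for flat clouds
    pix = np.round((uv - lo) / span * (size - 1)).astype(int)

    # Paint points in order of increasing depth so the nearest point wins.
    image = np.zeros((size, size, 3), dtype=np.uint8)
    for idx in np.argsort(depth):
        u, v = pix[idx]
        image[v, u] = colors[idx]
    return image
```

Calling this function with different values of `axis` would give several views of the same cloud, which is one simple way to obtain multiple 2D projections; the actual rendering used by the authors may differ.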