CV · Jan 6, 2025

Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment

arXiv:2501.02706v1 · 2 citations · h-index: 6 · ICASSP
Originality: Incremental advance
AI Analysis

The paper addresses a specific challenge in AI-generated video quality assessment. The advance is incremental because it adapts existing methods to a new domain.

The paper tackles the problem of assessing AI-generated video quality by introducing MSA-VQA, a multilevel semantic-aware model that leverages CLIP-based supervision and cross-attention mechanisms, achieving state-of-the-art results.

The rapid development of diffusion models has recently advanced AI-generated videos considerably in length and consistency, yet assessing their quality remains challenging. Previous approaches have mostly focused on User-Generated Content (UGC), and few methods target AI-Generated Video Quality Assessment. In this work, we introduce MSA-VQA, a Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment, which leverages CLIP-based semantic supervision and cross-attention mechanisms. Our hierarchical framework analyzes video content at three levels: frame, segment, and video. We propose a Prompt Semantic Supervision Module that uses the text encoder of CLIP to ensure semantic consistency between videos and their conditional prompts. Additionally, we propose a Semantic Mutation-aware Module to capture subtle variations between frames. Extensive experiments demonstrate that our method achieves state-of-the-art results.
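
The three-level analysis and the prompt-consistency idea in the abstract can be illustrated with a toy sketch. This is not the MSA-VQA implementation: the paper's modules are learned and rely on CLIP features and cross-attention, whereas the sketch below assumes precomputed embedding vectors and substitutes plain mean pooling and cosine similarity. Every function name, the `segment_len` parameter, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def multilevel_summary(frame_embs, prompt_emb, segment_len=2):
    """Toy frame/segment/video summary (assumption: not the paper's learned modules).

    frame_embs: one embedding per frame, e.g. from an image encoder.
    prompt_emb: embedding of the text prompt, e.g. from a text encoder.
    segment_len: hypothetical number of consecutive frames per segment.
    """
    # Segment level: mean-pool consecutive, non-overlapping groups of frames.
    segments = [
        frame_embs[i:i + segment_len]
        for i in range(0, len(frame_embs), segment_len)
    ]
    segment_embs = [mean_pool(segment) for segment in segments]
    # Video level: mean-pool every frame.
    video_emb = mean_pool(frame_embs)
    return {
        # Prompt consistency, measured at each of the three levels.
        "frame_prompt_sim": [cosine(f, prompt_emb) for f in frame_embs],
        "segment_prompt_sim": [cosine(s, prompt_emb) for s in segment_embs],
        "video_prompt_sim": cosine(video_emb, prompt_emb),
        # Stand-in for frame-to-frame semantic change: 1 - similarity of neighbours.
        "frame_mutation": [
            1.0 - cosine(a, b) for a, b in zip(frame_embs, frame_embs[1:])
        ],
    }


# Toy 2-D embeddings: the content changes abruptly between frames 2 and 3.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
prompt = [1.0, 0.0]
summary = multilevel_summary(frames, prompt, segment_len=2)
print(summary)
```

In this toy input, the abrupt change between the second and third frames shows up as a large value in `frame_mutation`, while the prompt similarity drops for the later frames and the second segment. A learned model would replace the fixed pooling and similarity with trained components.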

