MM AI CL CV · Apr 23, 2024

Pegasus-v1 Technical Report

Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim

arXiv:2404.14687v1 · 5.95 citations · h-index: 12

Originality: Synthesis-oriented

AI Analysis

This work addresses video content analysis for AI applications but appears incremental, as it builds on existing multimodal models.

The authors introduce Pegasus-1, a multimodal language model for video understanding. It tackles challenges such as spatiotemporal interpretation to improve comprehension of videos of varying length. The authors report its performance on benchmarks for video conversation, zero-shot question answering, and summarization.

This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced comprehension of video content across various lengths. This technical report gives an overview of Pegasus-1's architecture, training strategies, and performance on benchmarks for video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.

