Superseded baseline · #338 of 1,179 most-superseded

Video-LLaMA 2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Retrieval-augmented generation · first seen Jun 11, 2024

superseded — cited as a baseline and beaten by newer methods

0 papers critique it · 1 paper beats it on benchmarks

Beaten on benchmarks

Head-to-head results where a newer method reports beating Video-LLaMA 2. Values are copied from the source paper's tables — verify against the cited paper.

AffectAgent beats Video-LLaMA 2 · Mean [all MLLM backbones - Video-LLaMA 2]
42.74 (AffectAgent) vs 35.99 (Video-LLaMA 2)
AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
Apr 14, 2026