Superseded baseline · #338 of 1,179 most-superseded
Video-LLaMA 2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Retrieval-augmented generation · first seen Jun 11, 2024
Superseded: cited as a baseline and beaten by newer methods.
0 papers critique it · 1 paper beats it on benchmarks
Beaten on benchmarks
Head-to-head results in which a newer method reports beating Video-LLaMA 2. Values are copied from the source paper's tables; verify them against the cited paper.
- AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
  AffectAgent beats Video-LLaMA 2 · Mean [all MLLM backbones - Video-LLaMA 2]: 42.74 (AffectAgent) vs 35.99 (Video-LLaMA 2)
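To put the reported gap in perspective, the sketch below computes the absolute and relative margin between the two scores listed above. It assumes both numbers are on the same scale and that higher is better for this metric; the page itself does not name the metric's units, and the variable names are illustrative only.

```python
# Scores as listed on this page for "Mean [all MLLM backbones - Video-LLaMA 2]".
# Assumption: same scale for both methods, higher is better.
affectagent = 42.74
video_llama_2 = 35.99

# Absolute margin in score points.
absolute_gain = affectagent - video_llama_2

# Relative margin, as a percentage of the baseline (Video-LLaMA 2) score.
relative_gain_pct = 100 * absolute_gain / video_llama_2

print(f"absolute: {absolute_gain:.2f} points")
print(f"relative: {relative_gain_pct:.1f}%")
```

This prints a margin of 6.75 points, roughly 18.8% over the Video-LLaMA 2 score; as with the values themselves, check the cited paper before relying on it.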
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.