Superseded baseline · #337 of 1,179 most-superseded
Video-LLaMA
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Retrieval-augmented generation · first seen Jun 5, 2023
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 1 beat it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating Video-LLaMA. Values are copied from the source paper's tables — verify against the cited paper.
- AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
AffectAgent beats Video-LLaMA · Mean [all MLLM backbones - Video-LLaMA]
38.75 (AffectAgent) vs 32.37 (Video-LLaMA)
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.