Garbage in, garbage out: Zero-shot detection of crime using Large Language Models
This work addresses crime detection for security applications, but its contribution is incremental: the main finding is a limitation of automated video description systems.
The paper tackles zero-shot crime detection from surveillance videos by applying large language models (LLMs) to textual descriptions of the footage. It achieves state-of-the-art performance when high-quality descriptions are provided, but finds that automated video-to-text methods fail to produce descriptions adequate for this task.
This paper proposes exploiting the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos. We show that when video is (manually) converted to high-quality textual descriptions, large language models are capable of detecting and classifying crimes with state-of-the-art performance using only zero-shot reasoning. However, existing automated video-to-text approaches are unable to generate video descriptions of sufficient quality to support reasoning (garbage video descriptions into the large language model, garbage out).
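The zero-shot setup described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the label set, the prompt wording, the helper names (`build_prompt`, `parse_label`, `classify`), and the `llm` callable, which stands in for whichever language model is used.

```python
from typing import Callable, Sequence

# Hypothetical label set for illustration; the source does not list categories.
LABELS = ("normal", "robbery", "assault", "vandalism")


def build_prompt(description: str, labels: Sequence[str] = LABELS) -> str:
    """Wrap a textual video description in a zero-shot classification prompt."""
    options = ", ".join(labels)
    return (
        "You are reviewing a textual description of a surveillance video.\n"
        f"Description: {description}\n"
        f"Answer with exactly one of: {options}."
    )


def parse_label(response: str, labels: Sequence[str] = LABELS) -> str:
    """Map a free-text reply to one label, or 'unknown' if it is ambiguous."""
    text = response.lower()
    matches = [label for label in labels if label in text]
    # Empty or multi-label replies are not guessed at.
    return matches[0] if len(matches) == 1 else "unknown"


def classify(
    description: str,
    llm: Callable[[str], str],
    labels: Sequence[str] = LABELS,
) -> str:
    """Run one zero-shot query; `llm` is any function from prompt to reply."""
    return parse_label(llm(build_prompt(description, labels)), labels)
```

In this sketch the quality of `description` is the only input the model sees, which is why a poor automated description would limit the result regardless of the model's reasoning ability.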