CVAIJan 27, 2025

Large Models in Dialogue for Active Perception and Anomaly Detection

arXiv:2501.16300v1h-index: 32
Originality Incremental advance
AI Analysis

This addresses the problem of anomaly detection in inaccessible areas for autonomous systems, but it is incremental as it builds on existing LLM and VQA methods.

The paper tackles autonomous aerial monitoring by proposing a framework where a Large Language Model (LLM) and a Visual Question Answering (VQA) model engage in dialogue to actively control a drone for improved perception and anomaly detection in novel scenes, demonstrating effectiveness in alerting about potential hazards.

Autonomous aerial monitoring is an important task aimed at gathering information from areas that may not be easily accessible by humans. At the same time, this task often requires recognizing anomalies from a significant distance or not previously encountered in the past. In this paper, we propose a novel framework that leverages the advanced capabilities provided by Large Language Models (LLMs) to actively collect information and perform anomaly detection in novel scenes. To this end, we propose an LLM based model dialogue approach, in which two deep learning models engage in a dialogue to actively control a drone to increase perception and anomaly detection accuracy. We conduct our experiments in a high fidelity simulation environment where an LLM is provided with a predetermined set of natural language movement commands mapped into executable code functions. Additionally, we deploy a multimodal Visual Question Answering (VQA) model charged with the task of visual question answering and captioning. By engaging the two models in conversation, the LLM asks exploratory questions while simultaneously flying a drone into different parts of the scene, providing a novel way to implement active perception. By leveraging LLMs reasoning ability, we output an improved detailed description of the scene going beyond existing static perception approaches. In addition to information gathering, our approach is utilized for anomaly detection and our results demonstrate the proposed methods effectiveness in informing and alerting about potential hazards.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes