RO · AI · Mar 11, 2025

General-Purpose Aerial Intelligent Agents Empowered by Large Language Models

arXiv:2503.08302v1 · 14 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of hardware-software co-design for general-purpose aerial agents, enabling UAVs to operate in communication-constrained environments, though it appears incremental in integrating existing LLM and robotic components.

This paper tackles the problem of enabling unmanned aerial vehicles (UAVs) to perform open-world tasks by integrating large language models (LLMs) with robotic autonomy, achieving 5-6 tokens/sec inference for 14B-parameter models at 220W peak power. It demonstrates reliable task planning and scene understanding in applications like sugarcane monitoring and power grid inspection.

The emergence of large language models (LLMs) opens new frontiers for unmanned aerial vehicles (UAVs), yet existing systems remain confined to predefined tasks due to hardware-software co-design challenges. This paper presents the first aerial intelligent agent capable of open-world task execution through tight integration of LLM-based reasoning and robotic autonomy. Our hardware-software co-designed system addresses two fundamental limitations: (1) Onboard LLM operation via an edge-optimized computing platform, achieving 5-6 tokens/sec inference for 14B-parameter models at 220W peak power; (2) A bidirectional cognitive architecture that synergizes slow deliberative planning (LLM task planning) with fast reactive control (state estimation, mapping, obstacle avoidance, and motion planning). Validated through preliminary results using our prototype, the system demonstrates reliable task planning and scene understanding in communication-constrained environments, such as sugarcane monitoring, power grid inspection, mine tunnel exploration, and biological observation applications. This work establishes a novel framework for embodied aerial artificial intelligence, bridging the gap between task planning and robotic autonomy in open environments.
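To put the reported throughput in perspective, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It uses only the two figures stated in the abstract (5-6 tokens/sec and 220 W peak power). The plan length of 150 tokens is a hypothetical value chosen for illustration, and the energy figure is an upper bound that assumes the platform draws its full peak power during inference.

```python
# Figures stated in the abstract.
PEAK_POWER_W = 220.0          # peak power of the onboard computing platform
TOKEN_RATES = (5.0, 6.0)      # reported inference throughput, tokens/sec

# Hypothetical plan length, assumed here for illustration only.
PLAN_TOKENS = 150


def plan_latency_s(num_tokens: int, tokens_per_s: float) -> float:
    """Time to generate a plan of num_tokens at a given throughput."""
    return num_tokens / tokens_per_s


def energy_per_token_j(power_w: float, tokens_per_s: float) -> float:
    """Upper-bound energy per token, assuming constant draw at power_w."""
    return power_w / tokens_per_s


for rate in TOKEN_RATES:
    latency = plan_latency_s(PLAN_TOKENS, rate)
    energy = energy_per_token_j(PEAK_POWER_W, rate)
    print(f"{rate:.0f} tok/s: {latency:.1f} s per plan, <= {energy:.1f} J/token")
```

Under these assumptions a single plan takes tens of seconds to generate, which is consistent with the abstract's split between slow deliberative planning by the LLM and fast reactive control handled by the conventional autonomy stack.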

