Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems
This study addresses the high cost and limited adaptability of construction robots, offering a cost-effective approach with potential applications in other unstructured environments, though the contribution appears incremental because it builds on existing foundation models.
This study tackled the problem of adapting autonomous construction robots to dynamic tasks by proposing lightweight single-agent and multi-agent AI systems that use LLMs and VLMs for task planning. The results showed that the four-agent team outperformed GPT-4o in most metrics while being ten times more cost-effective, and that the three- and four-agent teams generalized better across the Painter, Safety Inspector, and Floor Tiling roles.
Robots are expected to play a major role in the future construction industry but face challenges due to high costs and difficulty adapting to dynamic tasks. This study explores the potential of foundation models to enhance the adaptability and generalizability of task planning in construction robots. Four models are proposed and implemented using lightweight, open-source large language models (LLMs) and vision-language models (VLMs). These models include one single agent and three multi-agent teams that collaborate to create robot action plans. The models are evaluated across three construction roles: Painter, Safety Inspector, and Floor Tiling. Results show that the four-agent team outperforms the state-of-the-art GPT-4o in most metrics while being ten times more cost-effective. Additionally, the teams with three and four agents demonstrate improved generalizability. By discussing how agent behaviors influence outputs, this study enhances the understanding of AI teams and supports future research in diverse unstructured environments beyond construction.