Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution
This work addresses performance bottlenecks in embodied AI systems operating in dynamic real-world environments, and represents an incremental optimization of existing methods.
The paper tackles the low inference frequency of embodied AI agents caused by sequential computation patterns. It presents Auras, a framework that disaggregates the perception and generation modules and runs them with pipeline parallelism. Experimental results show that Auras improves throughput by 2.54x on average while maintaining 102.7% of the original accuracy.
Embodied AI systems operate in dynamic environments, requiring seamless integration of perception and generation modules to meet high-frequency input and output demands. Traditional sequential computation patterns, while effective in ensuring accuracy, struggle to reach the "thinking" frequency that real-world applications require. In this work, we present Auras, an algorithm-system co-designed inference framework that optimizes the inference frequency of embodied AI agents. Auras disaggregates perception from generation and runs the two with controlled pipeline parallelism to achieve high and stable throughput. To counter the data staleness that arises as parallelism increases, Auras establishes a public context shared by perception and generation, thereby preserving the accuracy of embodied agents. Experimental results show that Auras improves throughput by 2.54x on average while achieving 102.7% of the original accuracy, demonstrating its efficacy in overcoming the constraints of sequential computation.
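The idea in the abstract can be illustrated with a minimal Python sketch. It is not the Auras implementation: the names `PublicContext`, `perceive`, `generate`, `run_pipeline`, and `max_in_flight` are hypothetical, the two "models" are trivial arithmetic stand-ins, and a bounded queue is only an assumed way to keep the parallelism "controlled". The sketch shows perception and generation running as separate stages, with generation reading the freshest features from a shared context instead of the possibly stale features that were current when its request was queued.

```python
import queue
import threading


class PublicContext:
    """Hypothetical shared store that keeps the newest perception features."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = -1
        self._features = None

    def publish(self, version, features):
        # Keep only the newest features; older versions are ignored.
        with self._lock:
            if version > self._version:
                self._version, self._features = version, features

    def snapshot(self):
        # Return the newest (version, features) pair seen so far.
        with self._lock:
            return self._version, self._features


def perceive(frame):
    # Stand-in for a perception encoder (assumption, not a real model).
    return frame * 2


def generate(features):
    # Stand-in for an action generator (assumption, not a real model).
    return features + 1


def run_pipeline(frames, max_in_flight=2):
    """Run perception and generation as two concurrent pipeline stages."""
    context = PublicContext()
    # The bounded queue limits how far perception may run ahead of generation.
    requests = queue.Queue(maxsize=max_in_flight)
    actions = []

    def perception_worker():
        for step, frame in enumerate(frames):
            context.publish(step, perceive(frame))
            requests.put(step)  # blocks while the pipeline is full
        requests.put(None)  # sentinel: no more work

    def generation_worker():
        while True:
            step = requests.get()
            if step is None:
                break
            # Read the freshest features, which may be newer than `step`.
            version, features = context.snapshot()
            actions.append((step, version, generate(features)))

    workers = [
        threading.Thread(target=perception_worker),
        threading.Thread(target=generation_worker),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return actions


if __name__ == "__main__":
    for step, version, action in run_pipeline([1, 2, 3, 4, 5]):
        print(f"step={step} used features v{version} -> action={action}")
```

Each returned tuple records the request step, the version of the features actually used, and the resulting action. Because features are published before the request is queued, the version used is never older than the step that requested it; how much newer it is depends on thread scheduling, so the printed versions can differ between runs.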