End-to-End Pixel-Based Deep Active Inference for Body Perception and Action
This work addresses body perception and action for robots. Its contribution is incremental: it applies active inference to robotics.
The paper tackles the problem of enabling a robot to perceive and control its body using only monocular camera images. As a result, the robot successfully estimated the state of its arm dynamically and autonomously reached toward imagined arm poses.
We present a pixel-based deep active inference algorithm (PixelAI) inspired by human body perception and action. Our algorithm combines the free-energy principle from neuroscience, rooted in variational inference, with deep convolutional decoders so that it scales to raw visual input and provides online adaptive inference. We validate the approach by studying body perception and action on a simulated and a real Nao robot. Results show that it allows the robot to perform 1) dynamical body estimation of its arm using only monocular camera images and 2) autonomous reaching to "imagined" arm poses in the visual space. This suggests that robot and human body perception and action can be efficiently solved by viewing both as an active inference problem guided by ongoing sensory input.
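To make the perception half of this idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy setting: the convolutional decoder is replaced by a hypothetical fixed map `decode(mu) = tanh(W @ mu)` from a two-joint belief `mu` to 64 "pixels", and free energy is reduced to the squared visual prediction error with variance `sigma_v`. All names (`decode`, `free_energy`, `infer_body_state`, `W`, `sigma_v`) are illustrative assumptions. The body state is estimated by gradient descent on the prediction error, backpropagated through the decoder.

```python
import numpy as np


def decode(mu, W):
    """Hypothetical stand-in for the convolutional decoder:
    maps a belief over joint angles to a predicted (flattened) image."""
    return np.tanh(W @ mu)


def free_energy(s, mu, W, sigma_v=1.0):
    """Simplified free energy: precision-weighted squared prediction error."""
    e = s - decode(mu, W)
    return 0.5 * float(e @ e) / sigma_v


def infer_body_state(s, W, mu0, lr=0.05, steps=200, sigma_v=1.0):
    """Estimate the body state by gradient descent on the free energy."""
    mu = np.array(mu0, dtype=float)
    history = [free_energy(s, mu, W, sigma_v)]
    for _ in range(steps):
        g = decode(mu, W)
        # Jacobian of the decoder w.r.t. mu, shape (pixels, joints).
        jac = (1.0 - g**2)[:, None] * W
        # Gradient of the free energy: prediction error mapped back to joint space.
        grad = -jac.T @ (s - g) / sigma_v
        mu -= lr * grad
        history.append(free_energy(s, mu, W, sigma_v))
    return mu, history


# Toy usage: observe the "image" produced by a true arm state and recover it.
rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((64, 2))   # assumed decoder weights (illustrative)
mu_true = np.array([0.4, -0.3])          # true joint angles, unknown to the agent
s = decode(mu_true, W)                   # observed pixels
mu_hat, history = infer_body_state(s, W, mu0=np.zeros(2))
print("estimated state:", mu_hat)
print("free energy: start", history[0], "end", history[-1])
```

In this sketch only perception is shown. Under the same assumptions, reaching could be modeled by replacing the observation `s` in the error term with a goal image and letting the action change the true joint angles so that the observed image moves toward the prediction.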