AeroGrab: A Unified Framework for Aerial Grasping in Cluttered Environments
This work addresses aerial manipulation in cluttered settings. Its contribution is incremental: it integrates existing components into a complete end-to-end system.
The paper tackles reliable aerial grasping in cluttered environments with an integrated pipeline that combines active exploration, grasp generation, and collision-aware feasibility evaluation. The authors report reliable grasp execution in real-world scenarios.
Reliable aerial grasping in cluttered environments remains challenging due to occlusions and collision risks. Existing aerial manipulation pipelines largely rely on centroid-based grasping and do not integrate grasp pose generation models, active exploration, and language-level task specification, so no complete end-to-end system exists. In this work, we present an integrated pipeline for reliable aerial grasping in cluttered environments. Given a scene and a language instruction, the system identifies the target object and actively explores the scene to obtain better views of it. During exploration, a grasp generation network predicts multiple 6-DoF grasp candidates for each view. Each candidate is evaluated using a collision-aware feasibility framework, and the best grasp across all views is selected and executed using standard trajectory generation and control methods. Experiments in cluttered real-world scenarios demonstrate reliable grasp execution, highlighting the effectiveness of combining active perception with feasibility-aware grasp selection for aerial manipulation.
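
The selection step described above (several candidates per view, a collision-aware feasibility check on each, then one best grasp across all views) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`GraspCandidate`, `feasibility_score`, `select_best_grasp`, `safety_radius`) is hypothetical, the obstacle model is a plain list of 3D points, and the feasibility test is assumed to be a simple clearance threshold that ignores grasp orientation and vehicle geometry.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one predicted grasp."""
    position: Point   # grasp position in the world frame (metres)
    quality: float    # assumed confidence score from the grasp network
    view_id: int      # index of the exploration view that produced it


def min_clearance(position: Point, obstacles: Sequence[Point]) -> float:
    """Distance from the grasp position to the nearest obstacle point."""
    return min((math.dist(position, o) for o in obstacles), default=math.inf)


def feasibility_score(candidate: GraspCandidate,
                      obstacles: Sequence[Point],
                      safety_radius: float) -> Optional[float]:
    """Return the candidate's quality, or None if it is too close to clutter.

    Assumption: feasibility is a hard clearance threshold. A real system
    would also check orientation, reachability, and the approach path.
    """
    if min_clearance(candidate.position, obstacles) < safety_radius:
        return None
    return candidate.quality


def select_best_grasp(views: Iterable[Sequence[GraspCandidate]],
                      obstacles: Sequence[Point],
                      safety_radius: float = 0.15) -> Optional[GraspCandidate]:
    """Pick the highest-scoring feasible candidate across all views."""
    best: Optional[GraspCandidate] = None
    best_score = -math.inf
    for candidates in views:
        for candidate in candidates:
            score = feasibility_score(candidate, obstacles, safety_radius)
            if score is not None and score > best_score:
                best, best_score = candidate, score
    return best
```

In this sketch, a candidate with high network confidence is still rejected when it lies within `safety_radius` of an obstacle point, so a lower-confidence grasp from a different view can be selected. The function returns `None` when no candidate is feasible.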