Real-World Semantic Grasp Detection Based on Attention Mechanism
This work addresses the challenge of improving grasp detection accuracy and application scope for robotics, representing an incremental advance by integrating semantic recognition with attention mechanisms.
The paper tackles the problem of combining object category recognition with grasp configuration prediction in cluttered scenes using an end-to-end model with a target feature attention mechanism, achieving 98.38% accuracy on the Cornell Grasp Dataset and demonstrating domain adaptability in complex scenarios.
Recognizing the category of an object and using the features of the object itself to predict its grasp configuration is of great significance for improving the accuracy of grasp detection models and expanding their applications. Researchers have been trying to combine these capabilities in an end-to-end network so that specific objects can be grasped efficiently in a cluttered scene. In this paper, we propose an end-to-end semantic grasp detection model that accomplishes both semantic recognition and grasp detection. We also design a target feature attention mechanism that, according to the semantic information, guides the model to focus on the features of the target object itself when predicting grasps. This method effectively suppresses background features that are only weakly correlated with the target object, making the features more distinctive and improving the accuracy and efficiency of grasp detection. Experimental results show that the proposed method achieves 98.38% accuracy on the Cornell Grasp Dataset. Furthermore, our results in complex multi-object scenarios and under more rigorous evaluation metrics demonstrate the domain adaptability of our method compared with state-of-the-art methods.
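The abstract does not specify the form of the target feature attention mechanism, so the following is only a generic illustration of the idea of down-weighting background features, not the architecture used in the paper. It assumes a feature map of shape C x H x W stored as nested lists and a hypothetical per-location target score map (`target_scores`) derived from the semantic branch; the function name `apply_target_attention` and the use of a sigmoid gate are assumptions made for this sketch.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_target_attention(features, target_scores):
    """Scale every channel of a feature map by a spatial attention map.

    features:      C x H x W nested lists of floats (assumed layout).
    target_scores: H x W nested lists of raw scores; a higher score means
                   the location more likely belongs to the target object
                   (hypothetical output of a semantic branch).
    Returns (attended_features, attention_map).
    """
    # Convert raw scores to attention weights in (0, 1).
    attention = [[sigmoid(s) for s in row] for row in target_scores]
    # Multiply each feature value by the weight at its spatial location,
    # so locations with low target scores (background) are suppressed.
    attended = [
        [
            [value * attention[i][j] for j, value in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for channel in features
    ]
    return attended, attention


# Usage: one channel, 2 x 2 map; the top-right location is background.
features = [[[1.0, 2.0], [3.0, 4.0]]]
target_scores = [[10.0, -10.0], [0.0, 10.0]]
attended, attention = apply_target_attention(features, target_scores)
```

In this sketch, locations with strongly negative scores contribute almost nothing to the attended features, while locations with strongly positive scores pass through nearly unchanged. A grasp prediction head would then operate on `attended` rather than on the raw features.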