GrabAR: Occlusion-aware Grabbing Virtual Objects in AR
This work addresses the challenge of realistic hand-object interaction for AR users, though its contribution is incremental because it builds on existing AR methods.
The paper tackles occlusion between real hands and virtual objects in augmented reality by introducing GrabAR, an approach whose neural network predicts the occlusion mask directly from paired hand and object images, without acquiring or inferring depth, so that users can grab and manipulate virtual objects with their bare hands.
Existing augmented reality (AR) applications often ignore occlusion between real hands and virtual objects when incorporating virtual objects into our views. The challenges come from the lack of accurate depth and the mismatch between real and virtual depth. This paper presents GrabAR, a new approach that directly predicts the real-and-virtual occlusion and bypasses depth acquisition and inference. Our goal is to enhance AR applications with interactions between the (real) hand and grabbable (virtual) objects. With paired images of the hand and the object as inputs, we formulate a neural network that learns to generate the occlusion mask. To train the network, we compile a synthetic dataset to pre-train it and a real dataset to fine-tune it, thus reducing the burden of manual labeling and addressing the domain difference. We then embed the trained network in a prototype AR system that supports hand grabbing of various virtual objects, demonstrate the system's performance both quantitatively and qualitatively, and showcase interaction scenarios in which we can use a bare hand to grab virtual objects and directly manipulate them.
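To make the role of the occlusion mask concrete, here is a minimal sketch of how a predicted mask could be used when compositing a rendered virtual object into the camera view. It is an illustration under stated assumptions, not the paper's implementation: the function name `composite_with_occlusion`, the array layout, and the convention that the mask is 1 where the hand should hide the object are all hypothetical, and the mask is supplied by hand rather than produced by a trained network.

```python
import numpy as np


def composite_with_occlusion(camera_rgb, object_rgb, object_alpha, hand_over_object):
    """Blend a rendered virtual object into a camera frame (illustrative only).

    camera_rgb:       (H, W, 3) floats in [0, 1], the real view containing the hand.
    object_rgb:       (H, W, 3) floats in [0, 1], the rendered virtual object.
    object_alpha:     (H, W) floats in [0, 1], coverage of the rendered object.
    hand_over_object: (H, W) floats in [0, 1], assumed mask convention:
                      1 where the real hand should hide the virtual object.
    """
    # The object is shown only where it is rendered AND not hidden by the hand.
    visible = object_alpha * (1.0 - hand_over_object)
    # Add a channel axis so the (H, W) weights broadcast over the RGB channels.
    visible = visible[..., np.newaxis]
    return visible * object_rgb + (1.0 - visible) * camera_rgb


# Tiny 2x2 example: the object covers the top row, and the hand hides
# the object at the top-left pixel only.
camera = np.full((2, 2, 3), 0.2)
virtual = np.full((2, 2, 3), 0.8)
alpha = np.array([[1.0, 1.0], [0.0, 0.0]])
mask = np.array([[1.0, 0.0], [0.0, 0.0]])

frame = composite_with_occlusion(camera, virtual, alpha, mask)
```

In this sketch the top-left pixel keeps the camera color (the hand is in front), the top-right pixel shows the virtual object, and the bottom row is untouched camera content. Using a soft mask in [0, 1] rather than a hard binary one is a design choice made here to allow smooth edges around the fingers.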