Prediction of Manipulation Actions
This work addresses the need for active systems with real-time constraints to predict future actions. The contribution is incremental: it builds on existing action recognition methods and focuses on dexterous manipulation.
The paper tackles the problem of predicting dexterous manipulation actions, such as squeezing or flipping with a sponge, from video by developing a recurrent neural network that takes patches around the hand as input. The results show that the system closely matches human performance in the recognition task and can predict both what action is performed and how it is performed.
Looking at a person's hands, one can often tell what the person is going to do next, how his/her hands are moving and where they will be, because an actor's intentions shape his/her movement kinematics during action execution. Similarly, active systems with real-time constraints cannot simply rely on passive video-segment classification; they have to continuously update their estimates and predict future actions. In this paper, we study the prediction of dexterous actions. We recorded subjects performing different manipulation actions on the same object, such as "squeezing", "flipping", "washing", "wiping" and "scratching" with a sponge. In psychophysical experiments, we evaluated human observers' skill in predicting actions from video sequences of different lengths, depicting the hand movement in the preparation and execution of actions before and after contact with the object. We then developed a recurrent neural network-based method for action prediction that takes patches around the hand as input. We also used the same formalism to predict the forces on the fingertips, using synchronized video and force data streams for training. Evaluations on two new datasets show that our system closely matches human performance in the recognition task and demonstrate the ability of our algorithm to predict what dexterous action is performed and how it is performed.
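To make the idea of continuously updated estimates concrete, the following is a minimal sketch of a recurrent predictor that emits a probability distribution over the five sponge actions after every frame. It is not the authors' architecture: the plain Elman-style cell, the random untrained weights, the feature and hidden sizes, and the names `TinyRecurrentPredictor` and `predict_stream` are all assumptions made for illustration, and the random vectors merely stand in for features extracted from hand patches.

```python
import math
import random

# Action labels taken from the abstract.
ACTIONS = ["squeezing", "flipping", "washing", "wiping", "scratching"]


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRecurrentPredictor:
    """Hypothetical Elman-style recurrent cell with a softmax readout."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = make_matrix(hidden_dim, feat_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = make_matrix(len(ACTIONS), hidden_dim, rng)

    def predict_stream(self, frames):
        """Yield a distribution over ACTIONS after each incoming frame."""
        hidden = [0.0] * self.hidden_dim
        for features in frames:
            from_input = matvec(self.w_in, features)
            from_state = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            yield softmax(matvec(self.w_out, hidden))


# Stand-in for per-frame hand-patch features: 6 frames of 8 random values.
rng = random.Random(1)
frames = [[rng.random() for _ in range(8)] for _ in range(6)]

model = TinyRecurrentPredictor(feat_dim=8, hidden_dim=4)
estimates = list(model.predict_stream(frames))

for t, probs in enumerate(estimates):
    best = max(range(len(ACTIONS)), key=lambda i: probs[i])
    print(t, ACTIONS[best], round(probs[best], 3))
```

Because the weights are untrained, the printed labels carry no meaning; the point is only that an estimate is available after every frame rather than once per video segment. Under the same assumptions, replacing the softmax readout with a linear one would turn the sketch into a per-frame regressor of the kind the fingertip-force prediction would need.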