Fine-grained Event Learning of Human-Object Interaction with LSTM-CRF
This work addresses event learning in AI for applications involving human kinematics and object movements, but it appears incremental, as it combines existing techniques such as LSTM and CRF.
The paper tackles the problem of learning complex human-object interaction events by proposing a methodology that involves recording, annotating with multiple temporal interpretations, and classifying using LSTM-CRF models, but no concrete results or numbers are provided.
Event learning is one of the most important problems in AI. However, despite significant research efforts, it remains a very complex task, especially when the events involve the interaction of humans or agents with other objects, as this requires modeling both human kinematics and object movements. This study proposes a methodology for learning complex human-object interaction (HOI) events, involving the recording, annotation, and classification of event interactions. For annotation, we allow multiple interpretations of a motion capture by slicing over its temporal span; for classification, we use Long Short-Term Memory (LSTM) sequential models with a Conditional Random Field (CRF) to constrain the outputs. Using a setup involving captures of human-object interaction as three-dimensional inputs, we argue that this approach could be used for event types involving complex spatio-temporal dynamics.
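
The two ideas named in the abstract, slicing a capture over its temporal span and constraining the output labels with a CRF, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear-chain CRF decoded with the Viterbi algorithm, per-frame label scores supplied as plain lists in place of real LSTM outputs, and hypothetical names throughout (`temporal_slices`, `viterbi_decode`, `min_len`, `step`, and the three-label example in the usage note).

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")  # score used to forbid a label transition


def temporal_slices(n_frames: int, min_len: int, step: int) -> List[Tuple[int, int]]:
    """Enumerate half-open frame intervals [start, end) over one capture.

    Each interval is a candidate span that could receive its own annotation
    (assumed slicing scheme, not taken from the paper).
    """
    slices = []
    for start in range(0, n_frames - min_len + 1, step):
        for end in range(start + min_len, n_frames + 1, step):
            slices.append((start, end))
    return slices


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][j]   : score of label j at frame t (stand-in for LSTM output)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for frame in emissions[1:]:
        new_score = []
        pointers = []
        for j in range(n_labels):
            # Best previous label for reaching label j at this frame.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + frame[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

As a usage illustration, suppose three hypothetical labels 0, 1, 2 where the transition from 0 directly to 2 is forbidden by setting `transitions[0][2] = NEG_INF` and all other transitions score 0. With emissions `[[2, 1, 0], [0, 0, 3], [0, 0, 1]]`, taking the best label at each frame independently would give 0, 2, 2, which violates the constraint; `viterbi_decode` returns `[1, 2, 2]` instead. In this sense the CRF layer constrains the outputs of the sequence model rather than replacing it.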