Learning from Demonstration in the Wild
This work addresses the challenge of using abundant real-world video for learning from demonstration (LfD), with potential benefits for autonomous systems and traffic analysis, although its contribution, extending LfD to uncalibrated video sources, is incremental.
The authors tackle the problem of learning behavior models from unlabeled traffic video footage without manual demonstrations or specialized sensors, and they show that their ViBe approach can learn purely from such videos, without additional expert knowledge.
Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on manually generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviours that were occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose Video to Behaviour (ViBe), a new approach to learn models of behaviour from unlabelled raw video data of a traffic scene collected from a single, monocular, initially uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.
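The abstract describes a pipeline of calibrate, detect, track, then learn from the resulting trajectories. The minimal sketch below illustrates only the data handling between those stages. It assumes that calibration has already produced a 3x3 image-to-ground homography `H` and that a detector supplies per-frame pixel coordinates. The names `image_to_ground`, `associate`, and `to_demonstrations` are hypothetical. The greedy nearest-neighbour tracker and the position/velocity state-action encoding are simple stand-ins chosen for illustration, not ViBe's actual components.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def image_to_ground(H: Matrix3, p: Point) -> Point:
    """Map pixel p to ground-plane coordinates with a 3x3 homography H.

    Assumption: H comes from a prior calibration step (not shown here).
    """
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


@dataclass
class Track:
    track_id: int
    points: List[Point] = field(default_factory=list)


def _dist(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def associate(tracks: List[Track], detections: List[Point],
              max_dist: float, next_id: int) -> int:
    """Hypothetical greedy nearest-neighbour tracker (a stand-in).

    Each existing track claims its closest unclaimed detection if it lies
    within max_dist; leftover detections start new tracks. Returns the next
    free track id.
    """
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            break
        last = track.points[-1]
        best = min(unmatched, key=lambda d: _dist(d, last))
        if _dist(best, last) <= max_dist:
            track.points.append(best)
            unmatched.remove(best)
    for d in unmatched:
        tracks.append(Track(next_id, [d]))
        next_id += 1
    return next_id


def to_demonstrations(tracks: List[Track],
                      dt: float) -> List[Tuple[Point, Point]]:
    """Turn each trajectory into (state, action) pairs for LfD.

    Hypothetical encoding: state = ground position, action = velocity to the
    next position. Assumes every track is matched in every frame, so that
    consecutive points are exactly dt apart.
    """
    pairs = []
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track.points, track.points[1:]):
            pairs.append(((x0, y0), ((x1 - x0) / dt, (y1 - y0) / dt)))
    return pairs


# Usage with made-up numbers: H scales pixels by 0.5 (2 px per ground unit).
H = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
frames = [
    [(100, 200), (300, 400)],  # frame 0: two detected road users
    [(104, 200), (300, 404)],  # frame 1
    [(108, 200), (300, 408)],  # frame 2
]
tracks: List[Track] = []
next_id = 0
for detections in frames:
    ground = [image_to_ground(H, d) for d in detections]
    next_id = associate(tracks, ground, max_dist=3.0, next_id=next_id)
demos = to_demonstrations(tracks, dt=0.5)
print(len(tracks), demos[0])  # prints: 2 ((50.0, 100.0), (4.0, 0.0))
```

Projecting to the ground plane before tracking means that distances, and therefore the learned actions, are in scene units rather than pixels. This is one plausible reason a pipeline like this would calibrate the camera first. A real system would also need to handle occlusions, missed detections, and track termination, all of which this sketch ignores.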