Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
This work addresses the challenge of agent modeling in computer vision and offers a novel dataset and approach that could benefit robotics and AI, although its shift in focus from subtasks to direct agent modeling is incremental.
The paper tackles the problem of directly modeling a visually intelligent agent by predicting a dog's actions from ego-centric video. It shows under various metrics that the dog's behavior can be modeled successfully, and that the learned representation generalizes to other tasks such as walkable surface estimation.
We introduce the task of directly modeling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach and instead model a visually intelligent agent directly: our model takes visual information as input and predicts the actions of the agent. Toward this end, we introduce DECADE, a large-scale dataset of ego-centric videos recorded from a dog's perspective, paired with her corresponding movements. Using this data, we model how the dog acts and how she plans her movements. We show under a variety of metrics that, given only visual input, we can successfully model this intelligent agent in many situations. Moreover, the representation learned by our model encodes information distinct from that of representations trained on image classification, and it generalizes to other domains. In particular, we show strong results on walkable surface estimation when the dog modeling task is used for representation learning.
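To make the "visual input in, actions out" contract concrete, here is a toy sketch. It is not the DECADE model, which is learned rather than hand-built. It assumes each recorded movement is summarized as a small vector (for example, joint-angle changes) and discretizes those vectors into hypothetical action classes by nearest centroid. It then predicts the action for a new frame with a 1-nearest-neighbor lookup over visual feature vectors. All feature values, centroids, and function names are hypothetical placeholders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_index(x: Vector, candidates: Sequence[Vector]) -> int:
    """Return the index of the candidate closest to x (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(x, candidates[i]))


def discretize_movements(movements: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each continuous movement vector to a discrete action class id (assumed representation)."""
    return [nearest_index(m, centroids) for m in movements]


def predict_action(visual_feature: Vector, train_features: Sequence[Vector], train_labels: Sequence[int]) -> int:
    """1-NN stand-in for a learned model: return the action of the visually closest training frame."""
    return train_labels[nearest_index(visual_feature, train_features)]


# Hypothetical action vocabulary: 0 = stand still, 1 = turn left, 2 = turn right.
centroids = [(0.0, 0.0), (-1.0, 0.5), (1.0, 0.5)]
train_movements = [(0.1, 0.0), (-0.9, 0.4), (1.1, 0.6)]
train_labels = discretize_movements(train_movements, centroids)  # [0, 1, 2]

# Hypothetical per-frame visual features aligned with the movements above.
train_features = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(predict_action((0.9, 0.1, 0.0), train_features, train_labels))  # 1
```

Treating actions as discrete classes turns action prediction into a classification problem, so standard classification metrics such as accuracy apply. The nearest-neighbor lookup only stands in for whatever learned predictor maps images to actions.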