Large-Scale Mapping of Human Activity using Geo-Tagged Videos
The work enables smart-city applications by providing real-time activity recognition from video, but it is incremental in that it applies an existing method to new data.
This paper tackles the problem of spatio-temporal mapping of human activity by using geo-tagged YouTube videos and a deep-learning video analysis framework, achieving accurate mapping and demonstrating advantages over using tags or titles.
This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning-based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster, which is important for recognizing events as they occur in streaming video and for reducing latency when analyzing already captured video. This, in turn, is important for using video in smart-city applications. We perform a series of experiments to show that our approach is able to accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles.
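As a rough illustration of what spatio-temporal mapping could look like once each video has a recognized activity, the sketch below counts predictions per grid cell, month, and activity. It is not the authors' pipeline: the `VideoPrediction` record, the `grid_cell` and `map_activity` helpers, the one-degree cell size, and the monthly time bin are all assumptions made for this example, and the recognition step itself (the hidden two-stream network) is taken as already done.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoPrediction:
    """Hypothetical record: one geo-tagged video with its predicted activity."""
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees
    month: int      # upload or capture month, 1-12 (assumed time bin)
    activity: str   # label produced by the video classifier


def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> tuple[int, int]:
    """Map a coordinate to the integer index of its cell_deg x cell_deg grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def map_activity(predictions, cell_deg: float = 1.0) -> Counter:
    """Count predictions per (grid cell, month, activity) triple."""
    counts = Counter()
    for p in predictions:
        key = (grid_cell(p.lat, p.lon, cell_deg), p.month, p.activity)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample points, for illustration only.
    sample = [
        VideoPrediction(37.77, -122.42, 7, "surfing"),
        VideoPrediction(37.10, -122.90, 7, "surfing"),
        VideoPrediction(46.50, 7.90, 1, "skiing"),
    ]
    for key, n in sorted(map_activity(sample).items()):
        print(key, n)
```

Counting by fixed latitude/longitude cells keeps the example short; an equal-area grid or administrative regions would be a reasonable alternative when cell sizes need to be comparable across latitudes.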