CV, LG, MM · Sep 1, 2018

Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset

arXiv:1809.00241v2 · 3 citations
AI Analysis

This work addresses the challenge of recognizing diverse human activities in short videos, showing incremental improvement over existing methods.

The researchers tackled activity recognition in 3-second video clips using the Moments in Time dataset, achieving 89.23% Top-5 accuracy across 20 classes with a novel approach combining visual, auditory, and textual features.

Moments capture a huge part of our lives. Recognizing these moments accurately is challenging because they admit diverse and complex interpretations. Action recognition is the task of classifying the action or activity present in a given video. In this work, we perform experiments on the Moments in Time dataset to accurately recognize activities occurring in 3-second clips. We use state-of-the-art techniques for visual, auditory, and spatio-temporal localization and develop a method to accurately classify the activities in the Moments in Time dataset. Our novel approach of using Visual Based Textual features and fusion techniques performs well, providing an overall 89.23% Top-5 accuracy on the 20 classes, a significant improvement over the baseline TRN model.
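To make the reported metric concrete, here is a minimal sketch of how Top-5 accuracy is typically computed, together with one simple way to fuse per-modality class scores. The abstract does not specify the fusion rule, so the weighted averaging in `late_fusion` is an assumption for illustration only, and both function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def top_k_accuracy(scores, labels, k=5):
    """Fraction of clips whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)   # shape: (num_clips, num_classes)
    labels = np.asarray(labels)                # shape: (num_clips,)
    # Indices of the k largest scores per clip; order within the top k is irrelevant.
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


def late_fusion(score_list, weights=None):
    """Weighted average of per-modality class scores.

    Assumption: this averaging rule is illustrative, not the paper's method.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in score_list])
    if weights is None:
        weights = np.ones(len(score_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize so weights sum to 1
    # Contract the modality axis: (M,) x (M, N, C) -> (N, C)
    return np.tensordot(weights, stacked, axes=1)
```

With 20 classes, `scores` would have shape `(num_clips, 20)` for each modality (for example visual, audio, and text-derived scores); the fused matrix is then passed to `top_k_accuracy` with `k=5`.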
