CV, Sep 2, 2013

A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification

arXiv:1309.0309v1, 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses video-based action classification for computer vision researchers, but it is incremental as it focuses on analyzing existing methods rather than introducing new ones.

The study investigated the effects of separating dictionary learning and feature encoding phases for video-based action classification, finding that sparse coding performs better on complex datasets like HMDB51 and is robust to different dictionaries, while simpler datasets like KTH show competitive performance across all encoding methods.

Much effort has been devoted to developing alternatives to traditional vector quantization in the image domain, such as sparse coding and soft-assignment. These approaches can be split into a dictionary learning phase and a feature encoding phase, which are often closely connected. In this paper, we investigate the effects of these phases by separating them for video-based action classification. We compare several dictionary learning methods and feature encoding schemes through extensive experiments on the KTH and HMDB51 datasets. Experimental results indicate that sparse coding performs consistently better than the other encoding methods on a large, complex dataset (i.e., HMDB51), and that it is robust to different dictionaries. On a small, simple dataset with less variation (i.e., KTH), however, all the encoding strategies perform competitively. In addition, we note that the strength of sophisticated encoding approaches comes not from their corresponding dictionaries but from their encoding mechanisms, and that randomly selected exemplars can simply be used as dictionaries for video-based action classification.
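The separation of dictionary learning from feature encoding described above can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the synthetic descriptors, the `beta` smoothing parameter, and the average-pooling step are all assumptions made for illustration, and sparse coding is left out to keep the example short. The dictionary is built from randomly selected exemplars, and the same dictionary is then passed to two interchangeable encoders: vector quantization and soft-assignment.

```python
import numpy as np

def random_exemplar_dictionary(descriptors, k, seed=0):
    """Dictionary 'learning' by picking k descriptors at random (no training)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=k, replace=False)
    return descriptors[idx]

def squared_distances(descriptors, dictionary):
    """Return an (n, k) matrix of squared Euclidean distances."""
    diff = descriptors[:, None, :] - dictionary[None, :, :]
    return np.sum(diff ** 2, axis=2)

def encode_vq(descriptors, dictionary):
    """Vector quantization: each descriptor votes for its nearest atom only."""
    d2 = squared_distances(descriptors, dictionary)
    codes = np.zeros_like(d2)
    codes[np.arange(len(descriptors)), np.argmin(d2, axis=1)] = 1.0
    return codes

def encode_soft(descriptors, dictionary, beta=1.0):
    """Soft-assignment: weights decay with distance (beta is an assumed parameter)."""
    logits = -beta * squared_distances(descriptors, dictionary)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def pool(codes):
    """Average pooling: one k-dimensional vector per video (an assumed choice)."""
    return codes.mean(axis=0)

# Synthetic stand-in for the local descriptors of one video (hypothetical data).
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(200, 16))

# Phase 1: build the dictionary. Phase 2: encode with any scheme.
dictionary = random_exemplar_dictionary(descriptors, k=8)
video_vq = pool(encode_vq(descriptors, dictionary))
video_soft = pool(encode_soft(descriptors, dictionary, beta=0.5))
```

Because the encoders take the dictionary as a plain argument, any dictionary source (random exemplars, k-means centers, or a learned sparse dictionary) can be paired with any encoding scheme, which is the kind of cross-comparison the abstract describes.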

