CV · Sep 29, 2018

FusedLSTM: Fusing frame-level and video-level features for Content-based Video Relevance Prediction

arXiv:1810.00136v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses video relevance prediction for content-based retrieval, presenting incremental improvements over existing methods.

The paper tackles the problem of content-based video relevance prediction by proposing two approaches: a FusedLSTM method that combines frame-level and video-level features using LSTM and dense layers with triplet loss, and an Online Kernel Similarity Learning method to learn non-linear similarity measures, with results compared against baseline methods.

This paper describes two of my best-performing approaches on the Content-based Video Relevance Prediction challenge. In the FusedLSTM-based approach, the inception-pool3 and the C3D-pool5 features are combined using an LSTM and a dense layer to form embeddings, with the objective of minimizing the triplet loss function. In the second approach, an Online Kernel Similarity Learning method is proposed to learn a non-linear similarity measure that adheres to the relevance training data. The last section gives a complete comparison of all the approaches implemented during this challenge, including the one presented in the baseline paper.
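
As a rough illustration of the triplet objective mentioned in the abstract, the sketch below computes a hinge-style triplet loss on plain Python lists. The Euclidean distance, the margin value of 1.0, and the names `euclidean` and `triplet_loss` are assumptions made for this example; the abstract does not specify the distance function or margin used in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the relevant (positive) video is closer to the
    anchor than the irrelevant (negative) video by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings, chosen so the distances are exact (5 and 10).
anchor = [0.0, 0.0]
relevant = [3.0, 4.0]
irrelevant = [6.0, 8.0]

# Relevant video is already much closer than the irrelevant one: no penalty.
loss_easy = triplet_loss(anchor, relevant, irrelevant)

# Roles swapped: the "positive" is farther away, so the loss is positive.
loss_hard = triplet_loss(anchor, irrelevant, relevant)
```

In a trained model, the embeddings would come from the fused LSTM and dense layers rather than being fixed by hand; here they are toy values used only to show when the loss is zero and when it is not.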
