CV · Nov 14, 2014

A Faster Method for Tracking and Scoring Videos Corresponding to Sentences

Haonan Yu, Daniel P. Barrett, Jeffrey Mark Siskind

arXiv:1411.4064v1

Originality: Synthesis-oriented

AI Analysis

This incremental improvement benefits researchers and practitioners in video retrieval and description tasks by making existing applications more scalable.

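The paper tackles the computational inefficiency of the sentence tracker, a method for scoring how well a sentence and a video clip correspond. It reduces the method's space complexity from exponential to polynomial in sentence length and obtains qualitatively identical results in polynomial rather than exponential time. As a result, the approach scales to more object detections, larger word-meaning HMMs, and longer sentences with no appreciable loss in quality.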

Prior work presented the sentence tracker, a method for scoring how well a sentence describes a video clip or alternatively how well a video clip depicts a sentence. We present an improved method for optimizing the same cost function employed by this prior work, reducing the space complexity from exponential in the sentence length to polynomial, as well as producing a qualitatively identical result in time polynomial in the sentence length instead of exponential. Since this new method is plug-compatible with the prior method, it can be used for the same applications: video retrieval with sentential queries, generating sentential descriptions of video clips, and focusing the attention of a tracker with a sentence, while allowing these applications to scale with significantly larger numbers of object detections, word meanings modeled with HMMs with significantly larger numbers of states, and significantly longer sentences, with no appreciable degradation in quality of results.
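To make the exponential term concrete, here is a minimal sketch of the kind of joint Viterbi search the prior sentence tracker formulation implies. In that formulation, each frame's state is the cross product of one detection choice per track and one HMM state per word. The sketch rests on several simplifying assumptions. Every track chooses from the same J detections per frame. Word HMMs are given as log-score tables. A hypothetical `obs` callback scores a word's state against the chosen detections. No final-state constraint is imposed. All function and field names are illustrative, not taken from the paper. This shows the cost of the prior formulation only. It is not the faster method the paper proposes.

```python
import itertools

# Illustrative sketch only: a brute-force joint Viterbi over a cross-product
# lattice in the style of the prior sentence tracker. All names and data layouts
# are hypothetical, and this is NOT the faster method proposed in this paper.
# Assumed inputs:
#   det_score[t][j]   score of detection j in frame t (same J detections for every track)
#   motion(a, b)      temporal-coherence score for a track moving from detection a to b
#   word = {"init": [k], "trans": [k_prev][k], "obs": f(t, k, dets)}  log-score HMM


def num_joint_states(num_dets, num_tracks, word_sizes):
    """Joint states per frame: J**L detection choices times prod(K_w) HMM states."""
    n = num_dets ** num_tracks
    for k in word_sizes:
        n *= k
    return n


def joint_states(num_dets, num_tracks, word_sizes):
    """Enumerate one frame of the cross-product lattice as (detections, hmm_states)."""
    dets = itertools.product(range(num_dets), repeat=num_tracks)
    hmm = list(itertools.product(*(range(k) for k in word_sizes)))
    return [(d, h) for d in dets for h in hmm]


def initial(state, words):
    # Sum of each word HMM's initial-state score.
    _, ks = state
    return sum(w["init"][k] for w, k in zip(words, ks))


def emission(t, state, det_score, words):
    # Detection scores for every track plus each word's observation score.
    dets, ks = state
    s = sum(det_score[t][j] for j in dets)
    return s + sum(w["obs"](t, k, dets) for w, k in zip(words, ks))


def transition(prev, state, motion, words):
    # Motion coherence for every track plus each word's HMM transition score.
    (dp, kp), (d, k) = prev, state
    s = sum(motion(a, b) for a, b in zip(dp, d))
    return s + sum(w["trans"][a][b] for w, a, b in zip(words, kp, k))


def sentence_tracker_score(det_score, motion, words, num_tracks):
    """Best total score over all joint paths. Runs in O(T * S**2) time with
    S = num_joint_states(...), i.e. exponential in the numbers of tracks and words."""
    num_dets = len(det_score[0])
    states = joint_states(num_dets, num_tracks, [len(w["init"]) for w in words])
    delta = {s: initial(s, words) + emission(0, s, det_score, words) for s in states}
    for t in range(1, len(det_score)):
        delta = {
            s: max(delta[p] + transition(p, s, motion, words) for p in states)
            + emission(t, s, det_score, words)
            for s in states
        }
    return max(delta.values())
```

For example, `num_joint_states(10, 2, [3] * 4)` gives 100 × 81 = 8100 joint states per frame. Each additional three-state word multiplies that count by three, and the per-frame work grows with its square. This is the blow-up that the paper's polynomial method is designed to avoid.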
