CV · CL · Sep 26, 2022

Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding

arXiv:2209.13359v2226 citations · h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of integrating PLMs into TVG models, a concern for video analysis researchers. It is an incremental advance because it builds on existing adapter techniques.

This paper investigates how pre-trained language models (PLMs) affect Temporal Video Grounding (TVG) performance and tests parameter-efficient NLP adapters as an alternative to full fine-tuning. Results on three datasets show that PLM integration and fine-tuning greatly improve TVG models even when the visual inputs are left unchanged, and that adapters deliver results comparable to state-of-the-art models while reducing training cost.

This paper explores the task of Temporal Video Grounding (TVG) where, given an untrimmed video and a natural language sentence query, the goal is to recognize and determine temporal boundaries of action instances in the video described by the query. Recent works tackled this task by improving query inputs with large pre-trained language models (PLMs) at the cost of more expensive training. However, the effects of this integration are unclear, as these works also propose improvements in the visual inputs. Therefore, this paper studies the effects of PLMs in TVG and assesses the applicability of parameter-efficient training with NLP adapters. We couple popular PLMs with a selection of existing approaches and test different adapters to reduce the impact of the additional parameters. Our results on three challenging datasets show that, without changing the visual inputs, TVG models greatly benefited from the PLM integration and fine-tuning, stressing the importance of sentence query representation in this task. Furthermore, NLP adapters were an effective alternative to full fine-tuning, even though they were not tailored to our task, allowing PLM integration in larger TVG models and delivering results comparable to SOTA models. Finally, our results shed light on which adapters work best in different scenarios.
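To give a rough sense of why adapters reduce training cost, the sketch below counts the trainable parameters of a generic bottleneck adapter (a down-projection followed by an up-projection, each with a bias) and compares it with an approximate count for one full transformer layer. The hidden size of 768, the bottleneck size of 64, and the function names are illustrative assumptions; the abstract does not state which adapter variants or sizes the paper uses.

```python
def bottleneck_adapter_params(hidden_size: int, bottleneck_size: int) -> int:
    """Trainable parameters of a generic bottleneck adapter (assumed design)."""
    # Down-projection: hidden_size x bottleneck_size weights plus a bias.
    down = hidden_size * bottleneck_size + bottleneck_size
    # Up-projection: bottleneck_size x hidden_size weights plus a bias.
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def transformer_layer_params(hidden_size: int) -> int:
    """Rough parameter count for one transformer encoder layer.

    Approximation: 4*d^2 for the attention projections plus 8*d^2 for a
    feed-forward block with 4x expansion; biases and layer norms are ignored.
    """
    return 12 * hidden_size ** 2


if __name__ == "__main__":
    d, m = 768, 64  # assumed sizes, chosen only for illustration
    adapter = bottleneck_adapter_params(d, m)
    layer = transformer_layer_params(d)
    print(f"adapter parameters:    {adapter}")
    print(f"full layer (approx.):  {layer}")
    print(f"adapter / layer ratio: {adapter / layer:.2%}")
```

Under these assumptions the adapter trains only a small fraction of the parameters that full fine-tuning of the same layer would update, which is the general motivation for parameter-efficient training.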

Code Implementations: 1 repo