LG CV HC IV ML, Sep 7, 2019

Explainable Deep Learning for Video Recognition Tasks: A Framework & Recommendations

arXiv:1909.05667v1, 16 citations
Originality: Synthesis-oriented
AI Analysis

This work tackles interpretability for complex video deep learning models, which is crucial for real-world applications. Its contribution is incremental: it builds on existing explainability research by adapting it to the video domain.

The paper addresses the lack of explainability methods specifically designed for deep learning models in video recognition tasks, noting that current techniques are ill-adapted to the spatio-temporal nature of video data, and it provides a framework and recommendations to fill this gap.

The popularity of Deep Learning for real-world applications is ever-growing. With the introduction of high-performance hardware, applications are no longer limited to image recognition. More complex problems bring increasingly complex solutions, and with them a growing need for explainable AI. Deep Neural Networks for video tasks are amongst the most complex models, with at least twice the parameters of their image counterparts. However, explanations for these models are often ill-adapted to the video domain. Current work on explainability for video models is still overshadowed by image techniques, while video Deep Learning itself is quickly gaining on methods for still images. This paper seeks to highlight the need for explainability methods designed with video deep learning models, and by association spatio-temporal input, in mind. It does so by first illustrating the cutting edge of video deep learning and then noting the scarcity of research into explanations for these methods.
