Boundary Uncertainty in a Single-Stage Temporal Action Localization Network
This work addresses accurate detection of action boundaries in videos, a computer vision problem, and offers an incremental advance through a novel uncertainty modeling approach.
The paper tackles temporal action localization by modeling boundary predictions as Gaussian distributions to capture their uncertainty, improving mAP@tIoU=0.5 by more than 1.5% and performing competitively with more complex networks.
In this paper, we address the problem of temporal action localization with a single-stage neural network. In the proposed architecture, we model the boundary predictions as univariate Gaussian distributions in order to capture their uncertainties; to the best of our knowledge, this is the first such approach in this area. We use two uncertainty-aware boundary regression losses: first, the Kullback-Leibler divergence between the ground-truth location of the boundary and the Gaussian modeling the prediction of the boundary, and second, the expectation of the $\ell_1$ loss under the same Gaussian. We show that both uncertainty modeling approaches improve the detection performance by more than $1.5\%$ in mAP@tIoU=0.5 and that the proposed simple one-stage network performs close to more complex one- and two-stage networks.
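
As a rough illustration of the two boundary losses described above, the following Python sketch evaluates them for a single predicted boundary. It rests on assumptions not stated in the abstract: the ground-truth boundary is treated as a Dirac delta, so the Kullback-Leibler term reduces (after dropping terms that do not depend on the prediction) to a squared error scaled by the variance plus a log-variance penalty, and the expected $\ell_1$ loss is taken in its closed form as the mean of a folded normal distribution. The function names and the log-variance parameterization are hypothetical and may differ from the formulation used in the paper.

```python
import math


def kl_boundary_loss(mu, log_var, target):
    """KL-style loss between a Dirac delta at `target` and N(mu, exp(log_var)).

    Assumption: terms that do not depend on the prediction are dropped,
    leaving a variance-scaled squared error plus a log-variance penalty.
    """
    return (target - mu) ** 2 / (2.0 * math.exp(log_var)) + 0.5 * log_var


def expected_l1_loss(mu, sigma, target):
    """Closed-form E|x - target| for x ~ N(mu, sigma^2), with sigma > 0.

    This is the mean of a folded normal distribution centered on the
    signed offset d = mu - target.
    """
    d = mu - target
    gauss_term = sigma * math.sqrt(2.0 / math.pi) * math.exp(-d * d / (2.0 * sigma * sigma))
    offset_term = d * math.erf(d / (sigma * math.sqrt(2.0)))
    return gauss_term + offset_term


if __name__ == "__main__":
    # Hypothetical example: boundary predicted at 12.0 s, ground truth at 11.5 s.
    print(kl_boundary_loss(mu=12.0, log_var=0.0, target=11.5))
    print(expected_l1_loss(mu=12.0, sigma=1.0, target=11.5))
```

In both sketches a larger predicted variance down-weights the localization error at the cost of an explicit penalty, and as `sigma` approaches zero the expected $\ell_1$ loss reduces to the ordinary absolute error `|mu - target|`.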