FIFA: Fast Inference Approximation for Action Segmentation
FIFA addresses the inference-speed bottleneck in action segmentation, which benefits researchers and practitioners in video analysis. The contribution is incremental, since it builds on existing state-of-the-art approaches.
The paper tackles slow inference in action segmentation by introducing FIFA, a fast approximate inference method that replaces expensive dynamic programming with gradient-descent optimization. It achieves a speedup of over 5x while maintaining performance, and it sets state-of-the-art results on two datasets.
We introduce FIFA, a fast approximate inference method for action segmentation and alignment. Unlike previous approaches, FIFA does not rely on expensive dynamic programming for inference. Instead, it uses an approximate differentiable energy function that can be minimized using gradient descent. FIFA is a general approach that can replace exact inference, improving its speed by more than 5 times while maintaining its performance. FIFA is an anytime inference algorithm that provides a better speed vs. accuracy trade-off compared to exact inference. We apply FIFA on top of state-of-the-art approaches for weakly supervised action segmentation and alignment as well as fully supervised action segmentation. FIFA achieves state-of-the-art results on most metrics on two action segmentation datasets.
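To make the idea of minimizing a differentiable energy with gradient descent concrete, the following is a minimal sketch and not the paper's implementation. It assumes a known ordered transcript of actions, frame-wise costs (negative log-probabilities), and a sigmoid relaxation of the segment boundaries. The function names (`soft_energy_and_grad`, `approximate_alignment`), the `sharpness` parameter, the step size, the uniform initialization, and the toy data are all hypothetical choices made for illustration. The exact energy used by FIFA may differ.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_energy_and_grad(boundaries, cost, transcript, sharpness=1.0):
    """Relaxed alignment energy and its gradient w.r.t. the boundaries.

    cost[t, c] is the cost of assigning class c to frame t.
    boundaries[k] is the (real-valued) frame position separating
    segment k from segment k + 1 of the transcript.
    This relaxation is an assumption of the sketch, not the paper's energy.
    """
    num_frames = cost.shape[0]
    t = np.arange(num_frames)[:, None]                  # shape (T, 1)
    g = sigmoid((t - boundaries[None, :]) / sharpness)  # soft "past boundary k"
    seg_cost = cost[:, transcript]                      # shape (T, N)
    diff = seg_cost[:, 1:] - seg_cost[:, :-1]           # cost change at each boundary
    # Soft masks telescope, so the energy reduces to this closed form.
    energy = seg_cost[:, 0].sum() + (g * diff).sum()
    # d/d b_k of sigmoid((t - b_k) / s) is -g * (1 - g) / s.
    grad = -(g * (1.0 - g) / sharpness * diff).sum(axis=0)
    return energy, grad


def approximate_alignment(cost, transcript, steps=200, lr=0.1, sharpness=1.0):
    """Plain gradient descent on the relaxed energy, then a hard decode."""
    num_frames, num_segments = cost.shape[0], len(transcript)
    # Assumed initialization: segments of equal length.
    boundaries = np.linspace(0, num_frames, num_segments + 1)[1:-1]
    energies = []
    for _ in range(steps):  # stopping early still yields a usable estimate
        energy, grad = soft_energy_and_grad(boundaries, cost, transcript, sharpness)
        energies.append(energy)
        boundaries = boundaries - lr * grad
    # Hard decode: a frame belongs to the segment after the last boundary it passed.
    seg_idx = np.searchsorted(np.sort(boundaries), np.arange(num_frames))
    labels = np.asarray(transcript)[seg_idx]
    return labels, boundaries, energies


# Toy data (fabricated for illustration): 30 frames, three actions of unequal length.
true_labels = np.array([0] * 6 + [1] * 16 + [2] * 8)
probs = np.full((30, 3), 0.05)
probs[np.arange(30), true_labels] = 0.9
cost = -np.log(probs)

labels, boundaries, energies = approximate_alignment(cost, [0, 1, 2])
print(boundaries)
print((labels == true_labels).mean())
```

Each iteration costs one pass over a frames-by-segments array, and the loop can be cut short at any step count, which is one way to read the speed vs. accuracy trade-off described above. The sketch does not enforce boundary ordering during optimization; it only sorts the boundaries before decoding.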