Action parsing using context features
This work addresses action segmentation in videos, which is important for applications like video analysis and surveillance, but it appears incremental as it builds on existing parsing techniques with context features.
The paper tackles the problem of parsing videos that contain an unknown number of actions into segments. It uses context features, such as temporal information from other actions, and reports improved segmentation accuracy on the Breakfast activity dataset compared to state-of-the-art methods.
We propose an action parsing algorithm that temporally segments a video sequence containing an unknown number of actions into its action segments. We argue that context information, particularly temporal information about the other actions in the video sequence, is valuable for action segmentation. The optimal temporal segmentation is found using a dynamic programming search algorithm that optimizes the overall classification confidence score. The classification score of each segment is determined using local features calculated from that segment as well as context features calculated from the other candidate action segments of the sequence. Experimental results on the Breakfast activity dataset show improved segmentation accuracy compared to existing state-of-the-art parsing techniques.
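
The dynamic programming search described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `parse_actions`, the `frame_scores` input, and the segment score (the summed per-frame score of the best class) are assumptions made for illustration. The sketch also assumes that "optimizes" means maximizing the total score. It uses local evidence only, so the context features from other candidate segments are not modeled here.

```python
import numpy as np

def parse_actions(frame_scores):
    """Split a sequence into labeled segments by dynamic programming.

    frame_scores: (T, C) array of per-frame class scores (hypothetical input).
    Returns a list of (start, end, label) tuples with end exclusive.
    """
    T, C = frame_scores.shape
    # Prefix sums give the summed class scores of any segment in O(C).
    prefix = np.vstack([np.zeros((1, C)), np.cumsum(frame_scores, axis=0)])

    best = np.full(T + 1, -np.inf)  # best[t]: best total score for frames [0, t)
    best[0] = 0.0
    back = [None] * (T + 1)         # back[t]: (start, label) of the last segment

    for end in range(1, T + 1):
        for start in range(end):
            seg = prefix[end] - prefix[start]  # summed scores over [start, end)
            label = int(np.argmax(seg))        # best class for this candidate
            total = best[start] + seg[label]
            if total > best[end]:              # strict '>' prefers longer segments on ties
                best[end] = total
                back[end] = (start, label)

    # Backtrack from the end of the sequence to recover the segmentation.
    segments = []
    end = T
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()

    # Merge neighbouring segments that received the same label.
    merged = [segments[0]]
    for start, end, label in segments[1:]:
        prev_start, _, prev_label = merged[-1]
        if label == prev_label:
            merged[-1] = (prev_start, end, label)
        else:
            merged.append((start, end, label))
    return merged

# Toy usage: three frames favouring class 0, then three favouring class 1.
scores = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
segments = parse_actions(scores)
```

The number of segments is not fixed in advance. It falls out of the backtracking step, which matches the setting of an unknown number of actions. A context-aware variant would replace the local segment score with one that also depends on the other candidate segments, which this sketch does not attempt.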