CV · May 19, 2019

U-Net Based Multi-instance Video Object Segmentation

arXiv:1905.07826v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses video object segmentation for applications requiring high recall, such as surveillance or video editing, but is incremental as it builds on existing methods like OSVOS.

The paper tackles multi-instance video object segmentation by implementing a U-Net-based fully convolutional network with instance isolation, achieving an F mean of 0.467 and J mean of 0.424 on the DAVIS dataset, comparable to state-of-the-art methods while providing smoother contours and better instance coverage.
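For readers unfamiliar with the two reported scores: on DAVIS, J is the region similarity (intersection over union between the predicted and ground-truth masks) and F is the contour accuracy (an F-measure over boundary precision and recall). The sketch below is only illustrative and is not the benchmark's evaluation code. It uses plain nested lists instead of image arrays, the function names are hypothetical, and it takes boundary precision and recall as given rather than computing the boundary matching.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Both masks are same-sized nested lists of 0/1 values.
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def f_measure(precision, recall):
    """Contour accuracy F: harmonic mean of boundary precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
j_score = jaccard(pred, gt)        # 1 shared pixel out of 3 covered pixels
f_score = f_measure(0.5, 0.5)
```

The "mean" in J mean and F mean refers to averaging these per-frame, per-object scores over the evaluation set.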

Multi-instance video object segmentation is the task of segmenting specific instances throughout a video sequence at the pixel level, given only an annotated first frame. In this paper, we implement an effective fully convolutional network with a U-Net-like structure built on top of the OSVOS fine-tuned layer. We use instance isolation to transform the multi-instance segmentation problem into a binary labeling problem, and use weighted cross-entropy loss and Dice coefficient loss as our loss function. Our best model achieves an F mean of 0.467 and a J mean of 0.424 on the DAVIS dataset, which is comparable to the state-of-the-art approach. Case analysis further shows that this model achieves smoother contours and better instance coverage, making it better suited to recall-focused segmentation scenarios. We also ran experiments on other convolutional neural networks, including SegNet and Mask R-CNN, and provide an insightful comparison and discussion.
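The two ideas in the abstract, instance isolation and a combined cross-entropy plus Dice loss, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the `pos_weight` foreground weighting, the `smooth` term, and the choice to simply add the two losses are all assumptions, since the abstract states only that both losses are used.

```python
import math


def isolate_instances(label_map):
    """Split an integer label map (0 = background) into one binary mask per instance."""
    ids = sorted({v for row in label_map for v in row if v != 0})
    return {
        i: [[1 if v == i else 0 for v in row] for row in label_map]
        for i in ids
    }


def weighted_cross_entropy(probs, target, pos_weight=2.0, eps=1e-7):
    """Mean binary cross-entropy with foreground pixels up-weighted (assumed scheme)."""
    total = 0.0
    count = 0
    for prob_row, target_row in zip(probs, target):
        for p, t in zip(prob_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def dice_loss(probs, target, smooth=1.0):
    """1 minus the soft Dice coefficient; `smooth` guards against empty masks."""
    inter = sum(p * t for pr, tr in zip(probs, target) for p, t in zip(pr, tr))
    total = sum(p for pr in probs for p in pr) + sum(t for tr in target for t in tr)
    return 1.0 - (2.0 * inter + smooth) / (total + smooth)


def combined_loss(probs, target, pos_weight=2.0, dice_weight=1.0):
    """Assumed combination: weighted cross-entropy plus a scaled Dice loss."""
    return (weighted_cross_entropy(probs, target, pos_weight)
            + dice_weight * dice_loss(probs, target))


# First-frame annotation with two instances (ids 1 and 2).
label_map = [[0, 1, 1],
             [2, 2, 0]]
masks = isolate_instances(label_map)

# Each instance becomes its own binary labeling problem.
good = [[0.1, 0.9, 0.9], [0.1, 0.1, 0.1]]
bad = [[0.9, 0.1, 0.1], [0.9, 0.9, 0.9]]
loss_good = combined_loss(good, masks[1])
loss_bad = combined_loss(bad, masks[1])
```

Up-weighting the foreground term penalizes missed object pixels more than false positives, which is consistent with the recall-focused behavior the abstract describes, though the actual weights used in the paper are not given here.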

