CV | AI | May 4, 2023

HAISTA-NET: Human Assisted Instance Segmentation Through Attention

arXiv:2305.03105v3
Originality: Incremental advance
AI Analysis

This work addresses the labor-intensive manual annotation that precise segmentation currently requires in applications such as medical imaging and video editing. Its hybrid approach is an incremental advance: it augments an existing network (Strong Mask R-CNN) with human input.

The paper tackles high-accuracy instance segmentation of small and complex objects, which current fully automated methods do not deliver. It proposes HAISTA-NET, a human-assisted model that incorporates human-specified partial boundaries, and reports gains of up to +36.7 points in AP-Mask over state-of-the-art methods on the PSOB dataset.

Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which is laborious. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present the Partial Sketch Object Boundaries (PSOB) dataset of hand-drawn partial object boundaries, which we refer to as human attention maps; each one represents curvatures of an object's ground-truth mask with several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving on them by +36.7, +29.6, and +26.5 AP-Mask points, respectively. We hope that our novel approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.
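To make the two central ideas of the abstract concrete, the sketch below illustrates (1) what a partial boundary hint of "several pixels" could look like and (2) the mask IoU score that AP-Mask metrics are built on. It is a minimal illustration, not the paper's implementation. The helper names (`boundary_pixels`, `partial_boundary_map`, `mask_iou`) are hypothetical, and the hint is simulated from a ground-truth mask, whereas the PSOB boundaries are hand-drawn. How HAISTA-NET feeds such a map into Strong Mask R-CNN is not described in the abstract and is not modeled here.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def boundary_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Foreground pixels that touch background or the image edge (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    found = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = rr < 0 or rr >= h or cc < 0 or cc >= w
                if outside or not mask[rr][cc]:
                    found.append((r, c))
                    break
    return found


def partial_boundary_map(mask: Mask, keep: int) -> Mask:
    """Assumption: simulate a sparse human hint by keeping the first `keep`
    boundary pixels in row-major order. Real PSOB hints are drawn by hand."""
    hint = [[0] * len(mask[0]) for _ in mask]
    for r, c in boundary_pixels(mask)[:keep]:
        hint[r][c] = 1
    return hint


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


# Toy example: a 3x3 square object inside a 5x5 image.
gt = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# A prediction that misses the right-hand column of the object.
pred = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

hint = partial_boundary_map(gt, keep=3)
print(hint[1])             # [0, 1, 1, 1, 0]: only the top edge is marked
print(mask_iou(gt, pred))  # 6 overlapping pixels out of 9 in the union
```

AP-Mask counts a predicted instance as correct when its mask IoU with a ground-truth instance exceeds a threshold, so errors along thin or high-curvature boundaries cost the most on small objects, where each pixel is a large share of the mask.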

