Weakly-Supervised Learning for Tool Localization in Laparoscopic Videos
This work addresses the challenge of limited annotated data for surgical tool localization. The contribution is incremental, as it applies an existing weakly-supervised approach to a specific medical domain.
The paper tackles surgical tool localization in laparoscopic videos by proposing a weakly-supervised deep learning method that uses only image-level annotations. The method is evaluated on the Cholec80 dataset, in which 5 videos are fully annotated with bounding boxes and tool centers.
Surgical tool localization is an essential task for the automatic analysis of endoscopic videos. In the literature, existing methods for tool localization, tracking and segmentation require fully annotated training data, which limits both the size of the datasets that can be used and the generalization of the approaches. In this work, we propose to circumvent the lack of annotated data with weak supervision. We propose a deep architecture, trained solely on image-level annotations, that can be used for both tool presence detection and localization in surgical videos. Our architecture relies on a fully convolutional neural network, trained end-to-end, enabling us to localize surgical tools without explicit spatial annotations. We demonstrate the benefits of our approach on a large public dataset, Cholec80, which is fully annotated with binary tool presence information and in which 5 videos have been fully annotated with bounding boxes and tool centers for evaluation.
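To make the idea of localizing tools from image-level labels concrete, the sketch below shows one common weakly-supervised scheme. It is an illustration under stated assumptions, not necessarily the architecture used in this work. It assumes the fully convolutional network outputs one spatial map of raw scores per tool class, that the presence score for a class is the spatial maximum of its map (so only this pooled score would need a binary label during training), and that the position of the maximum serves as the estimated tool location. The names `spatial_max`, `detect_and_localize`, and the 0.5 threshold are hypothetical choices for this example.

```python
import math
from typing import List, Optional, Tuple

# One H x W map of raw scores (logits) for a single tool class.
# Assumption: the network produces one such map per tool class.
Heatmap = List[List[float]]


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def spatial_max(heatmap: Heatmap) -> Tuple[float, Tuple[int, int]]:
    """Return the largest value in the map and its (row, col) position."""
    best_val = -math.inf
    best_pos = (0, 0)
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_val, best_pos = val, (r, c)
    return best_val, best_pos


def detect_and_localize(
    heatmaps: List[Heatmap], threshold: float = 0.5
) -> List[Tuple[float, Optional[Tuple[int, int]]]]:
    """For each class, return (presence probability, location or None).

    The presence probability comes from the pooled (maximum) score, which is
    the only quantity an image-level label would supervise. The location is
    reported only when the tool is predicted to be present.
    """
    results = []
    for heatmap in heatmaps:
        logit, pos = spatial_max(heatmap)
        prob = sigmoid(logit)
        results.append((prob, pos if prob >= threshold else None))
    return results


# Toy example with two tool classes on a 2 x 2 grid (made-up values).
maps = [
    [[-3.0, -2.0], [4.0, -1.0]],   # class 0: strong response at row 1, col 0
    [[-5.0, -4.0], [-6.0, -3.0]],  # class 1: all scores negative
]
out = detect_and_localize(maps)
```

In this toy input, class 0 is predicted present with its location at grid cell (1, 0), while class 1 falls below the threshold and gets no location. In a real pipeline the grid cell would still have to be mapped back to image coordinates, which depends on the network's stride.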