The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods
This dataset addresses the problem of improving surgical assistant robots' awareness of surgeon actions. Its contribution is incremental, in that it builds on existing action detection methods and adds a new benchmark for them.
The paper introduces ESAD, the first large-scale dataset for surgeon action detection in endoscopic minimally invasive surgery. It provides bounding box annotations for 21 action classes on real prostatectomy videos, and analyzes the baseline and top-performing models from the associated MIDL 2020 challenge.
For an autonomous robotic system, monitoring surgeon actions and assisting the main surgeon during a procedure can be very challenging. The challenges stem from the peculiar structure of the surgical scene, from the motion of the endoscopic camera, and from the fact that actions performed with tools inside a cavity look more alike than, say, human actions in unconstrained environments. This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery. ESAD aims to help increase the effectiveness and reliability of surgical assistant robots by realistically testing their awareness of the actions performed by a surgeon. The dataset provides bounding box annotations for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge. We also present an analysis of the dataset conducted using the baseline model released as part of the challenge, and describe the top-performing models submitted to the challenge together with the results they obtained. This study provides insight into which approaches are effective and which can be extended further. We believe that ESAD will serve as a useful benchmark for all researchers active in surgeon action detection and in assistive robotics at large.
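To make the annotation format concrete, the following is a minimal Python sketch of how a per-frame action box might be represented and compared against a prediction. The abstract does not specify the dataset's file layout or the challenge's evaluation metric, so everything here is an assumption for illustration: the `ActionBox` class, its field names, the 0 to 20 label indexing, and the 0.5 overlap threshold are all hypothetical. Intersection over union (IoU) is used only because it is the usual way to match boxes in detection benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionBox:
    """One annotated or predicted surgeon action in a frame (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: int  # hypothetical class index, 0..20 for the 21 action classes


def iou(a: ActionBox, b: ActionBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: ActionBox, truth: ActionBox, threshold: float = 0.5) -> bool:
    """A prediction matches when the class agrees and the overlap is large enough.

    The threshold is an assumed value, not one taken from the paper.
    """
    return pred.label == truth.label and iou(pred, truth) >= threshold
```

Under these assumptions, a prediction counts as correct only when both the action class and the location agree with the annotation. This is what separates action detection from frame-level action classification.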