CV · RO · Apr 10

Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy

Jiaheng Dai, Huanrong Liu, Tailai Zhou, Tongyu Jia, Qin Liu, Yutong Ban, Zeju Li, Yu Gao, Xin Ma, Qingbiao Li

arXiv:2604.09051 · 18.6 · h-index: 2

AI Analysis

This work addresses the recognition of suturing gestures in surgical videos, a problem relevant to medical professionals. The contribution is incremental, since it benchmarks existing methods on a new dataset.

The paper tackles fine-grained action segmentation for renorrhaphy in robot-assisted partial nephrectomy by benchmarking four temporal models on SIA-RAPN. DiffAct achieves the highest segmental F1, frame-wise accuracy, edit score, and frame mAP, while MS-TCN++ attains the highest balanced accuracy.
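Frame-wise accuracy and balanced accuracy can rank models differently because the suturing labels are imbalanced. The sketch below illustrates this with a toy example. It assumes the common definition of balanced accuracy as the mean per-class recall over classes present in the ground truth; the function names and gesture labels are hypothetical, and how SIA-RAPN pools frames (per video or across videos) is not stated in this summary.

```python
from collections import Counter


def frame_accuracy(gt, pred):
    """Fraction of frames whose predicted label equals the ground-truth label."""
    assert len(gt) == len(pred) and gt, "sequences must be non-empty and aligned"
    return sum(g == p for g, p in zip(gt, pred)) / len(gt)


def balanced_accuracy(gt, pred):
    """Mean per-class recall over the classes that occur in the ground truth."""
    assert len(gt) == len(pred) and gt, "sequences must be non-empty and aligned"
    totals = Counter(gt)                                   # frames per true class
    hits = Counter(g for g, p in zip(gt, pred) if g == p)  # correct frames per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy video: one frequent gesture and one rare gesture.
gt = ["frequent_gesture"] * 8 + ["rare_gesture"] * 2
pred = ["frequent_gesture"] * 10  # a model that never predicts the rare class

print(frame_accuracy(gt, pred))     # 0.8
print(balanced_accuracy(gt, pred))  # 0.5
```

A model that ignores a rare gesture can still score well on frame-wise accuracy, while balanced accuracy weights every class equally and penalizes the miss.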

Fine-grained action segmentation during renorrhaphy in robot-assisted partial nephrectomy requires frame-level recognition of visually similar suturing gestures with variable duration and substantial class imbalance. The SIA-RAPN benchmark defines this problem on 50 clinical videos acquired with the da Vinci Xi system and annotated with 12 frame-level labels. The benchmark compares four temporal models built on I3D features: MS-TCN++, AsFormer, TUT, and DiffAct. Evaluation uses balanced accuracy, edit score, segmental F1 at IoU overlap thresholds of 10%, 25%, and 50%, frame-wise accuracy, and frame-wise mean average precision. In addition to the primary evaluation across five released split configurations on SIA-RAPN, the benchmark reports cross-domain results on a separate single-port RAPN dataset. Taking the best reported value for each metric over those five runs on the primary dataset, DiffAct achieves the highest F1, frame-wise accuracy, edit score, and frame mAP, while MS-TCN++ attains the highest balanced accuracy.
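The two segment-level metrics named above, edit score and segmental F1 at an overlap threshold, can be sketched as follows. This is a minimal illustration, assuming the formulation commonly used in temporal action segmentation evaluation scripts such as the one released with MS-TCN: consecutive identical frame labels are merged into segments; the edit score is one minus the Levenshtein distance between the segment-label sequences, normalized by the longer sequence; and a predicted segment counts as a true positive when its best-overlapping ground-truth segment of the same class reaches the IoU threshold and has not already been matched. The function names and toy labels are hypothetical, and whether SIA-RAPN excludes a background class or averages scores per video is not stated in this summary.

```python
def to_segments(labels):
    """Collapse frame labels into (label, start, end) runs; `end` is exclusive."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def levenshtein(a, b):
    """Edit distance between two label sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_score(gt, pred):
    """Segmental edit score in [0, 1]: 1 - normalized edit distance of segment labels."""
    g = [lab for lab, _, _ in to_segments(gt)]
    p = [lab for lab, _, _ in to_segments(pred)]
    if not g and not p:
        return 1.0
    return 1.0 - levenshtein(g, p) / max(len(g), len(p))


def f1_at_k(gt, pred, overlap):
    """Segmental F1 at an IoU threshold `overlap` (e.g. 0.10, 0.25, 0.50)."""
    gt_segs, pred_segs = to_segments(gt), to_segments(pred)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for lab, ps, pe in pred_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, (g_lab, gs, ge) in enumerate(gt_segs):
            if g_lab != lab:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)  # exact whenever the segments overlap
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        # True positive only if that segment clears the threshold and is still unmatched.
        if best_j >= 0 and best_iou >= overlap and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = len(gt_segs) - tp
    denom = 2 * tp + fp + fn  # 2PR / (P + R) rewritten in counts
    return 2 * tp / denom if denom else 1.0


gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]    # hypothetical ground truth: three gestures
pred = [0, 0, 1, 0, 0, 1, 1, 1, 2, 2]  # over-segmented prediction

print(edit_score(gt, pred))     # 0.6
print(f1_at_k(gt, pred, 0.50))  # 0.75
```

Both functions return values in [0, 1]; benchmarks usually report them multiplied by 100. The over-segmented prediction shows why these metrics complement frame-wise accuracy: only two of ten frames are wrong, yet the two spurious segments lower both the edit score and F1.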

