CVOct 31, 2025
MambaNetLK: Enhancing Colonoscopy Point Cloud Registration with MambaLinzhe Jiang, Jiayuan Huang, Sophia Bano et al.
Accurate 3D point cloud registration underpins reliable image-guided colonoscopy, directly affecting lesion localization, margin assessment, and navigation safety. However, biological tissue exhibits repetitive textures and locally homogeneous geometry that cause feature degeneracy, while substantial domain shifts between pre-operative anatomy and intra-operative observations further degrade alignment stability. To address these clinically critical challenges, we introduce a novel 3D registration method tailored for endoscopic navigation and a high-quality, clinically grounded dataset to support rigorous and reproducible benchmarking. We introduce C3VD-Raycasting-10k, a large-scale benchmark dataset with 10,014 geometrically aligned point cloud pairs derived from clinical CT data. We propose MambaNetLK, a novel correspondence-free registration framework, which enhances the PointNetLK architecture by integrating a Mamba State Space Model (SSM) as a cross-modal feature extractor. As a result, the proposed framework efficiently captures long-range dependencies with linear-time complexity. The alignment is achieved iteratively using the Lucas-Kanade algorithm. On the clinical dataset, C3VD-Raycasting-10k, MambaNetLK achieves the best performance compared with the state-of-the-art methods, reducing median rotation error by 56.04% and RMSE translation error by 26.19% over the second-best method. The model also demonstrates strong generalization on ModelNet40 and superior robustness to initial pose perturbations. MambaNetLK provides a robust foundation for 3D registration in surgical navigation. The combination of a globally expressive SSM-based feature extractor and a large-scale clinical dataset enables more accurate and reliable guidance systems in minimally invasive procedures like colonoscopy.
CVApr 5
NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge ResultsShuhong Liu, Chenyu Bao, Ziteng Cui et al.
This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify robust reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.
CVMar 12, 2025
Surgical AI Copilot: Energy-Based Fourier Gradient Low-Rank Adaptation for Surgical LLM Agent Reasoning and PlanningJiayuan Huang, Runlong He, Danyal Zaman Khan et al.
Image-guided surgery demands adaptive, real-time decision support, yet static AI models struggle with structured task planning and providing interactive guidance. Large language models (LLMs)-powered agents offer a promising solution by enabling dynamic task planning and predictive decision support. Despite recent advances, the absence of surgical agent datasets and robust parameter-efficient fine-tuning techniques limits the development of LLM agents capable of complex intraoperative reasoning. In this paper, we introduce Surgical AI Copilot, an LLM agent for image-guided pituitary surgery, capable of conversation, planning, and task execution in response to queries involving tasks such as MRI tumor segmentation, endoscope anatomy segmentation, overlaying preoperative imaging with intraoperative views, instrument tracking, and surgical visual question answering (VQA). To enable structured agent planning, we develop the PitAgent dataset, a surgical context-aware planning dataset covering surgical tasks like workflow analysis, instrument localization, anatomical segmentation, and query-based reasoning. Additionally, we propose DEFT-GaLore, a Deterministic Energy-based Fourier Transform (DEFT) gradient projection technique for efficient low-rank adaptation of recent LLMs (e.g., LLaMA 3.2, Qwen 2.5), enabling their use as surgical agent planners. We extensively validate our agent's performance and the proposed adaptation technique against other state-of-the-art low-rank adaptation methods on agent planning and prompt generation tasks, including a zero-shot surgical VQA benchmark, demonstrating the significant potential for truly efficient and scalable surgical LLM agents in real-time operative settings.