Guang-Zhong Yang

CV
h-index: 63
64 papers
1,759 citations
Novelty: 43%
AI Score: 46

64 Papers

IV · Mar 10, 2023
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation

Minghui Zhang, Yangqian Wu, Hanxiao Zhang et al. · harvard

Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation closer to the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation challenge, limited effort has been directed to quantitative comparison of newly emerged algorithms, which are driven by the maturity of deep-learning-based approaches and the clinical drive to resolve finer details of distal airways for early intervention in pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22) challenge, which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites and further included a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge, and the algorithms of the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with topological continuity enhancement achieved superior performance in general. The ATM'22 challenge has an open-call design: the training data and the gold standard evaluation are available upon successful registration via its homepage.

CV · Oct 8, 2022
Revisiting Self-Supervised Contrastive Learning for Facial Expression Recognition

Yuxuan Shu, Xiao Gu, Guang-Zhong Yang et al. · oxford

The success of most advanced facial expression recognition works relies heavily on large-scale annotated datasets. However, it poses great challenges in acquiring clean and consistent annotations for facial expression datasets. On the other hand, self-supervised contrastive learning has gained great popularity due to its simple yet effective instance discrimination training strategy, which can potentially circumvent the annotation issue. Nevertheless, there remain inherent disadvantages of instance-level discrimination, which are even more challenging when faced with complicated facial representations. In this paper, we revisit the use of self-supervised contrastive learning and explore three core strategies to enforce expression-specific representations and to minimize the interference from other facial attributes, such as identity and face styling. Experimental results show that our proposed method outperforms the current state-of-the-art self-supervised learning methods, in terms of both categorical and dimensional facial expression recognition tasks.
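To make the "instance discrimination" idea in this abstract concrete, the following is a minimal sketch of InfoNCE, a commonly used contrastive objective. It is an illustration only: the abstract does not state which loss the paper uses, and the function name `info_nce`, the `temperature` default, and the toy vectors are assumptions.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding (illustrative, not the paper's loss).

    The anchor should score higher with its positive (another view of the
    same instance) than with any negative (views of other instances).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Logit 0 belongs to the positive pair; the rest belong to negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp, then cross-entropy against index 0.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


# Toy check: an aligned positive yields a lower loss than a misaligned one.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Because every other image is treated as a negative, two faces showing the same expression are pushed apart, which is one way to read the "inherent disadvantages of instance-level discrimination" mentioned above.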

CV · Jul 20, 2022
Tackling Long-Tailed Category Distribution Under Domain Shifts

Xiao Gu, Yao Guo, Zeju Li et al. · oxford

Machine learning models fail to perform well on real-world applications when 1) the category distribution P(Y) of the training dataset suffers from long-tailed distribution and 2) the test data is drawn from different conditional distributions P(X|Y). Existing approaches cannot handle the scenario where both issues exist, which however is common for real-world applications. In this study, we took a step forward and looked into the problem of long-tailed classification under domain shifts. We designed three novel core functional blocks including Distribution Calibrated Classification Loss, Visual-Semantic Mapping and Semantic-Similarity Guided Augmentation. Furthermore, we adopted a meta-learning framework which integrates these three blocks to improve domain generalization on unseen target domains. Two new datasets were proposed for this problem, named AWA2-LTS and ImageNet-LTS. We evaluated our method on the two datasets and extensive experimental results demonstrate that our proposed method can achieve superior performance over state-of-the-art long-tailed/domain generalization approaches and the combinations. Source codes and datasets can be found at our project page https://xiaogu.site/LTDS.

IV · Jul 28, 2022
Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction

Hanxiao Zhang, Xiao Gu, Minghui Zhang et al. · oxford

The LIDC-IDRI database is the most popular benchmark for lung cancer prediction. However, with subjective assessment from radiologists, nodules in LIDC may have entirely different malignancy annotations from the pathological ground truth, introducing label assignment errors and subsequent supervision bias during training. The LIDC database thus requires more objective labels for learning-based cancer prediction. Based on an extra small dataset containing 180 nodules diagnosed by pathological examination, we propose to re-label LIDC data to mitigate the effect of original annotation bias verified on this robust benchmark. We demonstrate in this paper that providing new labels by similar nodule retrieval based on metric learning would be an effective re-labeling strategy. Training on these re-labeled LIDC nodules leads to improved model performance, which is enhanced when new labels of uncertain nodules are added. We further infer that re-labeling LIDC is currently an expedient way to achieve robust lung cancer prediction, while building a large pathologically proven nodule database provides the long-term solution.

CV · Aug 19, 2024
LNQ 2023 challenge: Benchmark of weakly-supervised techniques for mediastinal lymph node quantification

Reuben Dorent, Roya Khajavi, Tagwa Idris et al.

Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous lymph nodes in 3D CT scans. Weakly-supervised learning, which leverages incomplete or noisy annotations, has recently gained interest in the medical imaging community as a potential solution. Despite the variety of weakly-supervised techniques proposed, most have been validated only on private datasets or small publicly available datasets. To address this limitation, the Mediastinal Lymph Node Quantification (LNQ) challenge was organized in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to advance weakly-supervised segmentation methods by providing a new, partially annotated dataset and a robust evaluation framework. A total of 16 teams from 5 countries submitted predictions to the validation leaderboard, and 6 teams from 3 countries participated in the evaluation phase. The results highlighted both the potential and the current limitations of weakly-supervised approaches. On one hand, weakly-supervised approaches obtained relatively good performance with a median Dice score of 61.0%. On the other hand, top-ranked teams, with a median Dice score exceeding 70%, boosted their performance by leveraging smaller but fully annotated datasets to combine weak supervision and full supervision. This highlights both the promise of weakly-supervised methods and the ongoing need for high-quality, fully annotated data to achieve higher segmentation performance.
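For readers unfamiliar with the Dice score quoted in this abstract, here is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions; the challenge's own evaluation code may treat edge cases differently.

```python
def dice_score(pred, truth):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 2 predicted voxels, 1 true voxel, 1 voxel in common.
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

In the toy example the score is 2·1 / (2 + 1), about 0.67, so a median of 61.0% indicates substantial but far from complete overlap.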

CV · Mar 28, 2022
A Long Short-term Memory Based Recurrent Neural Network for Interventional MRI Reconstruction

Ruiyang Zhao, Zhao He, Tao Wang et al.

Interventional magnetic resonance imaging (i-MRI) for surgical guidance could help visualize interventional processes such as deep brain stimulation (DBS), improving surgical performance and patient outcome. Unlike retrospective reconstruction in conventional dynamic imaging, i-MRI for DBS has to acquire and reconstruct the interventional images sequentially online. Here we proposed a convolutional long short-term memory (Conv-LSTM) based recurrent neural network (RNN), or ConvLR, to reconstruct interventional images with golden-angle radial sampling. By using an initializer and Conv-LSTM blocks, the priors from the pre-operative reference image and intra-operative frames were exploited for reconstructing the current frame. Data consistency for radial sampling was implemented by a soft-projection method. To improve the reconstruction accuracy, an adversarial learning strategy was adopted. A set of interventional images based on the pre-operative and post-operative MR images was simulated for algorithm validation. Results showed that, with only 10 radial spokes, ConvLR provided the best performance compared with state-of-the-art methods, giving an acceleration of up to 40-fold. The proposed algorithm has the potential to achieve real-time i-MRI for DBS and can be used for general-purpose MR-guided intervention.
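As a rough illustration of golden-angle radial sampling, the sketch below generates spoke angles using the usual increment of 180° divided by the golden ratio (about 111.246°) and estimates an acceleration factor from the standard radial Nyquist rule of π/2 × N spokes for an N × N image. The 256 × 256 matrix size is an assumption made here for illustration; the abstract does not give the matrix size or say how its 40-fold figure was derived.

```python
import math

# Golden-angle increment for radial MRI: 180 degrees / golden ratio.
GOLDEN_ANGLE_DEG = 180.0 / ((1.0 + math.sqrt(5.0)) / 2.0)


def spoke_angles(num_spokes):
    """Angles (degrees, in [0, 180)) of successive golden-angle spokes."""
    return [(i * GOLDEN_ANGLE_DEG) % 180.0 for i in range(num_spokes)]


def nyquist_spokes(matrix_size):
    """Spokes needed to satisfy the radial Nyquist criterion (pi/2 * N)."""
    return math.ceil(math.pi / 2.0 * matrix_size)


angles = spoke_angles(10)

# Assumed 256 x 256 image: fully sampled spokes divided by the 10 used.
acceleration = nyquist_spokes(256) / 10
```

Successive golden-angle spokes never repeat and fill k-space nearly uniformly for any number of spokes, which is why the scheme suits online reconstruction from a sliding window of recent spokes.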

CV · Sep 17, 2022
Differentiable Topology-Preserved Distance Transform for Pulmonary Airway Segmentation

Minghui Zhang, Guang-Zhong Yang, Yun Gu

Detailed pulmonary airway segmentation is a clinically important task for endobronchial intervention and treatment of peripherally located lung cancer lesions. Convolutional Neural Networks (CNNs) are promising tools for medical image analysis but perform poorly when the feature distribution is significantly imbalanced, which is true for airway data: the trachea and principal bronchi dominate most of the voxels, whereas the lobar bronchi and distal segmental bronchi occupy a small proportion. In this paper, we propose a Differentiable Topology-Preserved Distance Transform (DTPDT) framework to improve the performance of airway segmentation. A Topology-Preserved Surrogate (TPS) learning strategy is first proposed to balance the training progress within-class distribution. Furthermore, a Convolutional Distance Transform (CDT) is designed to identify the breakage phenomenon with superior sensitivity and minimize the variation of the distance map between the prediction and ground-truth. The proposed method is validated with the publicly available reference airway segmentation datasets. The detected rate of branch and length on public EXACT'09 and BAS datasets are 82.1%/79.6% and 96.5%/91.5% respectively, demonstrating the reliability and efficiency of the method in terms of improving the topology completeness of the segmentation performance while maintaining the overall topology accuracy.

65.4 · CV · May 25
How Far Has AI Come in Liver Fibrosis Staging? A Large-Scale Real-World Dataset and Benchmark

Yuanye Liu, Nannan Shi, Zhejia Zhang et al.

Despite years of methodological progress, how far AI has come in liver fibrosis staging has never been systematically evaluated under the heterogeneous, multi-center conditions that define clinical practice. To address this gap, we introduce LiFS, a large-scale dataset and benchmark derived from the MICCAI 2025 CARE-Liver challenge, comprising 610 patients across multiple centers and scanners with multi-sequence MRI. To the best of our knowledge, LiFS is the first benchmark providing complete gadoxetic acid-enhanced sequences with histopathology-confirmed annotations from diverse real-world scanners. Through systematic evaluation of 9 independently developed methods selected from 96 registered teams against in-cohort radiologist reference results, our findings address how far current AI has progressed toward clinical-level liver fibrosis staging from three complementary perspectives. First, against radiologists, the best AI methods were broadly comparable to the senior radiologist and significantly exceeded the junior radiologist in selected settings, while median AI performance generally approached junior-radiologist levels. Second, from a data perspective, cross-center heterogeneity, label imbalance, and contrast-enhanced sequence variability emerge as the dominant challenges for AI methods. Third, from a technical perspective, methodological design choices, including spatial registration, input dimensionality, multi-modal fusion strategy, and backbone architecture, appear to modulate cross-center robustness, although no single choice alone closes the gap. Overall, LiFS provides a rigorous real-world benchmark for positioning the current state of AI in liver fibrosis staging and for enabling future research on the key challenges that limit clinically reliable deployment.

CV · Apr 18, 2023
CDFI: Cross Domain Feature Interaction for Robust Bronchi Lumen Detection

Jiasheng Xu, Tianyi Zhang, Yangqian Wu et al.

Endobronchial intervention is increasingly used as a minimally invasive means for the treatment of pulmonary diseases. In order to reduce the difficulty of manipulation in complex airway networks, robust lumen detection is essential for intraoperative guidance. However, these methods are sensitive to visual artifacts which are inevitable during the surgery. In this work, a cross domain feature interaction (CDFI) network is proposed to extract the structural features of lumens, as well as to provide artifact cues to characterize the visual features. To effectively extract the structural and artifact features, the Quadruple Feature Constraints (QFC) module is designed to constrain the intrinsic connections of samples with various imaging-quality. Furthermore, we design a Guided Feature Fusion (GFF) module to supervise the model for adaptive feature fusion based on different types of artifacts. Results show that the features extracted by the proposed method can preserve the structural information of lumen in the presence of large visual variations, bringing much-improved lumen detection accuracy.

CV · Aug 25, 2022
A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation

Yu Chen, Xu Cao, Xiaoyi Lin et al.

Accurate motion and depth recovery is important for many robot vision tasks including autonomous driving. Most previous studies have achieved cooperative multi-task interaction via either pre-defined loss functions or cross-domain prediction. This paper presents a multi-task scheme that achieves mutual assistance by means of our Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA). F2D and D2F mechanisms enable multi-scale information integration between the optical flow and depth domains based on differentiable shallow nets. A dual-head mechanism is used to predict optical flow for rigid and non-rigid motion in a divide-and-conquer manner, which significantly improves the optical flow estimation performance. Furthermore, to make the prediction more robust and stable, EMA is used for our multi-task training. Experimental results on KITTI datasets show that our multi-task scheme outperforms other multi-task schemes and provides marked improvements in the prediction results.
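The abstract does not say what the Exponential Moving Average is applied to. A common use in training is to keep a smoothed "shadow" copy of the model weights, and the sketch below shows that generic update under this assumption. The names `ema_update`, `shadow`, and `decay` are illustrative and are not taken from the paper.

```python
def ema_update(shadow, params, decay=0.99):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params.

    Both arguments map parameter names to float values. A decay close to 1
    makes the shadow copy change slowly, which smooths out noisy updates.
    """
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


# Toy run: the raw weight jumps around while the shadow copy drifts smoothly.
shadow = {"w": 1.0}
for raw in (0.0, 2.0, 0.0, 2.0):
    shadow = ema_update(shadow, {"w": raw}, decay=0.9)
```

With `decay=0.9` each step moves the shadow value only 10% of the way toward the new weight, which matches the stated aim of more robust and stable predictions.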

CV · Sep 16, 2018 · Code
3D Path Planning from a Single 2D Fluoroscopic Image for Robot Assisted Fenestrated Endovascular Aortic Repair

Jian-Qing Zheng, Xiao-Yun Zhou, Celia Riga et al.

The current standard of intra-operative navigation during Fenestrated Endovascular Aortic Repair (FEVAR) calls for 3D alignment between inserted devices and aortic branches. Navigation, commonly via 2D fluoroscopic images, lacks anatomical information, resulting in longer operation hours and radiation exposure. In this paper, a framework for real-time 3D robotic path planning from a single 2D fluoroscopic image of Abdominal Aortic Aneurysm (AAA) is introduced. A graph matching method is proposed to establish the correspondence between the 3D preoperative and 2D intra-operative AAA skeletons, and then the two skeletons are registered by skeleton deformation and regularization with respect to skeleton length and smoothness. Furthermore, deep learning was used to segment 3D pre-operative AAA from Computed Tomography (CT) scans to facilitate the framework automation. Simulation, phantom and patient AAA data sets have been used to validate the proposed framework. A 3D distance error of 2 mm was achieved in the phantom setup. Performance advantages were also achieved in terms of accuracy, robustness and time-efficiency. All the code will be open source.

CV · Dec 29, 2023
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Kaiyuan Yang, Fabio Musio, Yihui Ma et al.

The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited datasets with annotations on CoW anatomy, especially for CTA. Therefore, we organized the TopCoW challenge with the release of an annotated CoW dataset. The TopCoW dataset is the first public dataset with voxel-level annotations for 13 CoW vessel components, enabled by virtual reality technology. It is also the first large dataset using 200 pairs of MRA and CTA from the same patients. As part of the benchmark, we invited submissions worldwide and attracted over 250 registered participants from six continents. The submissions were evaluated on both internal and external test datasets of 226 scans from over five centers. The top performing teams achieved over 90% Dice scores at segmenting the CoW components, over 80% F1 scores at detecting key CoW components, and over 70% balanced accuracy at classifying CoW variants for nearly all test sets. The best algorithms also showed clinical potential in classifying fetal-type posterior cerebral artery and locating aneurysms with CoW anatomy. TopCoW demonstrated the utility and versatility of CoW segmentation algorithms for a wide range of downstream clinical applications with explainability. The annotated datasets and best performing algorithms have been released as public Zenodo records to foster further methodological development and clinical tool building.

LG · Dec 3, 2024
Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing

Nanyang Ye, Qiao Sun, Yifei Wang et al.

Analog computing using non-volatile memristors has emerged as a promising solution for energy-efficient deep learning. New materials, such as perovskite-based memristors, have recently attracted interest due to their cost-effectiveness, energy efficiency and flexibility. Yet, challenges in material diversity and immature fabrication require extensive experimentation for device development. Moreover, significant non-idealities in these memristors often impede their use for computing. Here, we propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs that effectively address the inherent non-idealities of these memristors. Employing Bayesian optimization (BO) with a focus on usability, we efficiently identify optimal materials and fabrication conditions for perovskite memristors. Meanwhile, we developed "BayesMulti", a DNN training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our approach theoretically ensures that within a certain range of parameter perturbations due to memristor non-idealities, the prediction outcomes remain consistent. Our integrated approach enables use of analog computing in much deeper and wider networks, which significantly outperforms existing methods in diverse tasks like image classification, autonomous driving, species identification, and large vision-language models, achieving up to 100-fold improvements. We further validate our methodology on a 10×10 optimized perovskite memristor crossbar, demonstrating high accuracy in a classification task and low energy consumption. This study offers a versatile solution for efficient optimization of various analog computing systems, encompassing both devices and algorithms.

IV · Dec 15, 2024
AirMorph: Topology-Preserving Deep Learning for Pulmonary Airway Analysis

Minghui Zhang, Chenyu Li, Fangfang Xie et al.

Accurate anatomical labeling and analysis of the pulmonary structure and its surrounding anatomy from thoracic CT is becoming increasingly important for understanding the etiology of abnormalities or supporting targeted therapy and early interventions. Whilst lung and airway cell atlases have been attempted, there is a lack of fine-grained morphological atlases that are clinically deployable. In this work, we introduce AirMorph, a robust, end-to-end deep learning pipeline enabling fully automatic and comprehensive airway anatomical labeling at lobar, segmental, and subsegmental resolutions that can be used to create digital atlases of the lung. Evaluated across large-scale multi-center datasets comprising diverse pulmonary conditions, AirMorph consistently outperformed existing segmentation and labeling methods in terms of accuracy, topological consistency, and completeness. To simplify clinical interpretation, we further introduce a compact anatomical signature quantifying critical morphological airway features, including stenosis, ectasia, tortuosity, divergence, length, and complexity. When applied to various pulmonary diseases such as pulmonary fibrosis, emphysema, atelectasis, consolidation, and reticular opacities, it demonstrates strong discriminative power, revealing disease-specific morphological patterns with high interpretability and explainability. Additionally, AirMorph supports efficient automated branching pattern analysis, potentially enhancing bronchoscopic navigation planning and procedural safety, offering a valuable clinical tool for improved diagnosis, targeted treatment, and personalized patient care.

CV · Mar 16, 2024
Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images

Junyang Wu, Yun Gu, Guang-Zhong Yang

Robot assisted endoluminal intervention is an emerging technique for both benign and malignant luminal lesions. With vision-based navigation, when combined with pre-operative imaging data as priors, it is possible to recover position and pose of the endoscope without the need for additional sensors. In practice, however, aligning pre-operative and intra-operative domains is complicated by significant texture differences. Although methods such as style transfer can be used to address this issue, they require large datasets from both source and target domains with prolonged training times. This paper proposes an efficient domain transfer method based on stylized Gaussian splatting, requiring only a few real images (10 images) and a very short training time. Specifically, the transfer process includes two phases. In the first phase, the 3D models reconstructed from CT scans are represented as differential Gaussian point clouds. In the second phase, only color appearance related parameters are optimized to transfer the style and preserve the visual content. A novel structure consistency loss is applied to latent features and depth levels to enhance the stability of the transferred images. Detailed validation was performed to demonstrate the performance advantages of the proposed method compared to that of the current state-of-the-art, highlighting the potential for intra-operative surgical navigation.

RO · Mar 9
Long-Short Term Agents for Pure-Vision Bronchoscopy Robotic Autonomy

Junyang Wu, Mingyi Luo, Fangfang Xie et al.

Accurate intraoperative navigation is essential for robot-assisted endoluminal intervention, but remains difficult because of limited endoscopic field of view and dynamic artifacts. Existing navigation platforms often rely on external localization technologies, such as electromagnetic tracking or shape sensing, which increase hardware complexity and remain vulnerable to intraoperative anatomical mismatch. We present a vision-only autonomy framework that performs long-horizon bronchoscopic navigation using preoperative CT-derived virtual targets and live endoscopic video, without external tracking during navigation. The framework uses hierarchical long-short agents: a short-term reactive agent for continuous low-latency motion control, and a long-term strategic agent for decision support at anatomically ambiguous points. When their recommendations conflict, a world-model critic predicts future visual states for candidate actions and selects the action whose predicted state best matches the target view. We evaluated the system in a high-fidelity airway phantom, three ex vivo porcine lungs, and a live porcine model. The system reached all planned segmental targets in the phantom, maintained 80% success to the eighth generation ex vivo, and achieved in vivo navigation performance comparable to the expert bronchoscopist. These results support the preclinical feasibility of sensor-free autonomous bronchoscopic navigation.
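The critic's selection rule described in this abstract can be sketched in a few lines. Everything concrete below is hypothetical: `select_action`, the toy `toy_world_model`, the squared-distance measure, and the 2-D "views" are stand-ins chosen for illustration, since the abstract does not describe the actual world model or similarity measure.

```python
def squared_distance(a, b):
    """Assumed mismatch measure between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_action(current_view, candidates, target_view, world_model,
                  distance=squared_distance):
    """Pick the candidate whose predicted next view is closest to the target."""
    return min(
        candidates,
        key=lambda action: distance(world_model(current_view, action),
                                    target_view),
    )


def toy_world_model(view, action):
    """Hypothetical predictor: the action simply shifts the view features."""
    return tuple(v + d for v, d in zip(view, action))


# Two agents disagree: one proposes (1, 0), the other (0, 1).
chosen = select_action(
    current_view=(0.0, 0.0),
    candidates=[(1.0, 0.0), (0.0, 1.0)],
    target_view=(0.0, 2.0),
    world_model=toy_world_model,
)
```

In this toy setup the second proposal wins because its predicted view lies closer to the target, which mirrors the arbitration step described for conflicting short-term and long-term recommendations.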

IV · Oct 15, 2024
From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images

Junyang Wu, Fangfang Xie, Jiayuan Sun et al.

Domain adaptation, which bridges the distributions across different modalities, plays a crucial role in multimodal medical image analysis. In endoscopic imaging, combining pre-operative data with intra-operative imaging is important for surgical planning and navigation. However, existing domain adaptation methods are hampered by distribution shift caused by in vivo artifacts, necessitating robust techniques for aligning noisy and artifact abundant patient endoscopic videos with clean virtual images reconstructed from pre-operative tomographic data for pose estimation during intraoperative guidance. This paper presents an artifact-resilient image translation method and an associated benchmark for this purpose. The method incorporates a novel "local-global" translation framework and a noise-resilient feature extraction strategy. For the former, it decouples the image translation process into a local step for feature denoising, and a global step for global style transfer. For feature extraction, a new contrastive learning strategy is proposed, which can extract noise-resilient features for establishing robust correspondence across domains. Detailed validation on both public and in-house clinical datasets has been conducted, demonstrating significantly improved performance compared to the current state-of-the-art.

IV · Feb 25, 2022
Faithful learning with sure data for lung nodule diagnosis

Hanxiao Zhang, Liang Chen, Xiao Gu et al.

Recent evolution in deep learning has proven its value for CT-based lung nodule classification. Most current techniques are intrinsically black-box systems, suffering from two generalizability issues in clinical practice. First, benign-malignant discrimination is often assessed by human observers without pathologic diagnoses at the nodule level. We termed these data as "unsure data". Second, a classifier does not necessarily acquire reliable nodule features for stable learning and robust prediction with patch-level labels during learning. In this study, we construct a sure dataset with pathologically-confirmed labels and propose a collaborative learning framework to facilitate sure nodule classification by integrating unsure data knowledge through nodule segmentation and malignancy score regression. A loss function is designed to learn reliable features by introducing interpretability constraints regulated with nodule segmentation maps. Furthermore, based on model inference results that reflect the understanding from both machine and experts, we explore a new nodule analysis method for similar historical nodule retrieval and interpretable diagnosis. Detailed experimental results demonstrate that our approach is beneficial for achieving improved performance coupled with faithful model reasoning for lung cancer prediction. Extensive cross-evaluation results further illustrate the effect of unsure data for deep-learning-based methods in lung nodule classification.

CV · Sep 3, 2021
Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation

Xiao Gu, Jianxin Yang, Hanxiao Zhang et al.

Accurate estimation of three-dimensional human skeletons from depth images can provide important metrics for healthcare applications, especially for biomechanical gait analysis. However, there exist inherent problems associated with depth images captured from a single view. The collected data is greatly affected by occlusions where only partial surface data can be recorded. Furthermore, depth images of the human body exhibit heterogeneous characteristics with viewpoint changes, and the estimated poses under local coordinate systems are expected to go through equivariant rotations. Most existing pose estimation models are sensitive to both issues. To address this, we propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework built upon a novel rotation-equivariant backbone. Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views. It can generalize well on the real-world data from all the other unseen views. Our approach has shown superior performance on gait analysis on our ICL-Gait dataset compared to other state-of-the-art methods, and it can produce more convincing keypoints on the ITOP dataset than its provided "ground truth".

CV · Jul 28, 2021
TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021

Xiao Gu, Jianing Qiu, Yao Guo et al.

In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given. We developed a hierarchical attention model for action anticipation, which leverages a Transformer-based attention mechanism to aggregate features across the temporal dimension, modalities, and symbiotic branches respectively. In terms of Mean Top-5 Recall of action, our submission with team name ICL-SJTU achieved 13.39% for the overall testing set, 10.05% for unseen subsets and 11.88% for tailed subsets. Additionally, it is noteworthy that our submission ranked 1st in terms of verb class in all three (sub)sets.
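The Mean Top-5 Recall figures above are class-averaged: recall is computed separately for each ground-truth class and then averaged, so rare classes count as much as frequent ones. The sketch below follows that general definition. The function name `mean_topk_recall` and the toy data are illustrative, and the official challenge evaluation may differ in details such as which classes are included.

```python
from collections import defaultdict


def mean_topk_recall(scores, labels, k=5):
    """Class-averaged top-k recall.

    scores: one list of per-class scores for each sample.
    labels: the ground-truth class index of each sample.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        counts[label] += 1
        hits[label] += int(label in ranked[:k])

    # Average the per-class recalls over classes that appear in the labels.
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy data with 3 classes and k=1 so the arithmetic is easy to follow.
toy_scores = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_labels = [0, 1, 1, 2]
toy_result = mean_topk_recall(toy_scores, toy_labels, k=1)
```

In the toy data the per-class recalls are 1.0, 0.5 and 0.0, giving a mean of 0.5, whereas plain accuracy over the four samples would also be 0.5 only by coincidence of the class sizes.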

RO · Jun 7, 2021
Robotic Electrospinning Actuated by Non-Circular Joint Continuum Manipulator for Endoluminal Therapy

Zicong Wu, Chuqian Lou, Zhu Jin et al.

Electrospinning has shown excellent benefits for treating trauma in tissue engineering owing to the micro/nano fibrous structure it produces. It can effectively adhere to the tissue surface for long-term continuous therapy. This paper develops a robotic electrospinning platform for endoluminal therapy. The platform consists of a continuum manipulator, the electrospinning device, and the actuation unit. The continuum manipulator has two bending sections to facilitate the steering of the tip needle for a controllable spinning direction. A non-circular joint profile is carefully designed to keep the centreline length of the continuum manipulator constant for stable fluid transmission inside it. Experiments are performed on a bronchus phantom, and the steering ability and bending limitation in each direction are also investigated. Endoluminal electrospinning is also demonstrated in trajectory-following and point-targeting experiments. The effective adhesive area of the produced fibre is also illustrated. The proposed robotic electrospinning shows its feasibility to precisely spread more therapeutic drug to construct fibrous structure for potential endoluminal treatment.

IVDec 10, 2020
Learning Tubule-Sensitive CNNs for Pulmonary Airway and Artery-Vein Segmentation in CT

Yulei Qin, Hao Zheng, Yun Gu et al.

Training convolutional neural networks (CNNs) for segmentation of pulmonary airway, artery, and vein is challenging due to sparse supervisory signals caused by the severe class imbalance between tubular targets and background. We present a CNN-based method for accurate airway and artery-vein segmentation in non-contrast computed tomography. It enjoys superior sensitivity to tenuous peripheral bronchioles, arterioles, and venules. The method first uses a feature recalibration module to make the best use of features learned from the neural networks. Spatial information of features is properly integrated to retain relative priority of activated regions, which benefits the subsequent channel-wise recalibration. Then, an attention distillation module is introduced to reinforce representation learning of tubular objects. Fine-grained details in high-resolution attention maps are passed down from one layer to its previous layer recursively to enrich context. An anatomy prior consisting of a lung context map and a distance transform map is designed and incorporated for better artery-vein differentiation capacity. Extensive experiments demonstrated considerable performance gains brought by these components. Compared with state-of-the-art methods, our method extracted many more branches while maintaining competitive overall segmentation performance. Codes and models are available at http://www.pami.sjtu.edu.cn/News/56

LGDec 4, 2020
Batch Group Normalization

Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye et al.

Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to train. Normalization is one of the effective solutions. Among previous normalization methods, Batch Normalization (BN) performs well at medium and large batch sizes and generalizes well to multiple vision tasks, while its performance degrades significantly at small batch sizes. In this paper, we find that BN also saturates at extremely large batch sizes, i.e., 128 images per worker (GPU), and propose that the degradation/saturation of BN at small/extremely large batch sizes is caused by noisy/confused statistic calculation. Hence, without adding new trainable parameters, using multiple-layer or multi-iteration information, or introducing extra computation, Batch Group Normalization (BGN) is proposed to solve the noisy/confused statistic calculation of BN at small/extremely large batch sizes by introducing the channel, height and width dimensions to compensate. The group technique in Group Normalization (GN) is used and a hyper-parameter G controls the number of feature instances used for statistic calculation, so that the statistics are neither noisy nor confused for different batch sizes. We empirically demonstrate that BGN consistently outperforms BN, Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization (PN), across a wide spectrum of vision tasks, including image classification, Neural Architecture Search (NAS), adversarial learning, Few Shot Learning (FSL) and Unsupervised Domain Adaptation (UDA), indicating its good performance, robust stability to batch size and wide generalizability. For example, for training ResNet-50 on ImageNet with a batch size of 2, BN achieves Top1 accuracy of 66.512% while BGN achieves 76.096%, a notable improvement.
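
The statistic calculation described in this abstract can be illustrated with a minimal sketch. It assumes each sample's channel, height and width dimensions are flattened into one feature vector, that vector is split into G equal groups, and the mean and variance of each group are pooled over the whole batch. The function name `batch_group_norm`, the flat-list input layout and the `eps` value are illustrative assumptions, and the learned affine transform that normally follows normalization is omitted; this is not the authors' implementation.

```python
import math


def batch_group_norm(batch, num_groups, eps=1e-5):
    """Normalize a batch with statistics pooled over the batch and feature groups.

    batch: list of N samples, each a flat list of D = C*H*W features (assumed layout).
    num_groups: the hyper-parameter G; D must be divisible by it.
    """
    n, d = len(batch), len(batch[0])
    if d % num_groups != 0:
        raise ValueError("feature length must be divisible by num_groups")
    size = d // num_groups
    out = [[0.0] * d for _ in range(n)]
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        # Pool every feature of this group across all samples in the batch.
        values = [x for sample in batch for x in sample[lo:hi]]
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for i, sample in enumerate(batch):
            for j in range(lo, hi):
                out[i][j] = (sample[j] - mean) * scale
    return out
```

Under these assumptions, a small G pools many feature instances per statistic (useful when the batch is tiny), while a large G pools fewer.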

IVNov 24, 2020
Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation

Hao Zheng, Yulei Qin, Yun Gu et al.

Automated airway segmentation is a prerequisite for pre-operative diagnosis and intra-operative navigation for pulmonary intervention. Due to the small size and scattered spatial distribution of peripheral bronchi, this task is hampered by severe class imbalance between foreground and background regions, which makes it challenging for CNN-based methods to parse distal small airways. In this paper, we demonstrate that this problem arises from gradient erosion and dilation of the neighborhood voxels. During back-propagation, if the ratio of the foreground gradient to background gradient is small while the class imbalance is local, the foreground gradients can be eroded by their neighborhoods. This process cumulatively increases the noise information included in the gradient flow from top layers to the bottom ones, limiting the learning of small structures in CNNs. To alleviate this problem, we use group supervision and the corresponding WingsNet to provide complementary gradient flows to enhance the training of shallow layers. To further address the intra-class imbalance between large and small airways, we design a General Union loss function which obviates the impact of airway size by distance-based weights and adaptively tunes the gradient ratio based on the learning process. Extensive experiments on public datasets demonstrate that the proposed method can predict the airway structures with higher accuracy and better morphological completeness than the baselines.

IVJun 16, 2020
End-to-End Real-time Catheter Segmentation with Optical Flow-Guided Warping during Endovascular Intervention

Anh Nguyen, Dennis Kundrat, Giulio Dagnino et al.

Accurate real-time catheter segmentation is an important pre-requisite for robot-assisted endovascular intervention. Most of the existing learning-based methods for catheter segmentation and tracking are only trained on small-scale datasets or synthetic data due to the difficulties of ground-truth annotation. Furthermore, the temporal continuity in intraoperative imaging sequences is not fully utilised. In this paper, we present FW-Net, an end-to-end and real-time deep learning framework for endovascular intervention. The proposed FW-Net has three modules: a segmentation network with encoder-decoder architecture, a flow network to extract optical flow information, and a novel flow-guided warping function to learn the frame-to-frame temporal continuity. We show that by effectively learning temporal continuity, the network can successfully segment and track the catheters in real-time sequences using only raw ground-truth for training. Detailed validation results confirm that our FW-Net outperforms state-of-the-art techniques while achieving real-time performance.

APP-PHJun 11, 2020
FBG-Based Triaxial Force Sensor Integrated with an Eccentrically Configured Imaging Probe for Endoluminal Optical Biopsy

Zicong Wu, Anzhu Gao, Ning Liu et al.

Accurate force sensing is important for endoluminal intervention in terms of both safety and lesion targeting. This paper develops an FBG-based force sensor for robotic bronchoscopy by configuring three FBG sensors at the lateral side of a conical substrate. It allows a large and eccentric inner lumen for the interventional instrument, enabling a flexible imaging probe inside to perform optical biopsy. The force sensor is embodied with a laser-profiled continuum robot and thermal drift is fully compensated by three temperature sensors integrated on the circumference surface of the sensor substrate. Different decoupling approaches are investigated, and nonlinear decoupling is adopted based on the cross-validation SVM and a Gaussian kernel function, achieving an accuracy of 10.58 mN, 14.57 mN and 26.32 mN along the X, Y and Z axes, respectively. The tissue test is also investigated to further demonstrate the feasibility of the developed triaxial force sensor.

ROJun 4, 2020
Hybrid Data-Driven and Analytical Model for Kinematic Control of a Surgical Robotic Tool

Francesco Cursi, Anh Nguyen, Guang-Zhong Yang

Accurate kinematic models are essential for effective control of surgical robots. For tendon-driven robots, which are common in minimally invasive surgery, intrinsic nonlinearities are important to consider. Traditional analytical methods allow the kinematic model of the system to be built by making certain assumptions and simplifications on the nonlinearities. Machine learning techniques, instead, allow a more complex model to be recovered from the acquired data. Analytical models are more generalisable, but can be over-simplified; data-driven models, on the other hand, can cater for more complex models, but are less generalisable and the result is highly affected by the training dataset. In this paper, we present a novel approach to combining analytical and data-driven approaches to model the kinematics of nonlinear tendon-driven surgical robots. Gaussian Process Regression (GPR) is used for learning the data-driven model and the proposed method is tested on both simulated data and real experimental data.

ROApr 1, 2020
Constrained-Space Optimization and Reinforcement Learning for Complex Tasks

Ya-Yen Tsai, Bo Xiao, Edward Johns et al.

Learning from Demonstration is increasingly used for transferring operator manipulation skills to robots. In practice, it is important to cater for limited data and imperfect human demonstrations, as well as underlying safety constraints. This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks. Through interactions within the constrained space, the reinforcement learning agent is trained to optimize the manipulation skills according to a defined reward function. After learning, the optimal policy is derived from the well-trained reinforcement learning agent, which is then implemented to guide the robot to conduct tasks that are similar to the experts' demonstrations. The effectiveness of the proposed method is verified with a robotic suturing task, demonstrating that the learned policy outperformed the experts' demonstrations in terms of the smoothness of the joint motion and end-effector trajectories, as well as the overall task completion time.

IVMar 3, 2020
DDU-Nets: Distributed Dense Model for 3D MRI Brain Tumor Segmentation

Hanxiao Zhang, Jingxiong Li, Mali Shen et al.

Segmentation of brain tumors and their subregions remains a challenging task due to their weak features and deformable shapes. In this paper, three patterns (cross-skip, skip-1 and skip-2) of distributed dense connections (DDCs) are proposed to enhance feature reuse and propagation of CNNs by constructing tunnels between key layers of the network. For better detecting and segmenting brain tumors from multi-modal 3D MR images, CNN-based models embedded with DDCs (DDU-Nets) are trained efficiently from pixel to pixel with a limited number of parameters. Postprocessing is then applied to refine the segmentation results by reducing the false-positive samples. The proposed method is evaluated on the BraTS 2019 dataset with results demonstrating the effectiveness of the DDU-Nets while requiring less computational cost.

HCMar 1, 2020
MIndGrasp: A New Training and Testing Framework for Motor Imagery Based 3-Dimensional Assistive Robotic Control

Daniel Freer, Guang-Zhong Yang

With increasing global age and disability, assistive robots are becoming more necessary, and brain computer interfaces (BCI) are often proposed as a solution to understanding the intent of a disabled person who needs assistance. Most frameworks for electroencephalography (EEG)-based motor imagery (MI) BCI control rely on the direct control of the robot in Cartesian space. However, for 3-dimensional movement, this requires 6 motor imagery classes, which is a difficult distinction even for more experienced BCI users. In this paper, we present a simulated training and testing framework which reduces the number of motor imagery classes to 4 while still grasping objects in three-dimensional space. This is achieved through semi-autonomous eye-in-hand vision-based control of the robotic arm, while the user-controlled BCI achieves movement to the left and right, as well as movement toward and away from the object of interest. Additionally, the framework includes a method of training a BCI directly on the assistive robotic system, which should be more easily transferable to a real-world assistive robot than using a standard training protocol such as Graz-BCI. Presented results do not consider real human EEG data, but are rather shown as a baseline for comparison with future human data and other improvements on the system.

HCFeb 24, 2020
On-Orbit Operations Simulator for Workload Measurement during Telerobotic Training

Daniel Freer, Yao Guo, Fani Deligianni et al.

Training for telerobotic systems often makes heavy use of simulated platforms, which ensure safe operation during the learning process. Outer space is one domain in which such a simulated training platform would be useful, as On-Orbit Operations (O3) can be costly, inefficient, or even dangerous if not performed properly. In this paper, we present a new telerobotic training simulator for the Canadarm2 on the International Space Station (ISS), which is able to modulate workload through the addition of confounding factors such as latency, obstacles, and time pressure. In addition, multimodal physiological data is collected from subjects as they perform a task from the simulator under these different conditions. As most current workload measures are subjective, we analyse objective measures from the simulator and EEG data that can provide a reliable measure. ANOVA of task data revealed which simulator-based performance measures could predict the presence of latency and time pressure. Furthermore, EEG classification using a Riemannian classifier and Leave-One-Subject-Out cross-validation showed promising classification performance and allowed for comparison of different channel configurations and preprocessing methods. Additionally, Riemannian distance and beta power of EEG data were investigated as potential cross-trial and continuous workload measures.
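
The Leave-One-Subject-Out cross-validation mentioned in this abstract can be sketched as a split generator: each fold holds out every trial from one subject and trains on the rest, so the classifier is always tested on an unseen person. The function name `leave_one_subject_out` and the subject labels are hypothetical; the abstract does not describe the authors' pipeline in this detail.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: one subject label per trial, in trial order.
    """
    for held_out in sorted(set(subject_ids)):
        # Trials from the held-out subject form the test fold; all others train.
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    for subject, train_idx, test_idx in leave_one_subject_out(["s1", "s1", "s2", "s3"]):
        print(subject, train_idx, test_idx)
```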

ROFeb 21, 2020
Nonlinearity Compensation in a Multi-DoF Shoulder Sensing Exosuit for Real-Time Teleoperation

Rejin John Varghese, Anh Nguyen, Etienne Burdet et al.

The compliant nature of soft wearable robots makes them ideal for complex multiple degrees of freedom (DoF) joints, but also introduces additional structural nonlinearities. Intuitive control of these wearable robots requires robust sensing to overcome the inherent nonlinearities. This paper presents a joint kinematics estimator for a bio-inspired multi-DoF shoulder exosuit capable of compensating the encountered nonlinearities. To overcome the nonlinearities and hysteresis inherent to the soft and compliant nature of the suit, we developed a deep learning-based method to map the sensor data to the joint space. The experimental results show that the new learning-based framework outperforms recent state-of-the-art methods by a large margin while achieving 12ms inference time using only a GPU-based edge-computing device. The effectiveness of our combined exosuit and learning framework is demonstrated through real-time teleoperation with a simulated NAO humanoid robot.

MED-PHDec 23, 2019
Artificial Intelligence in Surgery

Xiao-Yun Zhou, Yao Guo, Mali Shen et al.

Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.

ROOct 10, 2019
Design and Prototyping of a Bio-inspired Kinematic Sensing Suit for the Shoulder Joint: Precursor to a Multi-DoF Shoulder Exosuit

Rejin John Varghese, Benny P L Lo, Guang-Zhong Yang

Soft wearable robots are a promising new design paradigm for rehabilitation and active assistance applications. Their compliant nature makes them ideal for complex joints like the shoulder, but intuitive control of these robots requires robust and compliant sensing mechanisms. In this work, we introduce the sensing framework for a multi-DoF shoulder exosuit capable of sensing the kinematics of the shoulder joint. The proposed tendon-based sensing system is inspired by the concept of muscle synergies, the body's sense of proprioception, and finds its basis in the organization of the muscles responsible for shoulder movements. A motion-capture-based evaluation of the developed sensing system showed conformance to the behaviour exhibited by the muscles that inspired its routing and validates the hypothesis of the tendon-routing to be extended to the actuation framework of the exosuit in the future. The mapping from multi-sensor space to joint space is a multivariate multiple regression problem and was derived using an Artificial Neural Network (ANN). The sensing framework was tested with a motion-tracking system and achieved performance with root mean square error (RMSE) of approximately 5.43 degrees and 3.65 degrees for the azimuth and elevation joint angles, respectively, measured over 29000 frames (4+ minutes) of motion-capture data.

IVSep 16, 2019
Instantiation-Net: 3D Mesh Reconstruction from Single 2D Image for Right Ventricle

Zhao-Yang Wang, Xiao-Yun Zhou, Peichao Li et al.

3D shape instantiation, which reconstructs the 3D shape of a target from limited 2D images or projections, is an emerging technique for surgical intervention. It improves the currently less-informative and insufficient 2D navigation schemes for robot-assisted Minimally Invasive Surgery (MIS) to 3D navigation. Previously, a general and registration-free framework was proposed for 3D shape instantiation based on Kernel Partial Least Square Regression (KPLSR), requiring manually segmented anatomical structures as the pre-requisite. Two hyper-parameters including the Gaussian width and component number also need to be carefully adjusted. A Deep Convolutional Neural Network (DCNN) based framework has also been proposed to reconstruct a 3D point cloud from a single 2D image, with end-to-end and fully automatic learning. In this paper, an Instantiation-Net is proposed to reconstruct the 3D mesh of a target from a single 2D image, by using DCNN to extract features from the 2D image and Graph Convolutional Network (GCN) to reconstruct the 3D mesh, and using Fully Connected (FC) layers to connect the DCNN to GCN. Detailed validation was performed to demonstrate the practical strength of the method and its potential clinical use.

IVSep 16, 2019
Z-Net: an Anisotropic 3D DCNN for Medical CT Volume Segmentation

Peichao Li, Xiao-Yun Zhou, Zhao-Yang Wang et al.

Accurate volume segmentation from the Computed Tomography (CT) scan is a common prerequisite for pre-operative planning, intra-operative guidance and quantitative assessment of therapeutic outcomes in robot-assisted Minimally Invasive Surgery (MIS). 3D Deep Convolutional Neural Network (DCNN) is a viable solution for this task, but is memory intensive. Small isotropic patches are cropped from the original and large CT volume to mitigate this issue in practice, but it may cause discontinuities between the adjacent patches and severe class-imbalances within individual sub-volumes. This paper presents a new 3D DCNN framework, namely Z-Net, to tackle the discontinuity and class-imbalance issue by preserving a full field-of-view of the objects in the XY planes using anisotropic spatial separable convolutions. The proposed Z-Net can be seamlessly integrated into existing 3D DCNNs with isotropic convolutions such as 3D U-Net and V-Net, with improved volume segmentation Intersection over Union (IoU) - up to 12.6%. Detailed validation of Z-Net is provided for CT aortic, liver and lung segmentation, demonstrating the effectiveness and practical value of Z-Net for intra-operative 3D navigation in robot-assisted MIS.
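
For reference, the Intersection over Union (IoU) metric quoted in this abstract can be computed for binary masks as in the sketch below. It takes flattened 0/1 masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def intersection_over_union(pred, target):
    """IoU of two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels labelled foreground in both masks, and in at least one mask.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0
```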

ROSep 15, 2019
Hybrid Robot-assisted Frameworks for Endomicroscopy Scanning in Retinal Surgeries

Zhaoshuo Li, Mahya Shahbazi, Niravkumar Patel et al.

High-resolution real-time intraocular imaging of retina at the cellular level is very challenging due to the vulnerable and confined space within the eyeball as well as the limited availability of appropriate modalities. A probe-based confocal laser endomicroscopy (pCLE) system can be a potential imaging modality for improved diagnosis. The ability to visualize the retina at the cellular level could provide information that may predict surgical outcomes. The adoption of intraocular pCLE scanning is currently limited due to the narrow field of view and the micron-scale range of focus. In the absence of motion compensation, physiological tremors of the surgeon's hand and patient movements also contribute to the deterioration of the image quality. Therefore, an image-based hybrid control strategy is proposed to mitigate the above challenges. The proposed hybrid control strategy enables a shared control of the pCLE probe between surgeons and robots to scan the retina precisely, without hand tremors and with the advantages of an image-based auto-focus algorithm that optimizes the quality of pCLE images. The hybrid control strategy is deployed on two frameworks - cooperative and teleoperated. Better image quality, smoother motion, and reduced workload are all achieved in a statistically significant manner with the hybrid control frameworks.

ROAug 23, 2019
A Robust Regression Approach for Robot Model Learning

Francesco Cursi, Guang-Zhong Yang

Machine learning and data analysis have been used in many robotics fields, especially for modelling. Data are usually the result of sensor measurements and, as such, they might be subject to noise and outliers. The presence of outliers has a huge impact on modelling the acquired data, resulting in inappropriate models. In this work, a novel approach for outlier detection and rejection for input/output mapping in regression problems is presented. The robustness of the method is shown both through simulated data for linear and nonlinear regression, and real sensory data. Despite being validated by using artificial neural networks, the method can be generalized to any other regression method.

IVAug 21, 2019
U-Net Training with Instance-Layer Normalization

Xiao-Yun Zhou, Peichao Li, Zhao-Yang Wang et al.

Normalization layers are essential in a Deep Convolutional Neural Network (DCNN). Various normalization methods have been proposed. The statistics used to normalize the feature maps can be computed at batch, channel, or instance level. However, in most existing methods, the normalization for each layer is fixed. Batch-Instance Normalization (BIN) is one of the first proposed methods that combines two different normalization methods and achieves diverse normalization for different layers. However, two potential issues exist in BIN: first, the Clip function is not differentiable at input values of 0 and 1; second, the combined feature map does not have a normalized distribution, which is harmful for signal propagation in DCNN. In this paper, an Instance-Layer Normalization (ILN) layer is proposed by using the Sigmoid function for the feature map combination, and cascading group normalization. The performance of ILN is validated on image segmentation of the Right Ventricle (RV) and Left Ventricle (LV) using U-Net as the network architecture. The results show that the proposed ILN outperforms previous traditional and popular normalization methods with noticeable accuracy improvements for most validations, supporting the effectiveness of the proposed ILN.

CVJul 24, 2019
One-stage Shape Instantiation from a Single 2D Image to 3D Point Cloud

Xiao-Yun Zhou, Zhao-Yang Wang, Peichao Li et al.

Shape instantiation, which predicts the 3D shape of a dynamic target from one or more 2D images, is important for real-time intra-operative navigation. Previously, a general shape instantiation framework was proposed with manual image segmentation to generate a 2D Statistical Shape Model (SSM) and with Kernel Partial Least Square Regression (KPLSR) to learn the relationship between the 2D and 3D SSM for 3D shape prediction. In this paper, the two-stage shape instantiation is improved to be one-stage. PointOutNet with 19 convolutional layers and three fully-connected layers is used as the network structure and Chamfer distance is used as the loss function to predict the 3D target point cloud from a single 2D image. With the proposed one-stage shape instantiation algorithm, a spontaneous image-to-point cloud training and inference can be achieved. A dataset from 27 Right Ventricle (RV) subjects, indicating 609 experiments, was used to validate the proposed one-stage shape instantiation algorithm. An average point cloud-to-point cloud (PC-to-PC) error of 1.72mm has been achieved, which is comparable to the PLSR-based (1.42mm) and KPLSR-based (1.31mm) two-stage shape instantiation algorithm.
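
The Chamfer distance named in this abstract as the loss function can be written out in a few lines. The sketch below uses one common symmetric form (mean squared nearest-neighbour distance in both directions); several variants exist (squared or unsquared, summed or averaged), and the abstract does not say which one the authors used, so the choice here is an assumption.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point clouds."""
    if not p or not q:
        raise ValueError("both point clouds must be non-empty")
    # Average squared distance from each point in p to its nearest point in q.
    p_to_q = sum(min(squared_distance(a, b) for b in q) for a in p) / len(p)
    # And the same in the opposite direction.
    q_to_p = sum(min(squared_distance(b, a) for a in p) for b in q) / len(q)
    return p_to_q + q_to_p
```

Because each term matches points by nearest neighbour, the two clouds need neither the same number of points nor any point-to-point correspondence.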

IVJul 16, 2019
AirwayNet: A Voxel-Connectivity Aware Approach for Accurate Airway Segmentation Using Convolutional Neural Networks

Yulei Qin, Mingjian Chen, Hao Zheng et al.

Airway segmentation on CT scans is critical for pulmonary disease diagnosis and endobronchial navigation. Manual extraction of airway requires strenuous efforts due to the complicated structure and various appearance of airway. For automatic airway extraction, convolutional neural networks (CNNs) based methods have recently become the state-of-the-art approach. However, there still remains a challenge for CNNs to perceive the tree-like pattern and comprehend the connectivity of airway. To address this, we propose a voxel-connectivity aware approach named AirwayNet for accurate airway segmentation. By connectivity modeling, conventional binary segmentation task is transformed into 26 tasks of connectivity prediction. Thus, our AirwayNet learns both airway structure and relationship between neighboring voxels. To take advantage of context knowledge, lung distance map and voxel coordinates are fed into AirwayNet as additional semantic information. Compared to existing approaches, AirwayNet achieved superior performance, demonstrating the effectiveness of the network's awareness of voxel connectivity.
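
The step of turning one binary segmentation task into 26 connectivity-prediction tasks can be illustrated with a small label-encoding sketch. It assumes that the label for a voxel and a given neighbour direction is 1 only when both the voxel and that neighbour belong to the airway; this encoding, the function name and the nested-list volume layout are assumptions for illustration and are not taken from the paper.

```python
from itertools import product

# The 26 neighbour offsets of a voxel in a 3x3x3 neighbourhood (centre excluded).
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def connectivity_labels(mask):
    """Encode a binary volume mask[z][y][x] as 26 binary connectivity volumes."""
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    labels = []
    for dz, dy, dx in OFFSETS:
        channel = [[[0] * width for _ in range(height)] for _ in range(depth)]
        for z, y, x in product(range(depth), range(height), range(width)):
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            # Connected only if the voxel and its neighbour are both foreground.
            if inside and mask[z][y][x] and mask[nz][ny][nx]:
                channel[z][y][x] = 1
        labels.append(channel)
    return labels
```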

HCJun 25, 2019
Intention Detection of Gait Adaptation in Natural Settings

Ines Domingos, Guang-Zhong Yang, Fani Deligianni

Gait adaptation is an important part of gait analysis and its neuronal origin and dynamics have been studied extensively. In neurorehabilitation, it is important as it perturbs neuronal dynamics and allows patients to restore some of their motor function. Exoskeletons and robotics of the lower limbs are increasingly used to facilitate rehabilitation as well as to support daily function. Their efficiency and safety depend on how well they can sense the human intention to move and adapt the gait accordingly. This paper presents a gait adaptation scheme in natural settings. It allows monitoring of subjects in a more realistic environment without the requirement of specialized equipment such as a treadmill and foot pressure sensors. We extract gait characteristics based on a single RGB camera whereas wireless EEG signals are monitored simultaneously. We demonstrate that the method can not only successfully detect adaptation steps but also efficiently detect whether the subject adjusts their pace to a higher or lower speed.

CVFeb 28, 2019
Real-time 3D Shape Instantiation for Partially-deployed Stent Segment from a Single 2D Fluoroscopic Image in Robot-assisted Fenestrated Endovascular Aortic Repair

Jian-Qing Zheng, Xiao-Yun Zhou, Guang-Zhong Yang

In robot-assisted Fenestrated Endovascular Aortic Repair (FEVAR), accurate alignment of stent graft fenestrations or scallops with aortic branches is essential for establishing complete blood flow perfusion. Current navigation is largely based on 2D fluoroscopic images, which lack 3D anatomical information, thus causing longer operation time as well as high risks of radiation exposure. Previously, 3D shape instantiation frameworks for real-time 3D shape reconstruction of fully-deployed or fully-compressed stent graft from a single 2D fluoroscopic image have been proposed for 3D navigation in robot-assisted FEVAR. However, these methods could not instantiate partially-deployed stent segments, as the 3D marker references are unknown. In this paper, an adapted Graph Convolutional Network (GCN) is proposed to predict 3D marker references from 3D fully-deployed markers. As the original GCN is for classification, in this paper, the coarsening layers are removed and the softmax function at the network end is replaced with linear mapping for the regression task. The derived 3D and the 2D marker references are used to instantiate partially-deployed stent segment shape with the existing 3D shape instantiation framework. Validations were performed on three commonly used stent grafts and five patient-specific 3D printed aortic aneurysm phantoms. Comparable performances with average mesh distance errors of 1-3 mm and average angular errors around 7 degrees were achieved.

LGJan 26, 2019
ACNN: a Full Resolution DCNN for Medical Image Segmentation

Xiao-Yun Zhou, Jian-Qing Zheng, Peichao Li et al.

Deep Convolutional Neural Networks (DCNNs) are used extensively in medical image segmentation and hence 3D navigation for robot-assisted Minimally Invasive Surgeries (MISs). However, current DCNNs usually use down sampling layers for increasing the receptive field and gaining abstract semantic information. These down sampling layers decrease the spatial dimension of feature maps, which can be detrimental to image segmentation. Atrous convolution is an alternative for the down sampling layer. It increases the receptive field whilst maintaining the spatial dimension of feature maps. In this paper, a method for effective atrous rate setting is proposed to achieve the largest and fully-covered receptive field with a minimum number of atrous convolutional layers. Furthermore, a new and full resolution DCNN - Atrous Convolutional Neural Network (ACNN), which incorporates cascaded atrous II-blocks, residual learning and Instance Normalization (IN) is proposed. Application results of the proposed ACNN to Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) image segmentation demonstrate that the proposed ACNN can achieve higher segmentation Intersection over Unions (IoUs) than U-Net and Deeplabv3+, but with reduced trainable parameters.
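
The trade-off between receptive field size and coverage for cascaded atrous convolutions can be checked numerically. The 1D sketch below assumes stride-1 layers with an odd kernel size, computes the receptive field as 1 + sum((k - 1) * rate), and tests whether every input position inside that field is actually reached. The function names and the example rates are illustrative and are not taken from the paper.

```python
def receptive_field(kernel_size, rates):
    """Receptive field width of cascaded stride-1 atrous convolutions (1D)."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def covered_offsets(kernel_size, rates):
    """Input offsets that can influence one output position (odd kernel assumed)."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        # Each layer shifts every reachable offset by a multiple of its rate.
        offsets = {o + r * j for o in offsets for j in range(-half, half + 1)}
    return offsets


def is_fully_covered(kernel_size, rates):
    """True when no position inside the receptive field is skipped."""
    return len(covered_offsets(kernel_size, rates)) == receptive_field(kernel_size, rates)


if __name__ == "__main__":
    for rates in ([1, 2, 4], [1, 3, 9], [2, 2, 2]):
        print(rates, receptive_field(3, rates), is_fully_covered(3, rates))
```

With kernel size 3, the rates [1, 3, 9] give a gap-free field of 27 positions, [1, 2, 4] give a gap-free field of 15, and [2, 2, 2] give a field of 13 that skips every odd offset.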

CVOct 13, 2018
Varifocal-Net: A Chromosome Classification Approach using Deep Convolutional Networks

Yulei Qin, Juan Wen, Hao Zheng et al.

Chromosome classification is critical for karyotyping in abnormality diagnosis. To expedite the diagnosis, we present a novel method named Varifocal-Net for simultaneous classification of a chromosome's type and polarity using deep convolutional networks. The approach consists of one global-scale network (G-Net) and one local-scale network (L-Net). It follows three stages. The first stage is to learn both global and local features. We extract global features and detect finer local regions via the G-Net. By proposing a varifocal mechanism, we zoom into local parts and extract local features via the L-Net. Residual learning and multi-task learning strategies are utilized to promote high-level feature extraction. The detection of discriminative local parts is fulfilled by a localization subnet of the G-Net, whose training process involves both supervised and weakly-supervised learning. The second stage is to build two multi-layer perceptron classifiers that exploit features of both scales to boost classification performance. The third stage is to introduce a dispatch strategy of assigning each chromosome to a type within each patient case, by utilizing the domain knowledge of karyotyping. Evaluation results from 1909 karyotyping cases showed that the proposed Varifocal-Net achieved the highest accuracy per patient case, 99.2%, for both type and polarity tasks. It outperformed state-of-the-art methods, demonstrating the effectiveness of our varifocal mechanism, multi-scale feature ensemble, and dispatch strategy. The proposed method has been applied to assist practical karyotype diagnosis.

CVSep 11, 2018
Normalization in Training U-Net for 2D Biomedical Semantic Segmentation

Xiao-Yun Zhou, Guang-Zhong Yang

2D biomedical semantic segmentation is important for robotic vision in surgery. Segmentation methods based on Deep Convolutional Neural Networks (DCNNs) can outperform conventional methods in terms of both accuracy and level of automation. One common issue in training a DCNN for biomedical semantic segmentation is internal covariate shift, where the training of convolutional kernels is encumbered by the distribution change of input features, so that both training speed and performance are decreased. Batch Normalization (BN) was the first method proposed for addressing internal covariate shift and is widely used. Instance Normalization (IN) and Layer Normalization (LN) have also been proposed. Group Normalization (GN) was proposed more recently and has not yet been applied to, or specifically validated on, 2D biomedical semantic segmentation. Most DCNNs for biomedical semantic segmentation adopt BN as the normalization method by default, without reviewing its performance. In this paper, four normalization methods - BN, IN, LN and GN - are compared in detail, specifically for 2D biomedical semantic segmentation. U-Net is adopted as the basic DCNN structure. Three datasets regarding the Right Ventricle (RV), aorta, and Left Ventricle (LV) are used for the validation. The results show that detailed subdivision of the feature map, i.e. GN with a large group number or IN, achieves higher accuracy. This accuracy improvement mainly comes from better model generalization. Code is uploaded and maintained at Xiao-Yun Zhou's Github.
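The four normalization methods differ only in which axes of an (N, C, H, W) feature map share a mean and variance. The following is a minimal NumPy sketch of that difference, assuming training-mode statistics and omitting the learnable scale and shift. The function names are illustrative and this is not the authors' released code.

```python
import numpy as np


def _normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance normalization over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x):
    # Statistics shared across the batch and spatial axes, per channel.
    return _normalize(x, (0, 2, 3))


def instance_norm(x):
    # Statistics per sample and per channel.
    return _normalize(x, (2, 3))


def layer_norm(x):
    # Statistics per sample, shared across all channels.
    return _normalize(x, (1, 2, 3))


def group_norm(x, groups):
    # Statistics per sample and per group of channels.
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    grouped = x.reshape(n, groups, c // groups, h, w)
    return _normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
    print(np.allclose(group_norm(x, 8), instance_norm(x)))
    print(np.allclose(group_norm(x, 1), layer_norm(x)))
```

In this formulation GN with one group reduces to LN, and GN with as many groups as channels reduces to IN, which is why a large group number and IN sit at the same "detailed subdivision" end of the comparison.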

MED-PHAug 20, 2018
Translational Motion Compensation for Soft Tissue Velocity Images

Christina Koutsoumpa, Jennifer Keegan, David Firmin et al.

Purpose: Advancements in MRI Tissue Phase Velocity Mapping (TPM) allow for the acquisition of higher-quality cardiac velocity images, providing better assessment of regional myocardial deformation for accurate disease diagnosis, pre-operative planning and post-operative patient surveillance. Translation of TPM velocities from the scanner's reference coordinate system to the regional cardiac coordinate system requires decoupling of translational motion and motion due to myocardial deformation. Despite existing techniques for respiratory motion compensation in TPM, there is still a remaining translational velocity component due to the global motion of the beating heart. To compensate for translational motion in cardiac TPM, we propose an image-processing method, which we have evaluated on synthetic data and applied to in vivo TPM data. Methods: Translational motion is estimated from a suitable region of velocities automatically defined in the left-ventricular volume. The region is generated by dilating the medial axis of myocardial masks in each slice and the translational velocity is estimated by integration in this region. The method was evaluated on synthetic data and in vivo data corrupted with a translational velocity component (200% of the maximum measured velocity). Accuracy and robustness were examined and the method was applied to 10 in vivo datasets. Results: The results from synthetic and in vivo corrupted data show excellent performance, with an estimation error of less than 0.3% and high robustness in both cases. The effectiveness of the method is confirmed by visual observation of results from the 10 datasets. Conclusion: The proposed method is accurate and suitable for translational motion correction of left-ventricular velocity fields. The current method for translational motion compensation could be applied to any annular contracting (tissue) structure.
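The core idea, that integrating velocities over a suitable region of an annular contracting structure isolates the translational component, can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the radially contracting annulus, the function names, and the use of the whole annulus as the integration region instead of the dilated medial axis described in the abstract.

```python
import numpy as np


def synthetic_annulus(size=65, r_in=12.0, r_out=20.0, speed=1.0):
    """In-plane velocity field of a ring contracting towards its centre."""
    centre = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    dx, dy = x - centre, y - centre
    r = np.hypot(dx, dy)
    mask = (r >= r_in) & (r <= r_out)
    safe_r = np.where(r > 0, r, 1.0)  # avoid division by zero at the centre
    vx = np.where(mask, -speed * dx / safe_r, 0.0)
    vy = np.where(mask, -speed * dy / safe_r, 0.0)
    return vx, vy, mask


def estimate_translation(vx, vy, region):
    """Mean velocity over the region (a discrete form of integration)."""
    return np.array([vx[region].mean(), vy[region].mean()])


def compensate(vx, vy, region):
    """Subtract the estimated translational velocity from the field."""
    tx, ty = estimate_translation(vx, vy, region)
    return vx - tx, vy - ty


if __name__ == "__main__":
    vx, vy, mask = synthetic_annulus()
    added = np.array([2.0, -1.5])  # hypothetical translational corruption
    print(estimate_translation(vx + added[0], vy + added[1], mask))
```

Because the deformation velocities of a symmetric contracting ring cancel when averaged over the ring, the regional mean recovers the added translation in this idealized case.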

ROAug 13, 2018
Intraoperative robotic-assisted large-area high-speed microscopic imaging and intervention

Petros Giataganas, Michael Hughes, Christopher J. Payne et al.

Objective: Probe-based confocal endomicroscopy is an emerging high-magnification optical imaging technique that provides in vivo and in situ cellular-level imaging for real-time assessment of tissue pathology. Endomicroscopy could potentially be used for intraoperative surgical guidance, but it is challenging to assess a surgical site using individual microscopic images due to the limited field-of-view and difficulties associated with manually manipulating the probe. Methods: In this paper, a novel robotic device for large-area endomicroscopy imaging is proposed, demonstrating a rapid, but highly accurate, scanning mechanism with image-based motion control which is able to generate histology-like endomicroscopy mosaics. The device also includes, for the first time in robotic-assisted endomicroscopy, the capability to ablate tissue without the need for an additional tool. Results: The device achieves pre-programmed trajectories with positioning accuracy of less than 30 µm, while the image-based approach demonstrated that it can suppress random motion disturbances up to 1.25 mm/s. Mosaics are presented from a range of ex vivo human and animal tissues, over areas of more than 3 mm^2, scanned in approximately 10 seconds. Conclusion: This work demonstrates the potential of the proposed instrument to generate large-area, high-resolution microscopic images for intraoperative tissue identification and margin assessment. Significance: This approach presents an important alternative to current histology techniques, significantly reducing the tissue assessment time, while simultaneously providing the capability to mark and ablate suspicious areas intraoperatively.

ROJun 18, 2018
Agricultural Robotics: The Future of Robotic Agriculture

Tom Duckett, Simon Pearson, Simon Blackmore et al.

Agri-Food is the largest manufacturing sector in the UK. It supports a food chain that generates over £108bn p.a., with 3.9m employees in a truly international industry and exports £20bn of UK manufactured goods. However, the global food chain is under pressure from population growth, climate change, political pressures affecting migration, population drift from rural to urban regions and the demographics of an aging global population. These challenges are recognised in the UK Industrial Strategy white paper and backed by significant investment via a Wave 2 Industrial Challenge Fund Investment ("Transforming Food Production: from Farm to Fork"). Robotics and Autonomous Systems (RAS) and associated digital technologies are now seen as enablers of this critical food chain transformation. To meet these challenges, this white paper reviews the state of the art in the application of RAS in Agri-Food production and explores research and innovation needs to ensure these technologies reach their full potential and deliver the necessary impacts in the Agri-Food sector.

CVApr 9, 2018
Abdominal Aortic Aneurysm Segmentation with a Small Number of Training Subjects

Jian-Qing Zheng, Xiao-Yun Zhou, Qing-Biao Li et al.

Pre-operative Abdominal Aortic Aneurysm (AAA) 3D shape is critical for customized stent-graft design in Fenestrated Endovascular Aortic Repair (FEVAR). Traditional segmentation approaches implement expert-designed feature extractors, while recent deep neural networks extract features automatically with multiple non-linear modules. Usually, a large training dataset is essential for applying deep learning to AAA segmentation. In this paper, the AAA was segmented using U-net with a small number (two) of training subjects. Firstly, Computed Tomography Angiography (CTA) slices were augmented with gray value variation and translation to avoid the overfitting caused by the small number of training subjects. Then, U-net was trained to segment the AAA. Dice Similarity Coefficients (DSCs) over 0.8 were achieved on the testing subjects. The PLZ, DLZ and aortic branches are all reconstructed reasonably, which will facilitate stent-graft customization and help shape instantiation for intra-operative surgery navigation in FEVAR.
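For reference, the Dice Similarity Coefficient reported above is twice the overlap between the predicted and reference masks, divided by the sum of their sizes. The following is a minimal NumPy sketch; the function name and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, target).sum()
    return float(2.0 * overlap / denom)


if __name__ == "__main__":
    a = np.zeros((4, 4), dtype=bool)
    b = np.zeros((4, 4), dtype=bool)
    a[:2] = True    # rows 0-1, 8 pixels
    b[1:3] = True   # rows 1-2, 8 pixels, 4 shared with a
    print(dice_coefficient(a, b))
```

A DSC of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all, so values over 0.8 indicate substantial agreement with the reference segmentation.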