Yangxin Xu

h-index 1

10 papers

232 citations

Novelty 48%

AI Score 47

Ranked #53,862 of 205,806 authors (top 26%); #1,550 in RO (top 20%)

10 Papers

59.8 · CV · Apr 7

Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher

Pengcheng Weng, Yanyu Qian, Yangxin Xu et al.

Robust multimodal human sensing must overcome the critical challenge of missing modalities. Two principal barriers are the Representation Gap between heterogeneous data and the Contamination Effect from low-quality modalities. These barriers are causally linked, as the corruption introduced by contamination fundamentally impedes the reduction of representation disparities. In this paper, we propose PTA, a novel "Purify-then-Align" framework that solves this causal dependency through a synergistic integration of meta-learning and knowledge diffusion. To purify the knowledge source, PTA first employs a meta-learning-driven weighting mechanism that dynamically learns to down-weight the influence of noisy, low-contributing modalities. Subsequently, to align different modalities, PTA introduces a diffusion-based knowledge distillation paradigm in which an information-rich clean teacher, formed from this purified consensus, refines the features of each student modality. The ultimate payoff of this "Purify-then-Align" strategy is the creation of exceptionally powerful single-modality encoders imbued with cross-modal knowledge. Comprehensive experiments on the large-scale MM-Fi and XRF55 datasets, under pronounced Representation Gap and Contamination Effect, demonstrate that PTA achieves state-of-the-art performance and significantly improves the robustness of single-modality models in diverse missing-modality scenarios.

CV · Oct 9, 2021 · Code

Automatic Recognition of Abdominal Organs in Ultrasound Images based on Deep Neural Networks and K-Nearest-Neighbor Classification

Keyu Li, Yangxin Xu, Max Q.-H. Meng

Abdominal ultrasound imaging has been widely used to assist in the diagnosis and treatment of various abdominal organs. In order to shorten the examination time and reduce the cognitive burden on the sonographers, we present a classification method that combines the deep learning techniques and k-Nearest-Neighbor (k-NN) classification to automatically recognize various abdominal organs in the ultrasound images in real time. Fine-tuned deep neural networks are used in combination with PCA dimension reduction to extract high-level features from raw ultrasound images, and a k-NN classifier is employed to predict the abdominal organ in the image. We demonstrate the effectiveness of our method in the task of ultrasound image classification to automatically recognize six abdominal organs. A comprehensive comparison of different configurations is conducted to study the influence of different feature extractors and classifiers on the classification accuracy. Both quantitative and qualitative results show that with minimal training effort, our method can "lazily" recognize the abdominal organs in the ultrasound images in real time with an accuracy of 96.67%. Our implementation code is publicly available at: https://github.com/LeeKeyu/abdominal_ultrasound_classification.
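
The feature-reduction and classification stages described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation (their code is at the URL above): the deep features are replaced by synthetic 64-dimensional vectors, and the helper names `pca_fit`, `pca_transform`, and `knn_predict`, the 8 retained components, and `k=3` are all assumptions chosen for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T

def knn_predict(train_X, train_y, query_X, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

# Synthetic stand-in for deep features: 6 classes, 64-D, well separated.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 6, 20, 64
centers = rng.normal(scale=5.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
features = centers[labels] + rng.normal(size=(labels.size, dim))

# Hold out every fourth sample for evaluation.
test_mask = np.arange(labels.size) % 4 == 0
mean, comps = pca_fit(features[~test_mask], n_components=8)
train_Z = pca_transform(features[~test_mask], mean, comps)
test_Z = pca_transform(features[test_mask], mean, comps)

pred = knn_predict(train_Z, labels[~test_mask], test_Z, k=3)
accuracy = float((pred == labels[test_mask]).mean())
print(f"held-out accuracy on synthetic features: {accuracy:.2f}")
```

Because k-NN stores the training set and defers all computation to query time, there is no classifier training step beyond fitting the PCA projection, which is presumably what the abstract means by recognizing organs "lazily".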

CVJan 29

When Gradient Optimization Is Not Enough: $\dagger$ Dispersive and Anchoring Geometric Regularizer for Multimodal Learning

Zixuan Xia, Hao Wang, Pengcheng Weng et al.

Multimodal learning aims to integrate complementary information from heterogeneous modalities, yet strong optimization alone does not guarantee well-structured representations. Even under carefully balanced training schemes, multimodal models often exhibit geometric pathologies, including intra-modal representation collapse and sample-level cross-modal inconsistency, which degrade both unimodal robustness and multimodal fusion. We identify representation geometry as a missing control axis in multimodal learning and propose \regName, a lightweight geometry-aware regularization framework. \regName enforces two complementary constraints on intermediate embeddings: an intra-modal dispersive regularization that promotes representation diversity, and an inter-modal anchoring regularization that bounds sample-level cross-modal drift without rigid alignment. The proposed regularizer is plug-and-play, requires no architectural modifications, and is compatible with various training paradigms. Extensive experiments across multiple multimodal benchmarks demonstrate consistent improvements in both multimodal and unimodal performance, showing that explicitly regulating representation geometry effectively mitigates modality trade-offs.
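
The abstract does not give the loss formulas, so the following is only one plausible way to realize the two constraints it names. The functional forms are assumptions: the dispersive term is taken to be a log-mean-exp of negative squared pairwise distances within a modality, and the anchoring term a squared hinge that penalizes per-sample cross-modal distance only beyond a margin. The names `dispersive_loss`, `anchoring_loss`, `temperature`, and `margin` are hypothetical.

```python
import numpy as np

def dispersive_loss(z, temperature=1.0):
    """Intra-modal term: log-mean-exp of negative squared pairwise distances.

    Equals 0 when all embeddings collapse to one point and becomes more
    negative as the embeddings spread out, so minimizing it promotes diversity.
    """
    n = z.shape[0]
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    off_diag = d2[~np.eye(n, dtype=bool)]  # ignore each sample's distance to itself
    return float(np.log(np.mean(np.exp(-off_diag / temperature))))

def anchoring_loss(z_a, z_b, margin=0.5):
    """Inter-modal term: squared hinge on per-sample cross-modal distance.

    Pairs closer than `margin` incur no penalty, so drift is bounded
    without forcing the two modalities to align exactly.
    """
    dist = np.linalg.norm(z_a - z_b, axis=1)
    return float(np.mean(np.maximum(0.0, dist - margin) ** 2))

rng = np.random.default_rng(0)
spread = rng.normal(size=(16, 8))   # diverse embeddings of one modality
collapsed = np.zeros((16, 8))       # fully collapsed embeddings

print("dispersive (collapsed):", dispersive_loss(collapsed))
print("dispersive (spread):   ", dispersive_loss(spread))
print("anchoring (small drift):", anchoring_loss(spread, spread + 0.01))
print("anchoring (large drift):", anchoring_loss(spread, spread + 2.0))
```

In a training loop, terms like these would be added to the task loss with small weights; the weighting scheme is not specified in the abstract.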

26.2 · CV · Apr 2

COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing

Hao Wang, Yanyu Qian, Pengcheng Weng et al.

Missing modalities remain a major challenge for multimodal sensing, because most existing methods adapt the fusion process to the observed subset by dropping absent branches, using subset-specific fusion, or reconstructing missing features. As a result, the fusion head often receives an input structure different from the one seen during training, leading to incomplete fusion and degraded cross-modal interaction. We propose COMPASS, a missing-modality fusion framework built on the principle of fusion completeness: the fusion head always receives a fixed N-slot multimodal input, with one token per modality slot. For each missing modality, COMPASS synthesizes a target-specific proxy token from the observed modalities using pairwise source-to-target generators in a shared latent space, and aggregates them into a single replacement token. To make these proxies both representation-compatible and task-informative, we combine proxy alignment, shared-space regularization, and per-proxy discriminative supervision. Experiments on XRF55, MM-Fi, and OctoNet under diverse single- and multiple-missing settings show that COMPASS outperforms prior methods on the large majority of scenarios. Our results suggest that preserving a modality-complete fusion interface is a simple and effective design principle for robust multimodal sensing.
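
The idea of a fixed N-slot fusion input can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the pairwise source-to-target generators are random linear maps, the per-target proxies are aggregated by a plain mean, and the names `generators` and `complete_slots` are hypothetical. The abstract specifies neither the generator architecture nor the aggregation rule.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MODALITIES, DIM = 3, 4

# Hypothetical pairwise generators: one linear map per ordered (source, target) pair.
generators = {
    (s, t): rng.normal(scale=0.1, size=(DIM, DIM))
    for s in range(N_MODALITIES)
    for t in range(N_MODALITIES)
    if s != t
}

def complete_slots(tokens):
    """Return an (N_MODALITIES, DIM) array with one token per modality slot.

    `tokens` holds one array per modality, or None where the modality is
    missing. Each missing slot is filled with the mean of the proxies
    generated from every observed modality.
    """
    observed = [i for i, tok in enumerate(tokens) if tok is not None]
    if not observed:
        raise ValueError("at least one modality must be observed")
    slots = []
    for t, tok in enumerate(tokens):
        if tok is not None:
            slots.append(tok)  # observed tokens pass through unchanged
        else:
            proxies = [tokens[s] @ generators[(s, t)] for s in observed]
            slots.append(np.mean(proxies, axis=0))
    return np.stack(slots)

# Modality 1 is missing; the fusion head still receives all three slots.
tokens = [np.ones(DIM), None, np.full(DIM, 2.0)]
fused_input = complete_slots(tokens)
print(fused_input.shape)
```

Whatever subset of modalities is observed, the downstream fusion head sees the same input shape, which is the property the abstract calls fusion completeness.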

RO · Nov 3, 2021

Image-Guided Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography Using a Shadow-aware Dual-Agent Framework

Keyu Li, Yangxin Xu, Jian Wang et al.

Ultrasound (US) imaging is commonly used to assist in the diagnosis and interventions of spine diseases, while the standardized US acquisitions performed by manually operating the probe require substantial experience and training of sonographers. In this work, we propose a novel dual-agent framework that integrates a reinforcement learning (RL) agent and a deep learning (DL) agent to jointly determine the movement of the US probe based on the real-time US images, in order to mimic the decision-making process of an expert sonographer to achieve autonomous standard view acquisitions in spinal sonography. Moreover, inspired by the nature of US propagation and the characteristics of the spinal anatomy, we introduce a view-specific acoustic shadow reward to utilize the shadow information to implicitly guide the navigation of the probe toward different standard views of the spine. Our method is validated in both quantitative and qualitative experiments in a simulation environment built with US data acquired from 17 volunteers. The average navigation accuracy toward different standard views achieves 5.18 mm/5.25° and 12.87 mm/17.49° in the intra- and inter-subject settings, respectively. The results demonstrate that our method can effectively interpret the US images and navigate the probe to acquire multiple standard views of the spine.

RO · Nov 3, 2021

Autonomous Magnetic Navigation Framework for Active Wireless Capsule Endoscopy Inspired by Conventional Colonoscopy Procedures

Yangxin Xu, Keyu Li, Ziqi Zhao et al.

In recent years, simultaneous magnetic actuation and localization (SMAL) for active wireless capsule endoscopy (WCE) has been intensively studied to improve the efficiency and accuracy of the examination. In this paper, we propose an autonomous magnetic navigation framework for active WCE that mimics the "insertion" and "withdrawal" procedures performed by an expert physician in conventional colonoscopy, thereby enabling efficient and accurate navigation of a robotic capsule endoscope in the intestine with minimal user effort. First, the capsule is automatically propelled through the unknown intestinal environment and generates a viable path to represent the environment. Then, the capsule is autonomously navigated towards any point selected on the intestinal trajectory to allow accurate and repeated inspections of suspicious lesions. Moreover, we implement the navigation framework on a robotic system incorporated with advanced SMAL algorithms, and validate it in the navigation in various tubular environments using phantoms and an ex-vivo pig colon. Our results demonstrate that the proposed autonomous navigation framework can effectively navigate the capsule in unknown, complex tubular environments with satisfactory accuracy, repeatability and efficiency compared with manual operation.

RO · Aug 26, 2021

Trajectory Following Strategies for Wireless Capsule Endoscopy under Reciprocally Rotating Magnetic Actuation in a Tubular Environment

Yangxin Xu, Keyu Li, Ziqi Zhao et al.

Currently used wireless capsule endoscopy (WCE) is limited in terms of inspection time and flexibility since the capsule is passively moved by peristalsis and cannot be accurately positioned. Different methods have been proposed to facilitate active locomotion of WCE based on simultaneous magnetic actuation and localization technologies. In this work, we investigate the trajectory following problem of a robotic capsule under rotating magnetic actuation in a tubular environment, in order to realize safe, efficient and accurate inspection of the intestine at given points using wireless capsule endoscopes. Specifically, four trajectory following strategies are developed based on the PD controller, adaptive controller, model predictive controller and robust multi-stage model predictive controller. Moreover, our method takes into account the uncertainty in the intestinal environment by modeling the intestinal peristalsis and friction during the controller design. We validate our methods in simulation as well as in real-world experiments in various tubular environments, including plastic phantoms with different shapes and an ex-vivo pig colon. The results show that our approach can effectively actuate a reciprocally rotating capsule to follow a desired trajectory in complex tubular environments, thereby having the potential to enable accurate and repeatable inspection of the intestine for high-quality diagnosis.

RO · Aug 25, 2021

Adaptive Simultaneous Magnetic Actuation and Localization for WCE in a Tubular Environment

Yangxin Xu, Keyu Li, Ziqi Zhao et al.

Simultaneous Magnetic Actuation and Localization (SMAL) is a promising technology for active wireless capsule endoscopy (WCE). In this paper, an adaptive SMAL system is presented to efficiently propel and precisely locate a capsule in a tubular environment with complex shapes. In order to track the capsule with high localization accuracy and update frequency in a large workspace, we propose a mechanism that can automatically activate a sub-array of sensors with the optimal layout during the capsule movement. The improved multiple objects tracking (IMOT) method is simplified and adapted to our system to estimate the 6-D pose of the capsule in real time. Also, we study the locomotion of a magnetically actuated capsule in a tubular environment, and formulate a method to adaptively adjust the pose of the actuator to improve the propulsion efficiency. Our presented methods are applicable to other permanent magnet-based SMAL systems, and help to improve the actuation efficiency of active WCE. We verify the effectiveness of our proposed system in extensive experiments on phantoms and ex-vivo animal organs. The results demonstrate that our system can achieve convincing performance compared with the state-of-the-art ones in terms of actuation efficiency, workspace size, robustness, localization accuracy and update frequency.

RO · Aug 25, 2021

On Reciprocally Rotating Magnetic Actuation of a Robotic Capsule in Unknown Tubular Environments

Yangxin Xu, Keyu Li, Ziqi Zhao et al.

Active wireless capsule endoscopy (WCE) based on simultaneous magnetic actuation and localization (SMAL) techniques holds great promise for improving diagnostic accuracy, reducing examination time and relieving operator burden. To date, the rotating magnetic actuation methods have been constrained to use a continuously rotating permanent magnet. In this paper, we first propose the reciprocally rotating magnetic actuation (RRMA) approach for active WCE to enhance patient safety. We first show how to generate a desired reciprocally rotating magnetic field for capsule actuation, and provide a theoretical analysis of the potential risk of causing volvulus due to the capsule motion. Then, an RRMA-based SMAL workflow is presented to automatically propel a capsule in an unknown tubular environment. We validate the effectiveness of our method in real-world experiments to automatically propel a robotic capsule in an ex-vivo pig colon. The experimental results show that our approach can achieve efficient and robust propulsion of the capsule with an average moving speed of $2.48$ mm/s in the pig colon, and demonstrate the potential of using RRMA to enhance patient safety, reduce the inspection time, and improve the clinical acceptance of this technology.

RO · Mar 1, 2021

Autonomous Navigation of an Ultrasound Probe Towards Standard Scan Planes with Deep Reinforcement Learning

Keyu Li, Jian Wang, Yangxin Xu et al.

Autonomous ultrasound (US) acquisition is an important yet challenging task, as it involves interpretation of the highly complex and variable images and their spatial relationships. In this work, we propose a deep reinforcement learning framework to autonomously control the 6-D pose of a virtual US probe based on real-time image feedback to navigate towards the standard scan planes under the restrictions in real-world US scans. Furthermore, we propose a confidence-based approach to encode the optimization of image quality in the learning process. We validate our method in a simulation environment built with real-world data collected in the US imaging of the spine. Experimental results demonstrate that our method can perform reproducible US probe navigation towards the standard scan plane with an accuracy of $4.91$ mm/$4.65^\circ$ in the intra-patient setting, and accomplish the task in the intra- and inter-patient settings with a success rate of $92\%$ and $46\%$, respectively. The results also show that the introduction of image quality optimization in our method can effectively improve the navigation performance.