Quan Quan

CV
h-index: 14
43 papers
709 citations
Novelty: 48%
AI Score: 54

43 Papers

IV Mar 4, 2022 Code
Universal Segmentation of 33 Anatomies

Pengbo Liu, Yang Deng, Ce Wang et al.

In this paper, we present an approach for learning a single model that universally segments 33 anatomical structures, including vertebrae, pelvic bones, and abdominal organs. Our model building has to address the following challenges. Firstly, while it is ideal to learn such a model from a large-scale, fully annotated dataset, it is practically hard to curate such a dataset. Thus, we resort to learning from a union of multiple datasets, each containing images that are partially labeled. Secondly, along the line of partial labelling, we contribute an open-source, large-scale vertebra segmentation dataset for the benefit of the spine analysis community, CTSpine1K, boasting over 1,000 3D volumes and over 11K annotated vertebrae. Thirdly, in a 3D medical image segmentation task, due to the limitation of GPU memory, we always train a model using cropped patches as inputs instead of a whole 3D volume, which limits the amount of contextual information to be learned. To address this, we propose a cross-patch transformer module to fuse more information from adjacent patches, which enlarges the aggregated receptive field for improved segmentation performance. This is especially important for segmenting, say, the elongated spine. Based on 7 partially labeled datasets that collectively contain about 2,800 3D volumes, we successfully learn such a universal model. Finally, we evaluate the universal model on multiple open-source datasets, showing that our model generalizes well and can potentially serve as a solid foundation for downstream tasks.

CV Jun 13, 2023 Code
UOD: Universal One-shot Detection of Anatomical Landmarks

Heqin Zhu, Quan Quan, Qingsong Yao et al.

One-shot medical landmark detection has gained much attention and achieved great success thanks to its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer heavily from domain preference when faced with multi-domain unlabeled data. Moreover, one-shot learning is not robust: its performance drops when a sub-optimal image is annotated. To tackle these issues, we develop a domain-adaptive one-shot landmark detection framework for handling multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models, which are designed as combinations of domain-specific modules and domain-shared modules. In the first stage, a domain-adaptive convolution model is trained with self-supervision to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build the global context for multi-domain data. Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect more robust and accurate landmarks. We evaluated the proposed UOD both qualitatively and quantitatively on three widely used public X-ray datasets in different anatomical domains (i.e., head, hand, chest) and obtained state-of-the-art performance in each domain. The code is available at https://github.com/heqin-zhu/UOD_universal_oneshot_detection.

CV Mar 3, 2022 Code
Relative distance matters for one-shot landmark detection

Qingsong Yao, Jianji Wang, Yihua Sun et al.

Contrastive learning based methods such as cascade comparing to detect (CC2D) have shown great potential for one-shot medical landmark detection. However, the important cue of relative distance between landmarks is ignored in CC2D. In this paper, we upgrade CC2D to version II by incorporating a simple-yet-effective relative distance bias in the training stage, which is theoretically proven to encourage the encoder to project relatively distant landmarks to embeddings with low similarities. As a consequence, CC2Dv2 is less likely to detect a wrong point far from the correct landmark. Furthermore, we present an open-source, landmark-labeled dataset for the measurement of biomechanical parameters of the lower extremity to alleviate the burden of orthopedic surgeons. The effectiveness of CC2Dv2 is evaluated on the public dataset from the ISBI 2015 Grand-Challenge of cephalometric radiographs and on our new dataset, where it greatly outperforms the state-of-the-art one-shot landmark detection approaches.

IV Mar 10, 2022 Code
Recovering medical images from CT film photos

Quan Quan, Qiyuan Wang, Yuanqi Du et al.

While medical images such as computed tomography (CT) are stored in DICOM format in hospital PACS, it is still quite routine in many countries to print a film as a transferable medium for the purposes of self-storage and secondary consultation. Also, with the ubiquity of mobile phone cameras, it is quite common to take pictures of CT films, which unfortunately suffer from geometric deformation and illumination variation. In this work, we study the problem of recovering a CT film, which marks the first attempt in the literature, to the best of our knowledge. We start with building a large-scale head CT film database CTFilm20K, consisting of approximately 20,000 pictures, using the widely used computer graphics software Blender. We also record all accompanying information related to the geometric deformation (such as 3D coordinate, depth, normal, and UV maps) and illumination variation (such as albedo map). Then we propose a deep framework called Film Image Recovery Network (FIReNet) to tackle geometric deformation and illumination variation using the multiple maps extracted from the CT films to collaboratively guide the recovery process. Finally, we convert the dewarped images to DICOM files with our cascade model for further analysis such as radiomics feature extraction. Extensive experiments demonstrate the superiority of our approach over the previous approaches. We plan to open source the simulated images and deep models for promoting the research on CT film image analysis.

CV Nov 16, 2023 Code
Slide-SAM: Medical SAM Meets Sliding Window

Quan Quan, Fenghe Tang, Zikang Xu et al.

The Segment Anything Model (SAM) has achieved notable success in two-dimensional segmentation of natural images. However, the substantial gap between medical and natural images hinders its direct application to medical image segmentation tasks. Particularly in 3D medical images, SAM struggles to learn contextual relationships between slices, limiting its practical applicability. Moreover, applying 2D SAM to 3D images requires prompting the entire volume, which is time- and label-consuming. To address these problems, we propose Slide-SAM, which treats a stack of three adjacent slices as a prediction window. It first takes three slices from a 3D volume, together with point or bounding-box prompts on the central slice, as inputs to predict segmentation masks for all three slices. The masks of the top and bottom slices are then used to generate new prompts for adjacent slices. Finally, step-wise prediction can be achieved by sliding the prediction window forward or backward through the entire volume. Our model is trained on multiple public and private medical datasets and demonstrates its effectiveness through extensive 3D segmentation experiments with minimal prompts. Code is available at https://github.com/Curli-quan/Slide-SAM.
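
The sliding-window procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the model (it only thresholds intensities inside a padded box), `mask_to_box` and `slide_segment` are names introduced here, and the choice to slide by one slice and re-center the window on the newly predicted edge slice is an assumption the abstract does not confirm.

```python
from typing import Callable, List, Optional, Tuple

Slice = List[List[float]]
Mask = List[List[bool]]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Bounding box of the True pixels in a mask, or None if the mask is empty."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def toy_predictor(window: List[Slice], box: Box) -> List[Mask]:
    """Hypothetical model stand-in: threshold all three slices inside the box padded by one pixel."""
    r0, c0, r1, c1 = box
    return [
        [[(r0 - 1 <= r <= r1 + 1) and (c0 - 1 <= c <= c1 + 1) and v > 0.5
          for c, v in enumerate(row)]
         for r, row in enumerate(sl)]
        for sl in window
    ]


def slide_segment(volume: List[Slice], start: int, box: Box,
                  predictor: Callable[[List[Slice], Box], List[Mask]] = toy_predictor
                  ) -> List[Mask]:
    """Segment a volume from one box prompt on slice `start` by sliding a 3-slice window."""
    depth = len(volume)
    if not 1 <= start <= depth - 2:
        raise ValueError("the prompted slice needs a neighbor on each side")
    out: List[Optional[Mask]] = [None] * depth
    # First window: the user prompt sits on the central slice.
    out[start - 1:start + 2] = predictor(volume[start - 1:start + 2], box)
    for step in (1, -1):  # slide forward, then backward
        center = start + step
        prompt = mask_to_box(out[center])
        while 1 <= center <= depth - 2 and prompt is not None:
            masks = predictor(volume[center - 1:center + 2], prompt)
            center += step
            out[center] = masks[1 + step]  # keep only the newly reached edge slice
            prompt = mask_to_box(out[center])
    empty = [[False] * len(volume[0][0]) for _ in volume[0]]
    return [m if m is not None else empty for m in out]


# Toy volume: a 3x3 block of ones on slices 1..4 of a 6x8x8 volume.
volume = [[[1.0 if 1 <= z <= 4 and 2 <= r <= 4 and 3 <= c <= 5 else 0.0
            for c in range(8)] for r in range(8)] for z in range(6)]
masks = slide_segment(volume, start=2, box=(2, 3, 4, 5))
pixels_per_slice = [sum(map(sum, m)) for m in masks]
```

Propagation stops in each direction once an edge mask comes back empty, so a single prompt covers the object without prompting every slice.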

SY Nov 29, 2012
Additive-State-Decomposition-Based Tracking Control for TORA Benchmark

Quan Quan, Kai-Yuan Cai

In this paper, a new control scheme, called additive state decomposition based tracking control, is proposed to solve the tracking (rejection) problem for the rotational position of the TORA (a nonlinear nonminimum phase system). By the additive state decomposition, the tracking (rejection) task for the considered nonlinear system is decomposed into two independent subtasks: a tracking (rejection) subtask for a linear time invariant (LTI) system and a stabilization subtask for a derived nonlinear system. By the decomposition, the proposed tracking control scheme avoids solving regulation equations and can tackle the tracking (rejection) problem in the presence of any external signal (except for the frequencies at +1 or -1) generated by a marginally stable autonomous LTI system. To demonstrate the effectiveness, a numerical simulation is given.
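
For orientation, the decomposition idea can be written in a generic form. The block below is a sketch under stated assumptions rather than the paper's exact formulation for the TORA: the symbols $f$, $f_p$, $x_p$, $x_s$, $u_p$, $u_s$ are introduced here, and the primary system $f_p$ is whatever simpler (for example LTI) model the designer chooses.

```latex
% Original system
\dot{x} = f(t, x, u), \qquad x(0) = x_0
% Primary system (chosen by the designer, e.g. an LTI model)
\dot{x}_p = f_p(t, x_p, u_p), \qquad x_p(0) = x_{p,0}
% Secondary system (what remains after subtracting the primary system)
\dot{x}_s = f(t, x_p + x_s, u_p + u_s) - f_p(t, x_p, u_p), \qquad x_s(0) = x_0 - x_{p,0}
% Recombination
x = x_p + x_s, \qquad u = u_p + u_s
```

Adding the primary and secondary dynamics gives $\dot{x}_p + \dot{x}_s = f(t, x_p + x_s, u_p + u_s)$, which is the original system, so the split is exact and each subsystem can be given its own subtask.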

CV Aug 11, 2024 Code
HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training

Fenghe Tang, Ronghao Xu, Qingsong Yao et al.

The generative self-supervised learning strategy exhibits remarkable representation learning capabilities. However, there is limited attention to end-to-end pre-training methods based on a hybrid architecture of CNN and Transformer, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK) based on masked image modeling and apply it to large-scale pre-training on medical images. First, we perform a bottom-up 3D hybrid masking strategy on the encoder to keep the masking consistent. Then we utilize sparse convolution for the top CNNs and encode unmasked patches for the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip-connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a collection of multiple large-scale 3D medical imaging datasets. Extensive experiments indicate that our proposed pre-training strategy demonstrates robust transferability in supervised downstream tasks and sheds light on HySparK's promising prospects. The code is available at https://github.com/FengheTan9/HySparK

SY Mar 1, 2018
Terminal Iterative Learning Control for Autonomous Aerial Refueling under Aerodynamic Disturbances

Xunhua Dai, Quan Quan, Jinrui Ren et al.

This paper studies the model of the probe-drogue aerial refueling system under aerodynamic disturbances, and proposes a docking control method based on terminal iterative learning control to compensate for the docking errors caused by aerodynamic disturbances. The designed controller works as an additional unit for the trajectory generation function of the original autopilot system. Simulations based on our previously published simulation environment show that the proposed control method has a fast learning speed to achieve a successful docking control under aerodynamic disturbances including the bow wave effect.

SY Jan 8, 2014
Additive-State-Decomposition Dynamic Inversion Stabilized Control for a Class of Uncertain MIMO Systems

Quan Quan, Guangxun Du, Kai-Yuan Cai

This paper presents a new control, namely additive-state-decomposition dynamic inversion stabilized control, that is used to stabilize a class of multi-input multi-output (MIMO) systems subject to nonparametric time-varying uncertainties with respect to both state and input. By additive state decomposition and a new definition of output, the considered uncertain system is transformed into a minimum-phase uncertainty-free system with relative degree one, in which all uncertainties are lumped into a new disturbance at the output. Subsequently, dynamic inversion control is applied to reject the lumped disturbance. Performance analysis of the resulting closed-loop dynamics shows that the stability can be ensured. Finally, to demonstrate its effectiveness, the proposed control is applied to two existing problems by numerical simulation. Furthermore, in order to show its practicability, the proposed control is also performed on a real quadrotor to stabilize its attitude when its inertia moment matrix is subject to a large uncertainty.

LG Mar 15, 2023
FairAdaBN: Mitigating unfairness with adaptive batch normalization and its application to dermatological disease classification

Zikang Xu, Shang Zhao, Quan Quan et al.

Deep learning is becoming increasingly ubiquitous in medical research and applications while involving sensitive information and even critical diagnosis decisions. Researchers observe a significant performance disparity among subgroups with different demographic attributes, which is called model unfairness, and put a lot of effort into carefully designing elegant architectures to address unfairness, which poses a heavy training burden, brings poor generalization, and reveals the trade-off between model performance and fairness. To tackle these issues, we propose FairAdaBN by making batch normalization adaptive to the sensitive attribute. This simple but effective design can be adopted by several classification backbones that are originally unaware of fairness. Additionally, we derive a novel loss function that restrains statistical parity between subgroups on mini-batches, encouraging the model to converge with considerable fairness. In order to evaluate the trade-off between model performance and fairness, we propose a new metric, named Fairness-Accuracy Trade-off Efficiency (FATE), to compute normalized fairness improvement over accuracy drop. Experiments on two dermatological datasets show that our proposed method outperforms other methods on fairness criteria and FATE.
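
As a point of reference for the statistical parity notion mentioned in this abstract, the sketch below computes the standard statistical parity difference between two subgroups on one batch of hard predictions. It is not the paper's loss function, whose exact form is not given here; the function name and the binary group encoding (0 and 1) are assumptions made for illustration.

```python
from typing import Sequence


def statistical_parity_difference(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rate between subgroup 0 and subgroup 1."""
    rates = []
    for g in (0, 1):
        members = [p for p, a in zip(preds, groups) if a == g]
        if not members:
            raise ValueError(f"no samples from subgroup {g} in this batch")
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


# Toy mini-batch: subgroup 0 gets 3 of 4 positive predictions, subgroup 1 gets 1 of 4.
batch_preds = [1, 0, 1, 1, 0, 0, 1, 0]
batch_groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = statistical_parity_difference(batch_preds, batch_groups)
```

A value of 0 means both subgroups receive positive predictions at the same rate; larger values indicate a larger disparity.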

SY Jan 25, 2012
Output Feedback Tracking Control for a Class of Uncertain Systems subject to Unmodeled Dynamics and Delay at Input

Quan Quan, Hai Lin, Kai-Yuan Cai

Besides parametric uncertainties and disturbances, unmodeled dynamics and time delay at the input are often present in practical systems and cannot be ignored in some cases. This paper aims to solve the output feedback tracking control problem for a class of nonlinear uncertain systems subject to unmodeled high-frequency gains and time delay at the input. By the additive decomposition, the uncertain system is transformed to an uncertainty-free system, where the uncertainties, disturbance and effect of unmodeled dynamics plus time delay are lumped into a new disturbance at the output. Subsequently, additive decomposition is used to decompose the transformed system, which simplifies the tracking controller design. To demonstrate the effectiveness, the proposed control scheme is applied to three benchmark examples.

CV Mar 16, 2023
GDDS: Pulmonary Bronchioles Segmentation with Group Deep Dense Supervision

Mingyue Zhao, Shang Zhao, Quan Quan et al.

Airway segmentation, especially bronchiole segmentation, is an important but challenging task because distal bronchi are sparsely distributed and of a fine scale. Existing neural networks usually exploit sparse topology to learn the connectivity of bronchioles and inefficient shallow features to capture such high-frequency information, leading to the breakage or missed detection of individual thin branches. To address these problems, we contribute a new bronchial segmentation method based on Group Deep Dense Supervision (GDDS) that emphasizes fine-scale bronchiole segmentation in a simple-but-effective manner. First, Deep Dense Supervision (DDS) is proposed by constructing local dense topology and implementing dense topological learning on a specific shallow feature layer. GDDS further empowers the shallow features with better perception ability to detect bronchioles, even the ones that are not easily discernible to the naked eye. Extensive experiments on the BAS benchmark dataset have shown that our method gives the network a high sensitivity in capturing fine-scale branches and outperforms state-of-the-art methods by a large margin (+12.8% in BD and +8.8% in TD) while only introducing a small number of extra parameters.

71.8 RO May 26
L-Learning: A Lyapunov-Based Approach Leveraging Lagrangian Mechanics for Efficient and Stable Robot Tracking

Quan Quan, Hao Li

This paper presents L-Learning, a novel data-driven control framework for robotics that integrates Lyapunov stability theory with Lagrangian mechanics to enhance trajectory tracking performance. While traditional control methods often suffer from performance degradation in dynamic and uncertain environments, data-driven approaches, while more adaptable, are frequently limited by high sample complexity and a lack of rigorous stability guarantees. L-Learning mitigates these challenges by explicitly learning the system's energy function from data, thereby optimizing performance while ensuring closed-loop stability intrinsically. Characterized by superior control accuracy, theoretical stability guarantees, and high sample efficiency, L-Learning represents a promising solution for practical robotic applications.

CV Jun 8, 2023
Unsupervised augmentation optimization for few-shot medical image segmentation

Quan Quan, Shang Zhao, Qingsong Yao et al.

The augmentation parameters matter to few-shot semantic segmentation since they directly affect the training outcome by feeding the networks with varying perturbed samples. However, searching for optimal augmentation parameters for few-shot segmentation models without annotations is a challenge that current methods fail to address. In this paper, we first propose a framework to determine the "optimal" parameters without human annotations by solving a distribution-matching problem between the intra-instance and intra-class similarity distributions, with the intra-instance similarity describing the similarity between the original sample of a particular anatomy and its augmented ones and the intra-class similarity representing the similarity between the selected sample and the others in the same class. Extensive experiments demonstrate the superiority of our optimized augmentation in boosting few-shot segmentation models. We greatly improve the top competing method by 1.27% and 1.11% on Abd-MRI and Abd-CT datasets, respectively, and even achieve a significant improvement for SSL-ALP on the left kidney by 3.39% on the Abd-CT dataset.

IV Mar 4, 2022
MixCL: Pixel label matters to contrastive learning

Jun Li, Quan Quan, S. Kevin Zhou

Contrastive learning and self-supervised techniques have gained prevalence in computer vision over the past few years. They are essential for medical image analysis, which is often notorious for its lack of annotations. Most existing self-supervised methods applied in natural imaging tasks focus on designing proxy tasks for unlabeled data. For example, contrastive learning is often based on the fact that an image and its transformed version share the same identity. However, pixel annotations contain much valuable information for medical image segmentation, which is largely ignored in contrastive learning. In this work, we propose a novel pre-training framework called Mixed Contrastive Learning (MixCL) that leverages both image identities and pixel labels for better modeling by maintaining identity consistency, label consistency, and reconstruction consistency together. Consequently, the pre-trained model has more robust representations that characterize medical images. Extensive experiments demonstrate the effectiveness of the proposed method, improving the baseline by 5.28% and 14.12% in Dice coefficient when 5% labeled data of Spleen and 15% of BTCV are used in fine-tuning, respectively.
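
The Dice coefficient used to report these gains is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference mask. The sketch below shows that definition on flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are choices made here, not taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Two masks with three foreground pixels each, two of which overlap.
score = dice_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Here the intersection is 2 and the mask sizes sum to 6, so the score is 2/3.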

CV Nov 14, 2022
Information-guided pixel augmentation for pixel-wise contrastive learning

Quan Quan, Qingsong Yao, Jun Li et al.

Contrastive learning (CL) is a form of self-supervised learning and has been widely used for various tasks. Different from widely studied instance-level contrastive learning, pixel-wise contrastive learning mainly helps with pixel-wise tasks such as medical landmark detection. The counterpart to an instance in instance-level CL is a pixel, along with its neighboring context, in pixel-wise CL. Aiming to build better feature representations, there is a vast literature on designing instance augmentation strategies for instance-level CL, but there is little similar work on pixel augmentation for pixel-wise CL with a pixel granularity. In this paper, we attempt to bridge this gap. We first classify each pixel into one of three categories, namely low-, medium-, and high-informative, based on the information quantity the pixel contains. Inspired by the "InfoMin" principle, we then design separate augmentation strategies for each category in terms of augmentation intensity and sampling ratio. Extensive experiments validate that our information-guided pixel augmentation strategy succeeds in encoding more discriminative representations and surpassing other competitive approaches in unsupervised local feature matching. Furthermore, our pretrained model improves the performance of both one-shot and fully supervised models. To the best of our knowledge, we are the first to propose a pixel augmentation method with a pixel granularity for enhancing unsupervised pixel-wise contrastive learning.

LG Dec 31, 2025 Code
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control

Yongwei Zhang, Yuanzhe Xing, Quanyi Liang et al.

For safety-critical applications, model-free reinforcement learning (RL) faces numerous challenges, particularly the difficulty of establishing verifiable stability guarantees while maintaining high exploration efficiency. To address these challenges, we present Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that seamlessly integrates exponential stability with maximum entropy reinforcement learning (MERL). In contrast to existing methods that rely on complex reward engineering and single-step constraints, MSACL utilizes intuitive rewards and multi-step data for actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize samples and propose a λ-weighted aggregation mechanism to learn Lyapunov certificates. Leveraging these certificates, we then develop a stability-aware advantage function to guide policy optimization, thereby ensuring rapid Lyapunov descent and robust state convergence. We evaluate MSACL across six benchmarks, comprising four stabilization and two high-dimensional tracking tasks. Experimental results demonstrate its consistent superiority over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits significant robustness against environmental uncertainties and remarkable generalization to unseen reference signals. The source code and benchmarking environments are available at https://github.com/YuanZhe-Xing/MSACL.

IV Dec 4, 2023 Code
MobileUtr: Revisiting the relationship between light-weight CNN and Transformer for efficient medical image segmentation

Fenghe Tang, Bingkun Nian, Jianrui Ding et al.

Due to the scarcity and specific imaging characteristics of medical images, light-weighting Vision Transformers (ViTs) for efficient medical image segmentation is a significant challenge, and current studies have not yet paid attention to this issue. This work revisits the relationship between CNNs and Transformers in lightweight universal networks for medical image segmentation, aiming to integrate the advantages of both worlds at the infrastructure design level. In order to leverage the inductive bias inherent in CNNs, we abstract a Transformer-like lightweight CNN block (ConvUtr) as the patch embeddings of ViTs, feeding the Transformer with denoised, non-redundant and highly condensed semantic information. Moreover, an adaptive Local-Global-Local (LGL) block is introduced to facilitate efficient local-to-global information flow exchange, maximizing the Transformer's global context information extraction capabilities. Finally, we build an efficient medical image segmentation model (MobileUtr) based on CNN and Transformer. Extensive experiments on five public medical image datasets with three different modalities demonstrate the superiority of MobileUtr over the state-of-the-art methods, while boasting lighter weights and lower computational cost. Code is available at https://github.com/FengheTan9/MobileUtr.

16.2 RO Apr 24
An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube

Pengda Mao, Shuli Lv, Chen Min et al.

Robot swarms navigating through unknown obstacle environments are an emerging research area that faces challenges. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high replanning frequencies but lack predictive capabilities, often resulting in local minima. Multi-step planners can make multi-step predictions to reduce deadlocks, but they require substantial computation, resulting in a lower replanning frequency. This paper proposes a novel homotopic trajectory planning framework for a robot swarm that combines centralized homotopic trajectory planning (optimal virtual tube planning) with distributed control, enabling low-computation, high-frequency replanning, thereby uniting the strengths of multi-step and reactive planners. Based on multi-parametric programming, homotopic optimal trajectories are approximated by affine functions. The resulting approximate solutions have computational complexity $O(n_t)$, where $n_t$ is the number of trajectory parameters. This low complexity makes centralized planning of a large number of optimal trajectories practical and, when combined with distributed control, enables rapid, low-cost replanning. The effectiveness of the proposed method is validated through several simulations and experiments.

IV Aug 1, 2025 Code
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation

Fenghe Tang, Bingkun Nian, Jianrui Ding et al.

In clinical practice, medical image analysis often requires efficient execution on resource-constrained mobile devices. However, existing mobile models, primarily optimized for natural images, tend to perform poorly on medical tasks due to the significant information density gap between natural and medical domains. Combining computational efficiency with medical imaging-specific architectural advantages remains a challenge when developing lightweight, universal, and high-performing networks. To address this, we propose a mobile model called Mobile U-shaped Vision Transformer (Mobile U-ViT) tailored for medical image segmentation. Specifically, we employ the newly proposed ConvUtr as a hierarchical patch embedding, featuring a parameter-efficient large-kernel CNN with inverted bottleneck fusion. This design exhibits transformer-like representation learning capacity while being lighter and faster. To enable efficient local-global information exchange, we introduce a novel Large-kernel Local-Global-Local (LGL) block that effectively balances the low information density and high-level semantic discrepancy of medical images. Finally, we incorporate a shallow and lightweight transformer bottleneck for long-range modeling and employ a cascaded decoder with downsample skip connections for dense prediction. Despite its reduced computational demands, our medical-optimized architecture achieves state-of-the-art performance across eight public 2D and 3D datasets covering diverse imaging modalities, including zero-shot testing on four unseen datasets. These results establish it as an efficient yet powerful and generalizable solution for mobile medical image analysis. Code is available at https://github.com/FengheTan9/Mobile-U-ViT.

CV Dec 17, 2020 Code
CT Film Recovery via Disentangling Geometric Deformation and Illumination Variation: Simulated Datasets and Deep Models

Quan Quan, Qiyuan Wang, Liu Li et al.

While medical images such as computed tomography (CT) are stored in DICOM format in hospital PACS, it is still quite routine in many countries to print a film as a transferable medium for the purposes of self-storage and secondary consultation. Also, with the ubiquity of mobile phone cameras, it is quite common to take pictures of the CT films, which unfortunately suffer from geometric deformation and illumination variation. In this work, we study the problem of recovering a CT film, which marks the first attempt in the literature, to the best of our knowledge. We start with building a large-scale head CT film database CTFilm20K, consisting of approximately 20,000 pictures, using the widely used computer graphics software Blender. We also record all accompanying information related to the geometric deformation (such as 3D coordinate, depth, normal, and UV maps) and illumination variation (such as albedo map). Then we propose a deep framework to disentangle geometric deformation and illumination variation using the multiple maps extracted from the CT films to collaboratively guide the recovery process. Extensive experiments on simulated and real images demonstrate the superiority of our approach over the previous approaches. We plan to open source the simulated images and deep models for promoting the research on CT film recovery (https://anonymous.4open.science/r/e6b1f6e3-9b36-423f-a225-55b7d0b55523/).

CV Mar 8, 2024
APPLE: Adversarial Privacy-aware Perturbations on Latent Embedding for Unfairness Mitigation

Zikang Xu, Fenghe Tang, Quan Quan et al.

Ensuring fairness in deep-learning-based segmentors is crucial for health equity. Much effort has been dedicated to mitigating unfairness in the training datasets or procedures. However, with the increasing prevalence of foundation models in medical image analysis, it is hard to train fair models from scratch while preserving utility. In this paper, we propose a novel method, Adversarial Privacy-aware Perturbations on Latent Embedding (APPLE), that can improve the fairness of deployed segmentors by introducing a small latent feature perturber without updating the weights of the original model. By adding perturbation to the latent vector, APPLE decorates the latent vector of segmentors such that no fairness-related features can be passed to the decoder of the segmentors while preserving the architecture and parameters of the segmentor. Experiments on two segmentation datasets and five segmentors (three U-Net-like and two SAM-like) illustrate the effectiveness of our proposed method compared to several unfairness mitigation methods.

CV Feb 26, 2025
Correspondence-Free Pose Estimation with Patterns: A Unified Approach for Multi-Dimensional Vision

Quan Quan, Dun Dai

6D pose estimation is a central problem in robot vision. Compared with pose estimation based on point correspondences or its robust versions, correspondence-free methods are often more flexible. However, existing correspondence-free methods often rely on feature representation alignment or end-to-end regression. To avoid this reliance, a new correspondence-free pose estimation method and its practical algorithms are proposed, whose key idea is the elimination of unknowns by a process of addition to separate the pose estimation from correspondence. By taking the considered point sets as patterns, feature functions used to describe these patterns are introduced to establish a sufficient number of equations for optimization. The proposed method is applicable to nonlinear transformations such as perspective projection and can cover various pose estimations from 3D-to-3D points, 3D-to-2D points, and 2D-to-2D points. Experimental results on both simulated and real data are presented to demonstrate the effectiveness of the proposed method.

CV Dec 5, 2023
Inspecting Model Fairness in Ultrasound Segmentation Tasks

Zikang Xu, Fenghe Tang, Quan Quan et al.

With the rapid expansion of machine learning and deep learning (DL), researchers are increasingly employing learning-based algorithms to alleviate diagnostic challenges across diverse medical tasks and applications. While advancements in diagnostic precision are notable, some researchers have identified a concerning trend: their models exhibit biased performance across subgroups characterized by different sensitive attributes. This bias not only infringes upon the rights of patients but also has the potential to lead to life-altering consequences. In this paper, we inspect a series of DL segmentation models using two ultrasound datasets, aiming to assess the presence of model unfairness in these specific tasks. Our findings reveal that even state-of-the-art DL algorithms demonstrate unfair behavior in ultrasound segmentation tasks. These results serve as a crucial warning, underscoring the necessity for careful model evaluation before their deployment in real-world scenarios. Such assessments are imperative to ensure ethical considerations and mitigate the risk of adverse impacts on patient outcomes.

IVDec 7, 2021
Which images to label for few-shot medical landmark detection?

Quan Quan, Qingsong Yao, Jun Li et al.

The success of deep learning methods relies on the availability of well-labeled large-scale datasets. However, for medical images, annotating such abundant training data often requires experienced radiologists and consumes their limited time. Few-shot learning is developed to alleviate this burden, which achieves competitive performances with only several labeled data. However, a crucial yet previously overlooked problem in few-shot learning is about the selection of template images for annotation before learning, which affects the final performance. We herein propose a novel Sample Choosing Policy (SCP) to select "the most worthy" images for annotation, in the context of few-shot medical landmark detection. SCP consists of three parts: 1) Self-supervised training for building a pre-trained deep model to extract features from radiological images, 2) Key Point Proposal for localizing informative patches, and 3) Representative Score Estimation for searching the most representative samples or templates. The advantage of SCP is demonstrated by various experiments on three widely-used public datasets. For one-shot medical landmark detection, its use reduces the mean radial errors on Cephalometric and HandXray datasets by 14.2% (from 3.595mm to 3.083mm) and 35.5% (4.114mm to 2.653mm), respectively.
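The percentage reductions quoted in this abstract follow directly from the reported mean radial errors. A minimal sketch of that arithmetic is given below; the function name `relative_reduction` is an illustrative assumption and is not taken from the paper.

```python
def relative_reduction(before_mm: float, after_mm: float) -> float:
    """Return the percentage reduction when going from before_mm to after_mm."""
    return (before_mm - after_mm) / before_mm * 100.0


# Mean radial errors (in mm) quoted in the abstract above.
cephalometric = relative_reduction(3.595, 3.083)
hand_xray = relative_reduction(4.114, 2.653)

print(f"Cephalometric: {cephalometric:.1f}%")  # prints "Cephalometric: 14.2%"
print(f"HandXray: {hand_xray:.1f}%")           # prints "HandXray: 35.5%"
```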

RODec 2, 2021
Distributed Control for a Robotic Swarm to Pass through a Curve Virtual Tube

Quan Quan, Yan Gao, Chenggang Bai

Robotic swarm systems are now becoming increasingly attractive for many challenging applications. The main task for any robot is to reach the destination while keeping a safe separation from other robots and obstacles. In many scenarios, robots need to move within a narrow corridor, through a window or a doorframe. In order to guide all robots to move in a cluttered environment, a curve virtual tube with no obstacle inside is carefully designed in this paper, so that the area inside the tube can be regarded as a safety zone. Then, a distributed swarm controller is proposed with three elaborate control terms: a line approaching term, a robot avoidance term and a tube keeping term. Formal analysis and proofs are made to show that the curve virtual tube passing problem can be solved in a finite time. For convenience in practical use, a modified controller with an approximate control performance is put forward. Finally, the effectiveness of the proposed method is validated by numerical simulations and real experiments. To show the advantages of the proposed method, a comparison between our method and the control barrier function method is also presented in terms of calculation speed.

RONov 22, 2021
Practical Distributed Control for Cooperative Multicopters in Structured Free Flight Concepts

Rao Fu, Quan Quan, Mengxin Li et al.

Unmanned Aerial Vehicles (UAVs) are now becoming increasingly accessible to amateur and commercial users alike. Several types of airspace structures have been proposed in recent research, including several structured free flight concepts. For simplicity, this paper focuses on the distributed coordination of the motions of multicopters in structured airspace concepts. This is formulated as a free flight problem, which includes convergence to destination lines and inter-agent collision avoidance. The destination line of each multicopter is known a priori. Further, Lyapunov-like functions are carefully designed, and formal analysis and proofs of the proposed distributed control are given to show that the free flight control problem can be solved. Moreover, with the proposed controller, a multicopter can move away from another as soon as possible once it enters the other's safety area. Simulations and experiments are given to show the effectiveness of the proposed method.

ROOct 18, 2021
How Far Two UAVs Should Be subject to Communication Uncertainties

Quan Quan, Rao Fu, Kai-Yuan Cai

Unmanned aerial vehicles are now becoming increasingly accessible to amateur and commercial users alike. A safe air traffic management system is needed to help ensure that each new entrant into the sky does not collide with others. Much research has been done to design various methods to perform collision avoidance with obstacles. However, how to decide the safety radius subject to communication uncertainties remains an open question. Based on assumptions on communication uncertainties and supposed control performance, a separation principle of the safety radius design and controller design is proposed. With it, the safety radii corresponding to the safety area in the design phase (without uncertainties) and the flight phase (subject to uncertainties) are studied. Furthermore, the results are extended to multiple obstacles. Simulations and experiments are carried out to show the effectiveness of the proposed methods.

CVJun 24, 2021
Where is the disease? Semi-supervised pseudo-normality synthesis from an abnormal image

Yuanqi Du, Quan Quan, Hu Han et al.

Pseudo-normality synthesis, which computationally generates a pseudo-normal image from an abnormal one (e.g., with lesions), is important from many perspectives, from lesion detection and data augmentation to clinical surgery suggestion. However, it is challenging to generate high-quality pseudo-normal images in the absence of lesion information. Thus, expensive lesion segmentation data have been introduced to provide lesion information for the generative models and improve the quality of the synthetic images. In this paper, we aim to alleviate the need for a large amount of lesion segmentation data when generating pseudo-normal images. We propose a Semi-supervised Medical Image generative LEarning network (SMILE) which not only utilizes limited medical images with segmentation masks, but also leverages massive medical images without segmentation masks to generate realistic pseudo-normal images. Extensive experiments show that our model outperforms the best state-of-the-art model by up to 6% for the data augmentation task and 3% in generating high-quality images. Moreover, the proposed semi-supervised learning achieves medical image synthesis quality comparable to that of a supervised learning model, using only 50% of the segmentation data.

IVMay 31, 2021
CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

Yang Deng, Ce Wang, Yuan Hui et al.

Spine-related diseases have high morbidity and cause a huge burden of social cost. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Current publicly available annotated datasets on spinal vertebrae are small in size. Due to the lack of a large-scale annotated spine image dataset, the mainstream deep learning-based segmentation methods, which are data-driven, are heavily restricted. In this paper, we introduce a large-scale spine CT dataset, called CTSpine1K, curated from multiple sources for vertebra segmentation, which contains 1,005 CT volumes with over 11,100 labeled vertebrae belonging to different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebrae segmentation, labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.

CVMar 8, 2021
One-Shot Medical Landmark Detection

Qingsong Yao, Quan Quan, Li Xiao et al.

The success of deep learning methods relies on the availability of a large number of datasets with annotations; however, curating such datasets is burdensome, especially for medical images. To relieve such a burden for a landmark detection task, we explore the feasibility of using only a single annotated image and propose a novel framework named Cascade Comparing to Detect (CC2D) for one-shot landmark detection. CC2D consists of two stages: 1) Self-supervised learning (CC2D-SSL) and 2) Training with pseudo-labels (CC2D-TPL). CC2D-SSL captures the consistent anatomical information in a coarse-to-fine fashion by comparing the cascade feature representations and generates predictions on the training set. CC2D-TPL further improves the performance by training a new landmark detector with those predictions. The effectiveness of CC2D is evaluated on a widely-used public dataset of cephalometric landmark detection, which achieves a competitive detection accuracy of 81.01% within 4.0mm, comparable to the state-of-the-art fully-supervised methods using a lot more than one training image.

ROJan 19, 2021
Practical Distributed Control for VTOL UAVs to Pass a Virtual Tube

Quan Quan, Rao Fu, Mengxin Li et al.

Unmanned Aerial Vehicles (UAVs) are now becoming increasingly accessible to amateur and commercial users alike. An air traffic management (ATM) system is needed to help ensure that this newest entrant into the skies does not collide with others. In an ATM, airspace can be composed of airways, intersections and nodes. For simplicity, this paper focuses on the distributed coordination of the motions of Vertical TakeOff and Landing (VTOL) UAVs passing through an airway. This is formulated as a virtual tube passing problem, which includes passing a virtual tube, inter-agent collision avoidance and keeping within the virtual tube. Lyapunov-like functions are carefully designed, and formal analysis based on the invariant set theorem is made to show that all UAVs can pass the virtual tube without getting trapped, avoid collision and keep within the virtual tube. Moreover, with the proposed distributed control, a VTOL UAV can move away from another VTOL UAV or return to the virtual tube as soon as possible once it enters the safety area of another UAV or collides with the virtual tube while passing through it. Simulations and experiments are carried out to show the effectiveness of the proposed method and the comparison with other methods.

ROJan 8, 2021
Practical Control for Multicopters to Avoid Non-Cooperative Moving Obstacles

Quan Quan, Rao Fu, Kai-Yuan Cai

Unmanned Aerial Vehicles (UAVs) are now becoming increasingly accessible to amateur and commercial users alike. The main task for UAVs is to keep a prescribed separation from obstacles in the air. In this paper, a collision-avoidance control method for non-cooperative moving obstacles is proposed for a multicopter with the altitude hold mode by using a Lyapunov-like barrier function. Lyapunov-like functions are carefully designed, based on which formal analysis and proofs of the proposed control are made to show that the collision-avoidance control problem can be solved if the moving obstacle is slower than the multicopter. The result can be extended to some cases of multiple obstacles. Moreover, with the proposed control, a multicopter can move away from obstacles as soon as possible once they accidentally enter its safety area, and then converge to the waypoint. Simulations and experiments show the effectiveness of the proposed method by reporting the distances between the UAV and the waypoint and between the UAV and the obstacles.

ROOct 19, 2020
Sky Highway Design for Dense Traffic

Quan Quan, Mengxin Li

The number of Unmanned Aerial Vehicles (UAVs) continues to explode. Within the total spectrum of Unmanned Aircraft System (UAS) operations, Urban Air Mobility (UAM) is also on the way. Dense air traffic is getting ever closer to us. Current research focuses either on traffic network design and route design for safety purposes or on swarm control in open airspace to accommodate large volumes of UAVs. In order to achieve a tradeoff between safety and volumes of UAVs, a sky highway with its basic operation for Vertical Take-Off and Landing (VTOL) UAVs is proposed, where traffic network, route and swarm control design are all considered. In the sky highway, each UAV will have its route, and an airway like a highway road can allow many UAVs to perform free flight. The geometrical structure of the proposed sky highway and the corresponding flight modes to support dense traffic are studied one by one. The effectiveness of the proposed sky highway is shown by the given demonstration.

ROMar 9, 2020
Fast Collision Probability Estimation Based on Finite-Dimensional Monte Carlo Method

Zhang Hepeng, Quan Quan

The safety concern for unmanned systems, namely the concern for the potential casualty caused by system abnormalities, has been a bottleneck for their development, especially in populated areas. Evidently, the collision between the unmanned system and the obstacles, including both moving and static objects, accounts for a great proportion of the system abnormalities. The route planning and corresponding controller are established in order to avoid the collision. In the presence of uncertainties, however, the unmanned system may deviate from the predetermined route and collide with the obstacles. Therefore, for the safety of unmanned systems, collision probability estimation and further safety decisions are very important. To estimate the collision probability, the Monte Carlo method could be applied, but it is generally rather slow. This paper introduces a fast collision probability estimation method based on finite-dimensional distribution, whose main idea is to filter out the sampling points needed and generate the states directly by samples of finite-dimensional distribution, reducing the estimation time significantly. In addition, further techniques, including probabilistic equidistance sampling and dimension reduction, also serve to reduce the estimation time. The simulation shows that the proposed method reduces the estimation time by over 99%.
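For context, the slow baseline mentioned in this abstract, plain Monte Carlo estimation, can be sketched as follows. This is not the paper's finite-dimensional method. The motion model (unit steps along the x-axis with Gaussian disturbances), the circular obstacle, and all names and parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def estimate_collision_probability(obstacle, radius, steps=10, sigma=0.1,
                                   trials=2000, seed=0):
    """Naive Monte Carlo estimate of the probability of entering a circular obstacle.

    Hypothetical model: the vehicle starts at the origin and nominally moves
    one unit along +x per step; each step is perturbed by zero-mean Gaussian
    noise with standard deviation sigma on both axes.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    collisions = 0
    for _ in range(trials):
        x, y = 0.0, 0.0
        for _ in range(steps):
            x += 1.0 + rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            # A collision is counted once the state enters the obstacle disc.
            if math.hypot(x - obstacle[0], y - obstacle[1]) <= radius:
                collisions += 1
                break
    return collisions / trials


p_on_route = estimate_collision_probability(obstacle=(5.0, 0.0), radius=1.0)
p_far_away = estimate_collision_probability(obstacle=(5.0, 50.0), radius=1.0)
print(p_on_route, p_far_away)
```

Every trial simulates the full trajectory, so the cost grows with both the number of trials and the number of time steps, which is the kind of expense the abstract describes as rather slow.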

SYAug 7, 2019
Unified Simulation and Test Platform for Control Systems of Unmanned Vehicles

Xunhua Dai, Chenxu Ke, Quan Quan et al.

Control systems on unmanned vehicles are safety-critical systems whose requirements on reliability and safety are ever-increasing. Currently, testing a complex autonomous control system is an expensive and time-consuming process, which requires massive repeated experimental testing during the whole development stage. This paper presents a unified simulation and test platform for vehicle autonomous control systems aiming to significantly improve the development speed and safety level of unmanned vehicles. First, a unified modular modeling framework compatible with different types of vehicles is proposed with methods to ensure modeling credibility. Then, the simulation software system is developed by the model-based design framework, whose modular programming methods and automatic code generation functions ensure the efficiency, credibility, and standardization of the system development process. Finally, an FPGA-based real-time hardware-in-the-loop simulation platform is proposed to ensure the comprehensiveness and credibility of the simulation and test results. In the end, the proposed platform is applied to a multicopter control system. By comparing with experimental results, the accuracy and credibility of the simulation testing results are verified by using the simulation credibility assessment method proposed in our previous work. To verify the practicability of the proposed platform, several successful applications are presented for the multicopter rapid prototyping, estimation algorithm verification, autonomous flight testing, and automatic safety testing with automatic fault injection and result evaluation of unmanned vehicles.

CVJul 22, 2019
An Efficient Target Detection and Recognition Method in Aerial Remote-sensing Images Based on Multiangle Regions-of-Interest

Guangcun Shan, Hongyu Wang, Wei Liang et al.

Recently, deep learning technology has been extensively used in the field of image recognition. However, its main application is the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems on unmanned aerial vehicles (UAVs), which includes identifying the target and calculating its position. Aerial remote-sensing images have different shooting angles and methods compared with ordinary pictures or images, which makes remote-sensing images play an irreplaceable role in some areas. In this study, a new target detection and recognition method for remote-sensing images is proposed based on a deep convolutional neural network (CNN), which provides multilevel information of images, in combination with a region proposal network used to generate multiangle regions-of-interest. The proposed method generated results that were much more accurate and precise than those obtained with traditional methods. This demonstrates that the proposed model has considerable potential for application in remote-sensing image recognition.

SYSep 1, 2018
An Analytical Design Optimization Method for Electric Propulsion Systems of Multicopter UAVs with Desired Hovering Endurance

Xunhua Dai, Quan Quan, Jinrui Ren et al.

Multicopters are becoming increasingly important in both civil and military fields. Currently, most multicopter propulsion systems are designed by experience and trial-and-error experiments, which are costly and ineffective. This paper proposes a simple and practical method to help designers find the optimal propulsion system according to the given design requirements. First, the modeling methods for four basic components of the propulsion system including propellers, motors, electric speed controls, and batteries are studied respectively. Secondly, the whole optimization design problem is simplified and decoupled into several sub-problems. By solving these sub-problems, the optimal parameters of each component can be obtained respectively. Finally, based on the obtained optimal component parameters, the optimal product of each component can be quickly located and determined from the corresponding database. Experiments and statistical analyses demonstrate the effectiveness of the proposed method.

ROMay 24, 2017
A Control Performance Index for Multicopters Under Off-nominal Conditions

Guang-Xun Du, Quan Quan, Zhiyu Xi et al.

In order to prevent loss of control (LOC) accidents, the real-time control performance monitoring problem is studied for multicopters. Different from the existing work, this paper does not try to monitor the performance of the controllers directly. Instead, the disturbances of multicopters under off-nominal conditions are estimated and fed into a proposed index that tells the user whether the multicopter will experience LOC or not. Firstly, a new degree of controllability (DoC) is proposed for multicopters subject to control constraints and off-nominal conditions. Then a control performance index (CPI) is defined based on the new DoC to reflect the control performance for multicopters. In addition, the proposed CPI is applied to a new switching control framework to guide the control decision of a multicopter under off-nominal conditions. Finally, simulation and experimental results show the effectiveness of the CPI and the proposed switching control framework.

CVJul 4, 2014
Calibration of Multiple Fish-Eye Cameras Using a Wand

Qiang Fu, Quan Quan, Kai-Yuan Cai

Fish-eye cameras are becoming increasingly popular in computer vision, but their use for 3D measurement is limited partly due to the lack of an accurate, efficient and user-friendly calibration procedure. For such a purpose, we propose a method to calibrate the intrinsic and extrinsic parameters (including radial distortion parameters) of two/multiple fish-eye cameras simultaneously by using a wand under general motions. Thanks to the generic camera model used, the proposed calibration method is also suitable for two/multiple conventional cameras and mixed cameras (e.g. two conventional cameras and a fish-eye camera). Simulation and real experiments demonstrate the effectiveness of the proposed method. Moreover, we develop the camera calibration toolbox, which is available online.

SYMar 24, 2014
Controllability Analysis for Multirotor Helicopter Rotor Degradation and Failure

Guang-Xun Du, Quan Quan, Binxian Yang et al.

This paper considers the controllability analysis problem for a class of multirotor systems subject to rotor failure/wear. It is shown that classical controllability theories of linear systems are not sufficient to test the controllability of the considered multirotors. Owing to this, an easy-to-use measurement index is introduced to assess the available control authority. Based on it, a new necessary and sufficient condition for the controllability of multirotors is derived. Furthermore, a controllability test procedure is presented. The proposed controllability test method is applied to a class of hexacopters with different rotor configurations and different rotor efficiency parameters to show its effectiveness. The analysis results show that hexacopters with different rotor configurations have different fault-tolerant capabilities. It is therefore necessary to test the controllability of the multirotors before any fault-tolerant control strategies are employed.

SYJul 1, 2013
Controllability Analysis and Degraded Control for a Class of Hexacopters Subject to Rotor Failures

Guang-Xun Du, Quan Quan, Kai-Yuan Cai

This paper considers the controllability analysis and fault tolerant control problem for a class of hexacopters. It is shown that the considered hexacopter is uncontrollable when one rotor fails, even though the hexacopter is over-actuated and its controllability matrix has full row rank. Accordingly, a fault tolerant control strategy is proposed to control a degraded system, where the yaw states of the considered hexacopter are ignored. Theoretical analysis indicates that the degraded system is controllable if and only if the maximum lift of each rotor is greater than a certain value. The simulation and experiment results on a prototype hexacopter show the feasibility of our controllability analysis and degraded control strategy.
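The classical test that this abstract refers to is the Kalman rank condition: a linear system x' = Ax + Bu is controllable when the matrix [B, AB, ..., A^(n-1)B] has full row rank. The sketch below illustrates that classical test only, on a textbook double integrator rather than a hexacopter model; all helper names are assumptions. The test places no bounds on the inputs, which is consistent with the abstract's point that a full-rank controllability matrix does not settle controllability when rotor lift is constrained.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


def controllability_matrix(a, b):
    """Build [B, AB, ..., A^(n-1)B] by horizontal concatenation."""
    n = len(a)
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(mat_mul(a, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


# Double integrator: position and velocity states.
A = [[0, 1], [0, 0]]
rank_force_input = rank(controllability_matrix(A, [[0], [1]]))     # input drives velocity
rank_position_input = rank(controllability_matrix(A, [[1], [0]]))  # input drives position only
print(rank_force_input, rank_position_input)  # prints "2 1"
```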

NESep 24, 2012
A New Continuous-Time Equality-Constrained Optimization Method to Avoid Singularity

Quan Quan, Kai-Yuan Cai

In equality-constrained optimization, a standard regularity assumption is often associated with feasible point methods, namely that the gradients of the constraints are linearly independent. In practice, the regularity assumption may be violated. To avoid such a singularity, we propose a new projection matrix, based on which a feasible point method for the continuous-time, equality-constrained optimization problem is developed. First, the equality constraint is transformed into a continuous-time dynamical system with solutions that always satisfy the equality constraint. Then, the singularity is explained in detail and a new projection matrix is proposed to avoid singularity. An update (that is, a controller) is subsequently designed to decrease the objective function along the solutions of the transformed system. The invariance principle is applied to analyze the behavior of the solution. We also propose a modified approach for addressing cases in which solutions do not satisfy the equality constraint. Finally, the proposed optimization approaches are applied to two examples to demonstrate their effectiveness.
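To make the regularity assumption concrete, the sketch below integrates the standard textbook projected gradient flow for a single equality constraint c(x) = 0, using the projection P = I - g g^T / (g^T g) with g the constraint gradient. This is the conventional projection, not the new projection matrix proposed in the paper, and the test problem, step size, and function names are illustrative assumptions. The projection is undefined where g vanishes, which is the singularity the abstract refers to.

```python
def projected_gradient_flow(x0, grad_f, grad_c, dt=0.01, steps=2000, eps=1e-12):
    """Forward-Euler integration of x' = -P(x) grad_f(x) for one constraint.

    P(x) projects onto the tangent space of the constraint surface, so the
    continuous-time flow stays feasible when started from a feasible point.
    """
    x = list(x0)
    for _ in range(steps):
        gf = grad_f(x)
        gc = grad_c(x)
        gc_norm_sq = sum(g * g for g in gc)
        if gc_norm_sq < eps:
            # Regularity assumption violated: the standard projection is singular.
            raise ZeroDivisionError("constraint gradient vanishes")
        scale = sum(a * b for a, b in zip(gc, gf)) / gc_norm_sq
        direction = [a - scale * b for a, b in zip(gf, gc)]  # P(x) grad_f(x)
        x = [xi - dt * di for xi, di in zip(x, direction)]
    return x


# Hypothetical test problem: minimize x0 + x1 subject to x0^2 + x1^2 = 2.
# The constrained minimizer is (-1, -1).
x_final = projected_gradient_flow(
    [2 ** 0.5, 0.0],
    grad_f=lambda x: [1.0, 1.0],
    grad_c=lambda x: [2.0 * x[0], 2.0 * x[1]],
)
print(x_final)
```

The Euler discretization drifts slightly off the constraint surface, so the result is only approximately feasible; the exact continuous-time flow does not have this drift.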