IV · Mar 7, 2022
Stepwise Feature Fusion: Local Guides Global
Jinfeng Wang, Qiming Huang, Feilong Tang et al.
Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for accurate polyp segmentation tasks with excellent results. However, due to the structure of polyp images and the varying shapes of polyps, it is easy for existing deep learning models to overfit the current dataset. As a result, the model may not generalize to unseen colonoscopy data. To address this, we propose a new state-of-the-art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. The SSFormer achieves state-of-the-art performance in both learning and generalization assessment.
NA · Mar 20, 2016
A Two-Grid Finite Element Approximation for A Nonlinear Time-Fractional Cable Equation
Yang Liu, Yanwei Du, Hong Li et al.
In this article, a nonlinear fractional Cable equation is solved by a two-grid algorithm combined with the finite element (FE) method. A temporal second-order fully discrete two-grid FE scheme is presented, in which the spatial direction is approximated by the two-grid FE method and the integer and fractional derivatives in time are discretized by the second-order two-step backward difference method and the second-order weighted and shifted Grünwald difference (WSGD) scheme, respectively. The algorithm studied in this paper mainly covers two steps: first, the numerical solution of the nonlinear FE scheme on the coarse grid is computed; second, based on the solution of the initial iteration on the coarse grid, the linearized FE system on the fine grid is solved by using Newton iteration. Here, the stability of the fully discrete two-grid method is derived. Moreover, the a priori estimates with second-order convergence rate in time are proved in detail; this rate is higher than the L1-approximation result with $O(τ^{2-α}+τ^{2-β})$. Finally, the numerical results by using the two-grid method and FE method are calculated, respectively, and the CPU-time is compared to verify our theoretical results.
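The abstract names the WSGD scheme without listing its coefficients. As a rough illustration only, the sketch below computes the second-order WSGD weights in one commonly used form (shifts (p, q) = (0, -1)); that choice of shifts, and the function names `grunwald_weights` and `wsgd_weights`, are assumptions made here and are not taken from the paper.

```python
import numpy as np

def grunwald_weights(alpha, n):
    """Grünwald coefficients g_k = (-1)^k * binom(alpha, k) for k = 0..n,
    computed with the standard recurrence g_k = (1 - (alpha + 1) / k) * g_{k-1}."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = (1.0 - (alpha + 1.0) / k) * g[k - 1]
    return g

def wsgd_weights(alpha, n):
    """Second-order weighted and shifted Grünwald weights.

    Assumption: shifts (p, q) = (0, -1), which gives
        w_0 = (2 + alpha) / 2 * g_0,
        w_k = (2 + alpha) / 2 * g_k - alpha / 2 * g_{k-1},  k >= 1.
    """
    g = grunwald_weights(alpha, n)
    w = np.empty(n + 1)
    w[0] = (2.0 + alpha) / 2.0 * g[0]
    w[1:] = (2.0 + alpha) / 2.0 * g[1:] - alpha / 2.0 * g[:-1]
    return w

# Under this assumption, a fractional derivative of order alpha at t_n would be
# approximated by tau**(-alpha) * sum_k w[k] * u[n - k].
weights_alpha_one = wsgd_weights(1.0, 4)
```

A quick sanity check on this form: for alpha = 1 the weights reduce to the two-step backward difference coefficients 3/2, -2, 1/2, which is consistent with the abstract pairing WSGD with the second-order backward difference method.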
CV · Dec 21, 2022
DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation
Feilong Tang, Qiming Huang, Jinfeng Wang et al.
Transformer-based models have been widely demonstrated to be successful in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns, leading to the loss of local details (e.g., boundaries and small objects), which are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs, namely, the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA has the ability to aggregate and represent both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module is used to aggregate the boundary characteristic from low-level features and semantic information from high-level features for better preserving boundary details and locating the re-calibration objects. Extensive experiments on six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images and polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations such as small object segmentation and ambiguous object boundaries.
NA · Jun 5, 2019
TGMFE Algorithm Combined with Some Time Second-Order Schemes for Nonlinear Fourth-Order Reaction Diffusion System
Baoli Yin, Yang Liu, Hong Li et al.
In this article, a two-grid mixed finite element (TGMFE) method with some second-order time discrete schemes is developed for numerically solving a nonlinear fourth-order reaction diffusion equation. The two-grid MFE method is used to approximate the spatial direction, and some second-order $θ$ schemes formulated at time $t_{k-θ}$ are considered to discretize the time direction. The TGMFE method covers two main steps: first, a nonlinear MFE system on the coarse spatial grid is solved by an iterative algorithm to obtain a coarse solution; then, a linearized MFE system on the fine grid is solved to obtain the TGMFE solution. Here, the stability and a priori error estimates in the $L^2$-norm for both the nonlinear Galerkin MFE system and the TGMFE scheme are derived. Finally, some convergence results are computed for both the nonlinear Galerkin MFE system and the TGMFE scheme to verify our theoretical analysis. They show that the convergence rate of the time second-order $θ$ scheme, which includes the Crank-Nicolson scheme and the second-order backward difference scheme, is close to $2$, and that the TGMFE method saves CPU time compared with the nonlinear Galerkin MFE method.
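The two-step structure described above (nonlinear solve on a coarse grid, then one linearized solve on a fine grid) can be sketched on a much simpler problem. The example below is not the paper's mixed finite element scheme: it swaps in finite differences and a steady second-order model problem, -u'' + u^3 = f on (0, 1) with zero boundary values, purely to show the control flow. The model problem, grid sizes, and all function names are assumptions made for illustration.

```python
import numpy as np

def laplacian_1d(n, h):
    """Dense finite-difference matrix of -d^2/dx^2 on n interior nodes (zero Dirichlet ends)."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def source(x):
    """Right-hand side chosen so that u(x) = sin(pi x) solves -u'' + u^3 = f."""
    return np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x) ** 3

def newton_step(u, A, f):
    """One Newton update for the discrete nonlinear system A u + u^3 = f."""
    residual = A @ u + u**3 - f
    jacobian = A + np.diag(3.0 * u**2)
    return u - np.linalg.solve(jacobian, residual)

def two_grid_solve(n_coarse=8, n_fine=64, tol=1e-12, max_iter=50):
    # Step 1: solve the nonlinear system on the coarse grid by full Newton iteration.
    xc = np.linspace(0.0, 1.0, n_coarse + 1)
    Ac = laplacian_1d(n_coarse - 1, 1.0 / n_coarse)
    fc = source(xc[1:-1])
    uc = np.zeros(n_coarse - 1)
    for _ in range(max_iter):
        uc_next = newton_step(uc, Ac, fc)
        converged = np.max(np.abs(uc_next - uc)) < tol
        uc = uc_next
        if converged:
            break

    # Step 2: interpolate the coarse solution to the fine grid and solve a single
    # linearized (one Newton step) system there instead of iterating to convergence.
    xf = np.linspace(0.0, 1.0, n_fine + 1)
    coarse_on_fine = np.interp(xf, xc, np.concatenate(([0.0], uc, [0.0])))
    Af = laplacian_1d(n_fine - 1, 1.0 / n_fine)
    uf = newton_step(coarse_on_fine[1:-1], Af, source(xf[1:-1]))
    return xf, np.concatenate(([0.0], uf, [0.0]))

x_fine, u_two_grid = two_grid_solve()
max_error = np.max(np.abs(u_two_grid - np.sin(np.pi * x_fine)))
```

The point of the design is that the expensive nonlinear iteration runs only on the small coarse system, while the fine grid needs just one linear solve; this is the source of the CPU-time saving the abstract reports for its own scheme.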
CV · Mar 6, 2022
A Robust Framework of Chromosome Straightening with ViT-Patch GAN
Sifan Song, Jinfeng Wang, Fengrui Cheng et al.
Chromosomes carry the genetic information of humans. They are non-rigid and non-articulated, with varying degrees of curvature. Chromosome straightening is an important step for subsequent karyotype construction, pathological diagnosis and cytogenetic map development. However, robust chromosome straightening remains challenging due to the unavailability of training images, distorted chromosome details and shapes after straightening, and poor generalization capability. In this paper, we propose a novel architecture, ViT-Patch GAN, consisting of a self-learned motion transformation generator and a Vision Transformer-based patch (ViT-Patch) discriminator. The generator learns the motion representation of chromosomes for straightening. With the help of the ViT-Patch discriminator, the straightened chromosomes retain more shape and banding pattern details. The experimental results show that the proposed method achieves better performance on Fréchet Inception Distance (FID), Learned Perceptual Image Patch Similarity (LPIPS) and downstream chromosome classification accuracy, and shows excellent generalization capability on a large dataset.
CV · Mar 9, 2023
Distortion-Disentangled Contrastive Learning
Jinfeng Wang, Sifan Song, Jionglong Su et al.
Self-supervised learning is well known for its remarkable performance in representation learning and various downstream computer vision tasks. Recently, Positive-pair-Only Contrastive Learning (POCL) has achieved reliable performance without the need to construct positive-negative training sets. It reduces memory requirements by lessening the dependency on the batch size. The POCL method typically uses a single loss function to extract the distortion invariant representation (DIR), which describes the proximity of positive-pair representations affected by different distortions. This loss function implicitly enables the model to filter out or ignore the distortion variant representation (DVR) affected by different distortions. However, existing POCL methods do not explicitly enforce the disentanglement and exploitation of the actually valuable DVR. In addition, these POCL methods have been observed to be sensitive to augmentation strategies. To address these limitations, we propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL) and a Distortion-Disentangled Loss (DDL). Our approach is the first to explicitly disentangle and exploit the DVR inside the model and feature stream to improve the overall representation utilization efficiency, robustness and representation ability. Experiments demonstrate the superiority of our framework over Barlow Twins and SimSiam in terms of convergence, representation quality, and robustness on several benchmark datasets.
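To make the idea of a single positive-pair-only loss concrete, here is a minimal NumPy sketch of the Barlow Twins objective, one of the baselines named in the abstract. It is not the proposed DDL, whose form the abstract does not give. The function name, the weight `lam`, and the small `eps` guard are illustrative choices.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """Barlow Twins style loss on two batches of embeddings, each of shape (batch, dim).

    z_a and z_b are embeddings of two distorted views of the same images
    (positive pairs only; no negative samples are needed).
    """
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch.
    za = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    zb = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = za.T @ zb / n
    diag = np.diag(c)
    # Invariance term: pull diagonal correlations toward 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: push off-diagonal correlations toward 0.
    off_diag = np.sum(c**2) - np.sum(diag**2)
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
view = rng.normal(size=(256, 8))
loss_matched = barlow_twins_loss(view, view)
loss_unrelated = barlow_twins_loss(view, rng.normal(size=(256, 8)))
```

The loss rewards only agreement between the two views, so any information that varies with the distortion is left unmodelled; that is the kind of implicit filtering of the DVR that the abstract describes.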
IV · Jun 6, 2023
Atrial Septal Defect Detection in Children Based on Ultrasound Video Using Multiple Instances Learning
Yiman Liu, Qiming Huang, Xiaoxiang Han et al.
Purpose: Congenital heart defect (CHD) is the most common birth defect. Transthoracic echocardiography (TTE) can provide sufficient cardiac structure information, evaluate hemodynamics and cardiac function, and is an effective method for atrial septal defect (ASD) examination. This paper aims to study a deep learning method based on cardiac ultrasound video to assist in ASD diagnosis. Materials and methods: We select two standard views, the atrial septum view (subAS) and the low parasternal four-compartment view (LPS4C), to identify ASD. We enlist data from 300 pediatric patients as part of a double-blind experiment for five-fold cross-validation to verify the performance of our model. In addition, data from 30 pediatric patients (15 positives and 15 negatives) are collected for clinician testing and compared to our model test results (these 30 samples do not participate in model training). We propose an echocardiography video-based atrial septal defect diagnosis system. In our model, we present a block random selection, maximal agreement decision and frame sampling strategy for training and testing, respectively. ResNet18 and R3D networks are used to extract the frame features and aggregate them to build a rich video-level representation. Results: We validate our model on our private dataset by five-fold cross-validation. For ASD detection, we achieve 89.33 AUC, 84.95 accuracy, 85.70 sensitivity, 81.51 specificity and 81.99 F1 score. Conclusion: The proposed model is a multiple-instance-learning-based deep learning model for video atrial septal defect detection, which effectively improves ASD detection accuracy compared with previous networks and clinical doctors.
CV · Feb 14, 2023
DualStreamFoveaNet: A Dual Stream Fusion Architecture with Anatomical Awareness for Robust Fovea Localization
Sifan Song, Jinfeng Wang, Zilong Wang et al.
Accurate fovea localization is essential for analyzing retinal diseases to prevent irreversible vision loss. While current deep learning-based methods outperform traditional ones, they still face challenges such as the lack of local anatomical landmarks around the fovea, the inability to robustly handle diseased retinal images, and the variations in image conditions. In this paper, we propose a novel transformer-based architecture called DualStreamFoveaNet (DSFN) for multi-cue fusion. This architecture explicitly incorporates long-range connections and global features using retina and vessel distributions for robust fovea localization. We introduce a spatial attention mechanism in the dual-stream encoder to extract and fuse self-learned anatomical information, focusing more on features distributed along blood vessels and significantly reducing computational costs by decreasing token numbers. Our extensive experiments show that the proposed architecture achieves state-of-the-art performance on two public datasets and one large-scale private dataset. Furthermore, we demonstrate that the DSFN is more robust on both normal and diseased retina images and has better generalization capacity in cross-dataset experiments.
CV · Mar 7, 2024
ProMISe: Promptable Medical Image Segmentation using SAM
Jinfeng Wang, Sifan Song, Xinkun Wang et al.
With the proposal of the Segment Anything Model (SAM), fine-tuning SAM for medical image segmentation (MIS) has become popular. However, due to the large size of the SAM model and the significant domain gap between natural and medical images, fine-tuning-based strategies are costly and carry a potential risk of instability, feature damage and catastrophic forgetting. Furthermore, some methods of transferring SAM to a domain-specific MIS through fine-tuning strategies disable the model's prompting capability, severely limiting its utilization scenarios. In this paper, we propose an Auto-Prompting Module (APM), which provides the SAM-based foundation model with Euclidean adaptive prompts in the target domain. Our experiments demonstrate that such adaptive prompts significantly improve SAM's non-fine-tuned performance in MIS. In addition, we propose a novel non-invasive method called Incremental Pattern Shifting (IPS) to adapt SAM to specific medical domains. Experimental results show that the IPS enables SAM to achieve state-of-the-art or competitive performance in MIS without the need for fine-tuning. By coupling these two methods, we propose ProMISe, an end-to-end non-fine-tuned framework for Promptable Medical Image Segmentation. Our experiments demonstrate that using our methods either individually or in combination achieves satisfactory performance in low-cost pattern shifting, with all of SAM's parameters frozen.
SP · Feb 28, 2025
A novel Fourier Adjacency Transformer for advanced EEG emotion recognition
Jinfeng Wang, Yanhao Huang, Sifan Song et al.
EEG emotion recognition faces significant hurdles due to noise interference, signal nonstationarity, and the inherent complexity of brain activity, which make accurate emotion classification difficult. In this study, we present the Fourier Adjacency Transformer, a novel framework that seamlessly integrates Fourier-based periodic analysis with graph-driven structural modeling. Our method first leverages novel Fourier-inspired modules to extract periodic features from embedded EEG signals, effectively decoupling them from aperiodic components. Subsequently, we employ an adjacency attention scheme to reinforce universal inter-channel correlation patterns, coupling these patterns with their sample-based counterparts. Empirical evaluations on SEED and DEAP datasets demonstrate that our method surpasses existing state-of-the-art techniques, achieving an improvement of approximately 6.5% in recognition accuracy. By unifying periodicity and structural insights, this framework offers a promising direction for future research in EEG emotion analysis.
LG · Jun 1, 2024
Contrastive Learning Via Equivariant Representation
Sifan Song, Jinfeng Wang, Qiaochu Zhao et al.
Invariant Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of a latent space representation for distortion (augmentation)-related information makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning (CL) can improve overall performance. In this paper, we revisit the roles of augmentation strategies and equivariance in improving CL's efficacy. We propose CLeVER (Contrastive Learning Via Equivariant Representation), a novel equivariant contrastive learning framework compatible with augmentation strategies of arbitrary complexity for various mainstream CL backbone models. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from practical natural images, thereby improving the training efficiency and robustness of baseline models in downstream tasks and achieving state-of-the-art (SOTA) performance. Moreover, we find that leveraging equivariant information extracted by CLeVER simultaneously enhances rotational invariance and sensitivity across experimental tasks, and helps stabilize the framework when handling complex augmentations, particularly for models with small-scale backbones.
LG · Dec 29, 2018
Autoencoder Based Residual Deep Networks for Robust Regression Prediction and Spatiotemporal Estimation
Lianfa Li, Ying Fang, Jun Wu et al.
To achieve superior generalization, a deep neural network often requires a large training sample. As hidden layers are added to increase learning ability, the network may suffer degradation in accuracy. Both issues could seriously limit the applicability of deep learning in some domains, particularly those involving predictions of continuous variables from small samples. Inspired by residual convolutional neural networks in computer vision and recent neuroscience findings of crucial shortcuts in the brain, we propose an autoencoder-based residual deep network for robust prediction. In a nested way, we leverage shortcut connections to implement residual mapping with a balanced structure for efficient propagation of error signals. The novel method is demonstrated on multiple datasets, on imputation of non-random missing values of aerosol optical depth at high spatiotemporal resolution, and on spatiotemporal estimation of fine particulate matter <2.5 μm, achieving cutting-edge accuracy and efficiency. Our approach is also a general-purpose regression learner applicable in diverse domains.