Yanlin Chen

CV
h-index15
14papers
388citations
Novelty51%
AI Score45

14 Papers

CVApr 25, 2022Code
SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

Zhishe Wang, Yanlin Chen, Wenyu Shao et al.

The existing deep learning fusion methods mainly concentrate on the convolutional neural networks, and few attempts are made with transformer. Meanwhile, the convolutional operation is a content-independent interaction between the image and convolution kernel, which may lose some important contexts and further limit fusion performance. Towards this end, we present a simple and strong fusion baseline for infrared and visible images, namely\textit{ Residual Swin Transformer Fusion Network}, termed as SwinFuse. Our SwinFuse includes three parts: the global feature extraction, fusion layer and feature reconstruction. In particular, we build a fully attentional feature encoding backbone to model the long-range dependency, which is a pure transformer network and has a stronger representation ability compared with the convolutional neural networks. Moreover, we design a novel feature fusion strategy based on $L_{1}$-norm for sequence matrices, and measure the corresponding activity levels from row and column vector dimensions, which can well retain competitive infrared brightness and distinct visible details. Finally, we testify our SwinFuse with nine state-of-the-art traditional and deep learning methods on three different datasets through subjective observations and objective comparisons, and the experimental results manifest that the proposed SwinFuse obtains surprising fusion performance with strong generalization ability and competitive computational efficiency. The code will be available at https://github.com/Zhishe-Wang/SwinFuse.

CVMar 29, 2022Code
Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning

Zhishe Wang, Wenyu Shao, Yanlin Chen et al.

The existing generative adversarial fusion methods generally concatenate source images and extract local features through convolution operation, without considering their global characteristics, which tends to produce an unbalanced result and is biased towards the infrared image or visible image. Toward this end, we propose a novel end-to-end mode based on generative adversarial training to achieve better fusion balance, termed as \textit{interactive compensatory attention fusion network} (ICAFusion). In particular, in the generator, we construct a multi-level encoder-decoder network with a triple path, and adopt infrared and visible paths to provide additional intensity and gradient information. Moreover, we develop interactive and compensatory attention modules to communicate their pathwise information, and model their long-range dependencies to generate attention maps, which can more focus on infrared target perception and visible detail characterization, and further increase the representation power for feature extraction and feature reconstruction. In addition, dual discriminators are designed to identify the similar distribution between fused result and source images, and the generator is optimized to produce a more balanced result. Extensive experiments illustrate that our ICAFusion obtains superior fusion performance and better generalization ability, which precedes other advanced methods in the subjective visual description and objective metric evaluation. Our codes will be public at \url{https://github.com/Zhishe-Wang/ICAFusion}

IVJul 31, 2024Code
Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

Junxuan Yu, Rusi Chen, Yongsong Zhou et al.

Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering the flexible movement control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation, taking an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose the structure-to-motion alignment module, which can map semantic features onto motion curves across cardiac structures. Third, The position-aware attention mechanism is designed to enhance video consistency utilizing Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others regarding fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.

IVAug 16, 2023
OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis

Han Zhou, Dong Ni, Ao Chang et al.

Ultrasound (US) imaging is indispensable in clinical practice. To diagnose certain diseases, sonographers must observe corresponding dynamic anatomic structures to gather comprehensive information. However, the limited availability of specific US video cases causes teaching difficulties in identifying corresponding diseases, which potentially impacts the detection rate of such cases. The synthesis of US videos may represent a promising solution to this issue. Nevertheless, it is challenging to accurately animate the intricate motion of dynamic anatomic structures while preserving image fidelity. To address this, we present a novel online feature-decoupling framework called OnUVS for high-fidelity US video synthesis. Our highlights can be summarized by four aspects. First, we introduced anatomic information into keypoint learning through a weakly-supervised training strategy, resulting in improved preservation of anatomical integrity and motion while minimizing the labeling burden. Second, to better preserve the integrity and textural information of US images, we implemented a dual-decoder that decouples the content and textural features in the generator. Third, we adopted a multiple-feature discriminator to extract a comprehensive range of visual cues, thereby enhancing the sharpness and fine details of the generated videos. Fourth, we constrained the motion trajectories of keypoints during online learning to enhance the fluidity of generated videos. Our validation and user studies on in-house echocardiographic and pelvic floor US videos showed that OnUVS synthesizes US videos with high fidelity.

CVMar 3, 2023
Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning

Xiaoqing Zhang, Zunjie Xiao, Xiao Wu et al.

Spatial attention mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting the performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis. Unfortunately, existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to recalibrate pixel position in a pixel-independent manner dynamically. PPCA first applies a well-designed cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among them by the well-designed pixel normalization, and finally estimates per pixel attention weight via a pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCANet is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via supervised contrastive loss. The extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide visual analysis and ablation study to explain the behavior of PPCANet in the decision-making process.

IVSep 13, 2024
Improved Unet model for brain tumor image segmentation based on ASPP-coordinate attention mechanism

Zixuan Wang, Yanlin Chen, Feiyang Wang et al.

In this paper, we propose an improved Unet model for brain tumor image segmentation, which combines coordinate attention mechanism and ASPP module to improve the segmentation effect. After the data set is divided, we do the necessary preprocessing to the image and use the improved model to experiment. First, we trained and validated the traditional Unet model. By analyzing the loss curve of the training set and the validation set, we can see that the loss value continues to decline at the first epoch and becomes stable at the eighth epoch. This process shows that the model constantly optimizes its parameters to improve performance. At the same time, the change in the miou (mean Intersection over Union) index shows that the miou value exceeded 0.6 at the 15th epoch, remained above 0.6 thereafter, and reached above 0.7 at the 46th epoch. These results indicate that the basic Unet model is effective in brain tumor image segmentation. Next, we introduce an improved Unet algorithm based on coordinate attention mechanism and ASPP module for experiments. By observing the loss change curves of the training set and the verification set, it is found that the loss value reaches the lowest point at the sixth epoch and then remains relatively stable. At the same time, the miou indicator has stabilized above 0.7 since the 20th epoch and has reached a maximum of 0.76. These results show that the new mechanism introduced significantly improves the segmentation ability of the model. Finally, we apply the trained traditional Unet model and the improved Unet model based on the coordinate attention mechanism and ASPP module to the test set for brain tumor image segmentation prediction. Compared to the traditional Unet, the enhanced model offers superior segmentation and edge accuracy, providing a more reliable method for medical image analysis with the coordinate attention mechanism and ASPP module.

CVJun 30, 2025Code
MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis

Zhe Liu, Yuhao Huang, Lian Liu et al.

Color Doppler echocardiography is a crucial tool for diagnosing mitral regurgitation (MR). Recent studies have explored intelligent methods for MR diagnosis to minimize user dependence and improve accuracy. However, these approaches often fail to align with clinical workflow and may lead to suboptimal accuracy and interpretability. In this study, we introduce an automated MR diagnosis model (MReg) developed on the 4-chamber cardiac color Doppler echocardiography video (A4C-CDV). It follows comprehensive feature mining strategies to detect MR and assess its severity, considering clinical realities. Our contribution is threefold. First, we formulate the MR diagnosis as a regression task to capture the continuity and ordinal relationships between categories. Second, we design a feature selection and amplification mechanism to imitate the sonographer's diagnostic logic for accurate MR grading. Third, inspired by the Mixture-of-Experts concept, we introduce a feature summary module to extract the category-level features, enhancing the representational capacity for more accurate grading. We trained and evaluated our proposed MReg on a large in-house A4C-CDV dataset comprising 1868 cases with three graded regurgitation labels. Compared to other weakly supervised video anomaly detection and supervised classification methods, MReg demonstrated superior performance in MR diagnosis. Our code is available at: https://github.com/cskdstz/MReg.

LGDec 3, 2025
Physics-Embedded Gaussian Process for Traffic State Estimation

Yanlin Chen, Kehua Chen, Yinhai Wang

Traffic state estimation (TSE) becomes challenging when probe-vehicle penetration is low and observations are spatially sparse. Pure data-driven methods lack physical explanations and have poor generalization when observed data is sparse. In contrast, physical models have difficulty integrating uncertainties and capturing the real complexity of traffic. To bridge this gap, recent studies have explored combining them by embedding physical structure into Gaussian process. These approaches typically introduce the governing equations as soft constraints through pseudo-observations, enabling the integration of model structure within a variational framework. However, these methods rely heavily on penalty tuning and lack principled uncertainty calibration, which makes them sensitive to model mis-specification. In this work, we address these limitations by presenting a novel Physics-Embedded Gaussian Process (PEGP), designed to integrate domain knowledge with data-driven methods in traffic state estimation. Specifically, we design two multi-output kernels informed by classic traffic flow models, constructed via the explicit application of the linearized differential operator. Experiments on HighD, NGSIM show consistent improvements over non-physics baselines. PEGP-ARZ proves more reliable under sparse observation, while PEGP-LWR achieves lower errors with denser observation. Ablation study further reveals that PEGP-ARZ residuals align closely with physics and yield calibrated, interpretable uncertainty, whereas PEGP-LWR residuals are more orthogonal and produce nearly constant variance fields. This PEGP framework combines physical priors, uncertainty quantification, which can provide reliable support for TSE.

CVFeb 18, 2024
Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training

Jian Wang, Xin Yang, Xiaohong Jia et al.

Thyroid nodule classification and segmentation in ultrasound images are crucial for computer-aided diagnosis; however, they face limitations owing to insufficient labeled data. In this study, we proposed a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels. Our method aligns the transverse and longitudinal views of the same nodule, thereby enabling the model to focus more on the nodule area. We designed an adaptive loss function that eliminates the limitations of the paired data. Additionally, we adopted a two-stage pre-training to exploit the pre-training on ImageNet and thyroid ultrasound images. Extensive experiments were conducted on a large-scale dataset collected from multiple centers. The results showed that the proposed method significantly improves nodule classification and segmentation performance with limited manual labels and outperforms state-of-the-art self-supervised methods. The two-stage pre-training also significantly exceeded ImageNet pre-training.

QUANT-PHOct 25, 2021
Quantum Algorithms and Lower Bounds for Linear Regression with Norm Constraints

Yanlin Chen, Ronald de Wolf

Lasso and Ridge are important minimization problems in machine learning and statistics. They are versions of linear regression with squared loss where the vector $θ\in\mathbb{R}^d$ of coefficients is constrained in either $\ell_1$-norm (for Lasso) or in $\ell_2$-norm (for Ridge). We study the complexity of quantum algorithms for finding $\varepsilon$-minimizers for these minimization problems. We show that for Lasso we can get a quadratic quantum speedup in terms of $d$ by speeding up the cost-per-iteration of the Frank-Wolfe algorithm, while for Ridge the best quantum algorithms are linear in $d$, as are the best classical algorithms. As a byproduct of our quantum lower bound for Lasso, we also prove the first classical lower bound for Lasso that is tight up to polylog-factors.

LGJun 18, 2021
Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection

Ruiqing Ding, Yu Zhou, Jie Xu et al.

The utilization of computer technology to solve problems in medical scenarios has attracted considerable attention in recent years, which still has great potential and space for exploration. Among them, machine learning has been widely used in the prediction, diagnosis and even treatment of Sepsis. However, state-of-the-art methods require large amounts of labeled medical data for supervised learning. In real-world applications, the lack of labeled data will cause enormous obstacles if one hospital wants to deploy a new Sepsis detection system. Different from the supervised learning setting, we need to use known information (e.g., from another hospital with rich labeled data) to help build a model with acceptable performance, i.e., transfer learning. In this paper, we propose a semi-supervised optimal transport with self-paced ensemble framework for Sepsis early detection, called SPSSOT, to transfer knowledge from the other that has rich labeled data. In SPSSOT, we first extract the same clinical indicators from the source domain (e.g., hospital with rich labeled data) and the target domain (e.g., hospital with little labeled data), then we combine the semi-supervised domain adaptation based on optimal transport theory with self-paced under-sampling to avoid a negative transfer possibly caused by covariate shift and class imbalance. On the whole, SPSSOT is an end-to-end transfer learning method for Sepsis early detection which can automatically select suitable samples from two domains respectively according to the number of iterations and align feature space of two domains. Extensive experiments on two open clinical datasets demonstrate that comparing with other methods, our proposed SPSSOT, can significantly improve the AUC values with only 1% labeled data in the target domain in two transfer learning scenarios, MIMIC $rightarrow$ Challenge and Challenge $rightarrow$ MIMIC.

DSApr 14, 2021
Dimension-Preserving Reductions Between SVP and CVP in Different $p$-Norms

Divesh Aggarwal, Yanlin Chen, Rajendra Kumar et al.

$ \newcommand{\SVP}{\textsf{SVP}} \newcommand{\CVP}{\textsf{CVP}} \newcommand{\eps}{\varepsilon} $We show a number of reductions between the Shortest Vector Problem and the Closest Vector Problem over lattices in different $\ell_p$ norms ($\SVP_p$ and $\CVP_p$ respectively). Specifically, we present the following $2^{\eps m}$-time reductions for $1 \leq p \leq q \leq \infty$, which all increase the rank $n$ and dimension $m$ of the input lattice by at most one: $\bullet$ a reduction from $\widetilde{O}(1/\eps^{1/p})γ$-approximate $\SVP_q$ to $γ$-approximate $\SVP_p$; $\bullet$ a reduction from $\widetilde{O}(1/\eps^{1/p}) γ$-approximate $\CVP_p$ to $γ$-approximate $\CVP_q$; and $\bullet$ a reduction from $\widetilde{O}(1/\eps^{1+1/p})$-$\CVP_q$ to $(1+\eps)$-unique $\SVP_p$ (which in turn trivially reduces to $(1+\eps)$-approximate $\SVP_p$). The last reduction is interesting even in the case $p = q$. In particular, this special case subsumes much prior work adapting $2^{O(m)}$-time $\SVP_p$ algorithms to solve $O(1)$-approximate $\CVP_p$. In the (important) special case when $p = q$, $1 \leq p \leq 2$, and the $\SVP_p$ oracle is exact, we show a stronger reduction, from $O(1/\eps^{1/p})\text{-}\CVP_p$ to (exact) $\SVP_p$ in $2^{\eps m}$ time. For example, taking $\eps = \log m/m$ and $p = 2$ gives a slight improvement over Kannan's celebrated polynomial-time reduction from $\sqrt{m}\text{-}\CVP_2$ to $\SVP_2$. We also note that the last two reductions can be combined to give a reduction from approximate-$\CVP_p$ to $\SVP_q$ for any $p$ and $q$, regardless of whether $p \leq q$ or $p > q$. Our techniques combine those from the recent breakthrough work of Eisenbrand and Venzin (which showed how to adapt the current fastest known algorithm for these problems in the $\ell_2$ norm to all $\ell_p$ norms) together with sparsification-based techniques.

AIJun 21, 2020
To Explain or Not to Explain: A Study on the Necessity of Explanations for Autonomous Vehicles

Yuan Shen, Shanduojiao Jiang, Yanlin Chen et al.

Explainable AI, in the context of autonomous systems, like self-driving cars, has drawn broad interests from researchers. Recent studies have found that providing explanations for autonomous vehicles' actions has many benefits (e.g., increased trust and acceptance), but put little emphasis on when an explanation is needed and how the content of explanation changes with driving context. In this work, we investigate which scenarios people need explanations and how the critical degree of explanation shifts with situations and driver types. Through a user experiment, we ask participants to evaluate how necessary an explanation is and measure the impact on their trust in self-driving cars in different contexts. Moreover, we present a self-driving explanation dataset with first-person explanations and associated measures of the necessity for 1103 video clips, augmenting the Berkeley Deep Drive Attention dataset. Our research reveals that driver types and driving scenarios dictate whether an explanation is necessary. In particular, people tend to agree on the necessity for near-crash events but hold different opinions on ordinary or anomalous driving situations.

DSFeb 19, 2020
Improved Classical and Quantum Algorithms for the Shortest Vector Problem via Bounded Distance Decoding

Divesh Aggarwal, Yanlin Chen, Rajendra Kumar et al.

The most important computational problem on lattices is the Shortest Vector Problem (SVP). In this paper, we present new algorithms that improve the state-of-the-art for provable classical/quantum algorithms for SVP. We present the following results. $\bullet$ A new algorithm for SVP that provides a smooth tradeoff between time complexity and memory requirement. For any positive integer $4\leq q\leq \sqrt{n}$, our algorithm takes $q^{13n+o(n)}$ time and requires $poly(n)\cdot q^{16n/q^2}$ memory. This tradeoff which ranges from enumeration ($q=\sqrt{n}$) to sieving ($q$ constant), is a consequence of a new time-memory tradeoff for Discrete Gaussian sampling above the smoothing parameter. $\bullet$ A quantum algorithm for SVP that runs in time $2^{0.950n+o(n)}$ and requires $2^{0.5n+o(n)}$ classical memory and poly(n) qubits. In Quantum Random Access Memory (QRAM) model this algorithm takes only $2^{0.835n+o(n)}$ time and requires a QRAM of size $2^{0.293n+o(n)}$, poly(n) qubits and $2^{0.5n}$ classical space. This improves over the previously fastest classical (which is also the fastest quantum) algorithm due to [ADRS15] that has a time and space complexity $2^{n+o(n)}$. $\bullet$ A classical algorithm for SVP that runs in time $2^{1.669n+o(n)}$ time and $2^{0.5n+o(n)}$ space. This improves over an algorithm of [CCL18] that has the same space complexity. The time complexity of our classical and quantum algorithms are obtained using a known upper bound on a quantity related to the lattice kissing number which is $2^{0.402n}$. We conjecture that for most lattices this quantity is a $2^{o(n)}$. Assuming that this is the case, our classical algorithm runs in time $2^{1.292n+o(n)}$, our quantum algorithm runs in time $2^{0.750n+o(n)}$ and our quantum algorithm in QRAM model runs in time $2^{0.667n+o(n)}$.