Han Yang

h-index 13

5 papers

150 citations

Novelty 59%

AI Score 42

Ranked #59,304 of 194,257 authors (top 31%) · #519 in IV (top 12%)

5 Papers

27.1 · CV · Jul 24, 2023 · Code

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

Contrastive Language-Image Pre-training (CLIP) has become a promising language-supervised visual pre-training framework. This paper aims to distill small CLIP models supervised by a large teacher CLIP model. We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a simple feature mimicry with Mean Squared Error loss works surprisingly well. Moreover, interactive contrastive learning across teacher and student encoders also improves performance. We attribute the success of CLIP-KD to maximizing the feature similarity between teacher and student. The unified method is applied to distill several student models trained on CC3M+12M. CLIP-KD consistently improves student CLIP models on zero-shot ImageNet classification and cross-modal retrieval benchmarks. When using ViT-L/14 pretrained on Laion-400M as the teacher, CLIP-KD achieves 57.5% and 55.4% zero-shot top-1 ImageNet accuracy with ViT-B/16 and ResNet-50 students, surpassing the original CLIP without KD by margins of 20.5% and 20.1%, respectively. Our code is released at https://github.com/winycg/CLIP-KD.
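A minimal NumPy sketch of the two losses the abstract highlights, assuming teacher and student embeddings already share the same dimension. The function names, the temperature of 0.07, and pairing student image embeddings with teacher text embeddings for the "interactive" term are illustrative assumptions, not the released CLIP-KD code.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit L2 norm before computing cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def feature_mimicry_loss(student_emb, teacher_emb):
    """Mean squared error between student and teacher embeddings of the same batch.

    Assumes matching dimensions; a real student with a smaller embedding
    would need a projection layer first.
    """
    return float(np.mean((student_emb - teacher_emb) ** 2))

def interactive_contrastive_loss(student_img, teacher_txt, temperature=0.07):
    """InfoNCE in which student image embeddings are contrasted against teacher
    text embeddings; row i of each array forms the positive pair (assumption)."""
    s = l2_normalize(student_img)
    t = l2_normalize(teacher_txt)
    logits = s @ t.T / temperature                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal

# Toy random arrays standing in for real CLIP embeddings.
rng = np.random.default_rng(0)
teacher_emb = rng.normal(size=(4, 8))
student_emb = teacher_emb + 0.1 * rng.normal(size=(4, 8))
print(feature_mimicry_loss(student_emb, teacher_emb))
print(interactive_contrastive_loss(student_emb, teacher_emb))
```

In a training loop both terms would typically be added to the student's ordinary CLIP loss with tunable weights; those weights are not given in the abstract.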

10.4 · IV · Mar 15, 2023 · Code

Lung Nodule Segmentation and Uncertain Region Prediction with an Uncertainty-Aware Attention Mechanism

Han Yang, Qiuli Wang, Yue Zhang et al.

Radiologists possess diverse training and clinical experiences, leading to variations in the segmentation annotations of lung nodules and resulting in segmentation uncertainty. Conventional methods typically select a single annotation as the learning target or attempt to learn a latent space comprising multiple annotations. However, these approaches fail to leverage the valuable information inherent in the consensus and disagreements among the multiple annotations. In this paper, we propose an Uncertainty-Aware Attention Mechanism (UAAM) that utilizes consensus and disagreements among multiple annotations to facilitate better segmentation. To this end, we introduce the Multi-Confidence Mask (MCM), which combines a Low-Confidence (LC) Mask and a High-Confidence (HC) Mask. The LC mask indicates regions with low segmentation confidence, where radiologists may make different segmentation choices. Following UAAM, we further design an Uncertainty-Guide Multi-Confidence Segmentation Network (UGMCS-Net), which contains three modules: a Feature Extracting Module that captures general features of a lung nodule, an Uncertainty-Aware Module that produces three features for the annotations' union, intersection, and annotation set, and an Intersection-Union Constraining Module that uses distances between the three features to balance the predictions of the final segmentation and the MCM. To comprehensively demonstrate the performance of our method, we propose a Complex Nodule Validation on LIDC-IDRI, which tests UGMCS-Net's segmentation performance on lung nodules that are difficult to segment using common methods. Experimental results demonstrate that our method significantly improves segmentation performance on nodules that are difficult to segment using conventional methods.
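The abstract ties the LC and HC masks to the annotations' union and intersection. A minimal sketch of one plausible construction, assuming (not taken from the paper's code) that the HC mask is the intersection of all annotations and the LC mask is the union minus the intersection, i.e. the pixels on which radiologists disagree:

```python
import numpy as np

def multi_confidence_masks(annotations):
    """Return (hc_mask, lc_mask) from a list of binary annotation masks.

    Assumed definitions:
      HC = pixels marked by every annotator (intersection)
      LC = pixels marked by some but not all annotators (union minus intersection)
    """
    stack = np.asarray(annotations, dtype=bool)   # shape (n_annotators, H, W)
    union = stack.any(axis=0)
    intersection = stack.all(axis=0)
    hc_mask = intersection
    lc_mask = union & ~intersection
    return hc_mask, lc_mask

# Two hypothetical 3x3 annotations of the same nodule.
a1 = np.array([[0, 1, 1],
               [0, 1, 1],
               [0, 0, 0]])
a2 = np.array([[0, 1, 0],
               [1, 1, 1],
               [0, 0, 0]])
hc, lc = multi_confidence_masks([a1, a2])
print(hc.astype(int))
print(lc.astype(int))
```

By construction the two masks are disjoint and together cover the union, so every pixel any radiologist marked falls into exactly one confidence level.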

17.9 · LG · Jan 31, 2025

E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products

Yunyang Li, Lin Huang, Zhihao Ding et al.

Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science. However, EGNNs face substantial computational challenges due to the high cost of constructing edge features via spherical tensor products, making them impractical for large-scale systems. To address this limitation, we introduce E2Former, an equivariant and efficient transformer architecture that incorporates the Wigner $6j$ convolution (Wigner $6j$ Conv). By shifting the computational burden from edges to nodes, the Wigner $6j$ Conv reduces the complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving both the model's expressive power and rotational equivariance. We show that this approach achieves a 7x to 30x speedup compared to conventional $\mathrm{SO}(3)$ convolutions. Furthermore, our empirical results demonstrate that E2Former mitigates the computational challenges of existing approaches without compromising the ability to capture detailed geometric information. This suggests a promising direction for scalable and efficient molecular modeling.
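A back-of-the-envelope count of why moving work from edges to nodes helps, assuming one expensive tensor product per directed edge in the conventional scheme and one per node after the shift. The average degree $\bar{d}$ is an illustrative symbol not used in the abstract:

```latex
\begin{aligned}
|\mathcal{E}| &= \bar{d}\,|\mathcal{V}|, \\
\frac{\#\,\text{tensor products (edge-level)}}{\#\,\text{tensor products (node-level)}}
  &\approx \frac{|\mathcal{E}|}{|\mathcal{V}|} = \bar{d}.
\end{aligned}
```

For instance, with $\bar{d} = 20$ neighbors per atom the count drops by roughly a factor of 20. This ignores constant factors and per-node overheads, so it only conveys the scaling; it is not how the reported 7x to 30x speedups were obtained.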

2.7 · CL · Aug 28, 2025

Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach

Han Yang, Jian Lan, Yihong Liu et al.

Autoregressive language models are vulnerable to orthographic attacks, where input text is perturbed with characters from multilingual alphabets, leading to substantial performance degradation. This vulnerability primarily stems from the out-of-vocabulary issue inherent in subword tokenizers and their embeddings. To address this limitation, we propose a pixel-based generative language model that replaces the text-based embeddings with pixel-based representations by rendering words as individual images. This design provides stronger robustness to noisy inputs while extending compatibility to multilingual text across diverse writing systems. We evaluate the proposed method on the multilingual LAMBADA dataset, the WMT24 dataset and the SST-2 benchmark, demonstrating both its resilience to orthographic noise and its effectiveness in multilingual settings.
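To make the out-of-vocabulary failure concrete, here is a toy sketch, assuming a simplified greedy longest-match subword tokenizer and a hypothetical vocabulary (neither is from the paper). Swapping a Latin letter for a visually identical Cyrillic one leaves the word looking the same to a reader, yet the tokenizer breaks it into unknown tokens and single characters:

```python
def greedy_subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation of a single word.

    A simplified stand-in for real subword tokenizers (no continuation markers,
    no byte fallback): at each position take the longest vocabulary entry that
    matches; if none matches, emit the unknown token for one character.
    """
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):     # try the longest candidate first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(unk)                 # no vocabulary entry covers word[i]
            i += 1
    return tokens

# Hypothetical toy vocabulary: Latin lowercase letters plus a few subwords.
vocab = {"attack", "att", "ack"} | set("abcdefghijklmnopqrstuvwxyz")

clean = "attack"
# Orthographic attack: replace Latin 'a' (U+0061) with Cyrillic 'а' (U+0430).
perturbed = clean.replace("a", "\u0430")

print(greedy_subword_tokenize(clean, vocab))      # ['attack']
print(greedy_subword_tokenize(perturbed, vocab))  # ['[UNK]', 't', 't', '[UNK]', 'c', 'k']
```

The intuition behind the pixel-based approach is that the two spellings render to nearly identical images, so a model reading pixels never hits this vocabulary lookup.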

10.0 · IV · Oct 24, 2021

Uncertainty-Guided Lung Nodule Segmentation with Feature-Aware Attention

Han Yang, Lu Shen, Mengke Zhang et al.

Since radiologists have different training and clinical experiences, they may provide different segmentation annotations for the same lung nodule. Conventional studies choose a single annotation as the learning target by default, but in doing so they discard valuable information about the consensus and disagreements inherent in the multiple annotations. This paper proposes an Uncertainty-Guided Segmentation Network (UGS-Net), which learns rich visual features from the regions that may cause segmentation uncertainty and thereby contributes to a better segmentation result. With an Uncertainty-Aware Module, this network can provide a Multi-Confidence Mask (MCM) that points out regions with different levels of segmentation uncertainty. Moreover, this paper introduces a Feature-Aware Attention Module to enhance the learning of nodule boundary and density differences. Experimental results show that our method can predict nodule regions with different uncertainty levels and achieves superior performance on the LIDC-IDRI dataset.