Xin Li

h-index21

4papers

2citations

Novelty41%

AI Score27

Ranked #155,644 of 194,257 authors (top 80%)#50,546 in CV (top 85%)

4 Papers

3.6CVMar 4, 2025Code

Q&C: When Quantization Meets Cache in Efficient Image Generation

Xin Ding, Xin Li, Haotong Qin et al.

Quantization and cache mechanisms are typically applied individually for efficient Diffusion Transformers (DiTs), each demonstrating notable potential for acceleration. However, the promoting effect of combining the two mechanisms on efficient generation remains under-explored. Through empirical investigation, we find that the combination of quantization and cache mechanisms for DiT is not straightforward, and two key challenges lead to severe catastrophic performance degradation: (i) the sample efficacy of calibration datasets in post-training quantization (PTQ) is significantly eliminated by cache operation; (ii) the combination of the above mechanisms introduces more severe exposure bias within sampling distribution, resulting in amplified error accumulation in the image generation process. In this work, we take advantage of these two acceleration mechanisms and propose a hybrid acceleration method by tackling the above challenges, aiming to further improve the efficiency of DiTs while maintaining excellent generation capability. Concretely, a temporal-aware parallel clustering (TAP) is designed to dynamically improve the sample selection efficacy for the calibration within PTQ for different diffusion steps. A variance compensation (VC) strategy is derived to correct the sampling distribution. It mitigates exposure bias through an adaptive correction factor generation. Extensive experiments have shown that our method has accelerated DiTs by 12.7x while preserving competitive generation capability. The code will be available at https://github.com/xinding-sys/Quant-Cache.

6.2CVApr 25, 2025

A BERT-Style Self-Supervised Learning CNN for Disease Identification from Retinal Images

Xin Li, Wenhui Zhu, Peijie Qiu et al.

In the field of medical imaging, the advent of deep learning, especially the application of convolutional neural networks (CNNs) has revolutionized the analysis and interpretation of medical images. Nevertheless, deep learning methods usually rely on large amounts of labeled data. In medical imaging research, the acquisition of high-quality labels is both expensive and difficult. The introduction of Vision Transformers (ViT) and self-supervised learning provides a pre-training strategy that utilizes abundant unlabeled data, effectively alleviating the label acquisition challenge while broadening the breadth of data utilization. However, ViT's high computational density and substantial demand for computing power, coupled with the lack of localization characteristics of its operations on image patches, limit its efficiency and applicability in many application scenarios. In this study, we employ nn-MobileNet, a lightweight CNN framework, to implement a BERT-style self-supervised learning approach. We pre-train the network on the unlabeled retinal fundus images from the UK Biobank to improve downstream application performance. We validate the results of the pre-trained model on Alzheimer's disease (AD), Parkinson's disease (PD), and various retinal diseases identification. The results show that our approach can significantly improve performance in the downstream tasks. In summary, this study combines the benefits of CNNs with the capabilities of advanced self-supervised learning in handling large-scale unlabeled data, demonstrating the potential of CNNs in the presence of label scarcity.

3.1MLDec 17, 2024

Adversarially robust generalization theory via Jacobian regularization for deep neural networks

Dongya Wu, Xin Li

Powerful deep neural networks are vulnerable to adversarial attacks. To obtain adversarially robust models, researchers have separately developed adversarial training and Jacobian regularization techniques. There are abundant theoretical and empirical studies for adversarial training, but theoretical foundations for Jacobian regularization are still lacking. In this study, we show that Jacobian regularization is closely related to adversarial training in that $\ell_{2}$ or $\ell_{1}$ Jacobian regularized loss serves as an approximate upper bound on the adversarially robust loss under $\ell_{2}$ or $\ell_{\infty}$ adversarial attack respectively. Further, we establish the robust generalization gap for Jacobian regularized risk minimizer via bounding the Rademacher complexity of both the standard loss function class and Jacobian regularization function class. Our theoretical results indicate that the norms of Jacobian are related to both standard and robust generalization. We also perform experiments on MNIST data classification to demonstrate that Jacobian regularized risk minimization indeed serves as a surrogate for adversarially robust risk minimization, and that reducing the norms of Jacobian can improve both standard and robust generalization. This study promotes both theoretical and empirical understandings to adversarially robust generalization via Jacobian regularization.

1.2NCMay 17, 2024

Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference

Xin Li

This paper proposes a unified framework in which consciousness emerges as a cycle-consistent, affectively anchored inference process, recursively structured by the interaction of emotion and cognition. Drawing from information theory, optimal transport, and the Bayesian brain hypothesis, we formalize emotion as a low-dimensional structural prior and cognition as a specificity-instantiating update. This emotion-cognition cycle minimizes joint uncertainty by aligning emotionally weighted priors with context-sensitive cognitive appraisals. Subjective experience thus arises as the informational footprint of temporally extended, affect-modulated simulation. We introduce the Exchangeable Integration Theory of Consciousness (EITC), modeling conscious episodes as conditionally exchangeable samples drawn from a latent affective self-model. This latent variable supports integration, via a unified cause-effect structure with nonzero irreducibility, and differentiation, by preserving contextual specificity across episodes. We connect this architecture to the Bayesian theory of consciousness through Rao-Blackwellized inference, which stabilizes inference by marginalizing latent self-structure while enabling adaptive updates. This mechanism ensures coherence, prevents inference collapse, and supports goal-directed simulation. The formal framework builds on De Finetti's exchangeability theorem, integrated information theory, and KL-regularized optimal transport. Overall, consciousness is reframed as a recursive inference process, shaped by emotion, refined by cognition, stabilized through exchangeability, and unified through a latent self-model that integrates experience across time.