Xin Xing

CV
h-index11
16papers
318citations
Novelty53%
AI Score44

16 Papers

CVJun 29, 2022Code
Causality for Inherently Explainable Transformers: CAT-XPLAIN

Subash Khanal, Benjamin Brodie, Xin Xing et al.

There have been several post-hoc explanation approaches developed to explain pre-trained black-box neural networks. However, there is still a gap in research efforts toward designing neural networks that are inherently explainable. In this paper, we utilize a recently proposed instance-wise post-hoc causal explanation method to make an existing transformer architecture inherently explainable. Once trained, our model provides an explanation in the form of top-$k$ regions in the input space of the given instance contributing to its decision. We evaluate our method on binary classification tasks using three image datasets: MNIST, FMNIST, and CIFAR. Our results demonstrate that compared to the causality-based post-hoc explainer model, our inherently explainable model achieves better explainability results while eliminating the need of training a separate explainer model. Our code is available at https://github.com/mvrl/CAT-XPLAIN.

CVOct 24, 2023Code
Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Xin Xing, Zhexiao Xiong, Abby Stylianou et al.

This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.

NANov 9, 2018
An efficient method for block low-rank approximations for kernel matrix systems

Xin Xing, Edmond Chow

In the iterative solution of dense linear systems from boundary integral equations or systems involving kernel matrices, the main challenges are the expensive matrix-vector multiplication and the storage cost which are usually tackled by hierarchical matrix techniques such as $\mathcal{H}$ and $\mathcal{H}^2$ matrices. However, hierarchical matrices also have a high construction cost that is dominated by the low-rank approximations of the sub-blocks of the kernel matrix. In this paper, an efficient method is proposed to give a low-rank approximation of the kernel matrix block $K(X_0, Y_0)$ in the form of an interpolative decomposition (ID) for a kernel function $K(x,y)$ and two properly located point sets $X_0, Y_0$. The proposed method combines the ID using strong rank-revealing QR (sRRQR), which is purely algebraic, with analytic kernel information to reduce the construction cost of a rank-$r$ approximation from $O(r|X_0||Y_0|)$, for ID using sRRQR alone, to $O(r|X_0|)$ which is not related to $|Y_0|$. Numerical experiments show that $\mathcal{H}^2$ matrix construction with the proposed algorithm only requires a computational cost linear in the matrix dimension.

NAOct 31, 2018
Error analysis of an accelerated interpolative decomposition for 3D Laplace problems

Xin Xing, Edmond Chow

In constructing the $\mathcal{H}^2$ representation of dense matrices defined by the Laplace kernel, the interpolative decomposition of certain off-diagonal submatrices that dominates the computation can be dramatically accelerated using the concept of a proxy surface. We refer to the computation of such interpolative decompositions as the proxy surface method. We present an error bound for the proxy surface method in the 3D case and thus provide theoretical guidance for the discretization of the proxy surface in the method.

CVJul 12, 2024
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion

Zhexiao Xiong, Xin Xing, Scott Workman et al.

We introduce the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area. This contrasts with previous work which only uses input panoramas (same-view synthesis), or an input satellite image (cross-view synthesis). We argue that the mixed-view setting is the most natural to support panorama synthesis for arbitrary locations worldwide. A critical challenge is that the spatial coverage of panoramas is uneven, with few panoramas available in many regions of the world. We introduce an approach that utilizes diffusion-based modeling and an attention-based architecture for extracting information from all available input imagery. Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios when the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize. The project page is available at https://mixed-view.github.io

TOOct 10, 2025
MAE-SAM2: Mask Autoencoder-Enhanced SAM2 for Clinical Retinal Vascular Leakage Segmentation

Xin Xing, Irmak Karaca, Amir Akhavanrezayat et al.

We propose MAE-SAM2, a novel foundation model for retinal vascular leakage segmentation on fluorescein angiography images. Due to the small size and dense distribution of the leakage areas, along with the limited availability of labeled clinical data, this presents a significant challenge for segmentation tasks. Our approach integrates a Self-Supervised learning (SSL) strategy, Masked Autoencoder (MAE), with SAM2. In our implementation, we explore different loss functions and conclude a task-specific combined loss. Extensive experiments and ablation studies demonstrate that MAE-SAM2 outperforms several state-of-the-art models, achieving the highest Dice score and Intersection-over-Union (IoU). Compared to the original SAM2, our model achieves a $5\%$ performance improvement, highlighting the promise of foundation models with self-supervised pretraining in clinical imaging tasks.

CVNov 30, 2020Code
Dynamic Image for 3D MRI Image Alzheimer's Disease Classification

Xin Xing, Gongbo Liang, Hunter Blanton et al.

We propose to apply a 2D CNN architecture to 3D MRI image Alzheimer's disease classification. Training a 3D convolutional neural network (CNN) is time-consuming and computationally expensive. We make use of approximate rank pooling to transform the 3D MRI image volume into a 2D image to use as input to a 2D CNN. We show our proposed CNN model achieves $9.5\%$ better Alzheimer's disease classification accuracy than the baseline 3D models. We also show that our method allows for efficient training, requiring only 20% of the training time compared to 3D CNN models. The code is available online: https://github.com/UkyVision/alzheimer-project.

MLMay 22, 2022
Robust Flow-based Conformal Inference (FCI) with Statistical Guarantee

Youhui Ye, Meimei Liu, Xin Xing

Conformal prediction aims to determine precise levels of confidence in predictions for new objects using past experience. However, the commonly used exchangeable assumptions between the training data and testing data limit its usage in dealing with contaminated testing sets. In this paper, we develop a novel flow-based conformal inference (FCI) method to build predictive sets and infer outliers for complex and high-dimensional data. We leverage ideas from adversarial flow to transfer the input data to a random vector with known distributions. Our roundtrip transformation can map the input data to a low-dimensional space, meanwhile reserving the conditional distribution of input data given each class label, which enables us to construct a non-conformity score for uncertainty quantification. Our approach is applicable and robust when the testing data is contaminated. We evaluate our method, robust flow-based conformal inference, on benchmark datasets. We find that it produces effective predictive sets and accurate outlier detection and is more powerful relative to competing approaches.

CVDec 13, 2023
LD-SDM: Language-Driven Hierarchical Species Distribution Modeling

Srikumar Sastry, Xin Xing, Aayush Dhakal et al.

We focus on species distribution modeling using global-scale presence-only data, leveraging geographical and environmental features to map species ranges, as in previous studies. However, we innovate by integrating taxonomic classification into our approach. Specifically, we propose using a large language model to extract a latent representation of the taxonomic classification from a textual prompt. This allows us to map the range of any taxonomic rank, including unseen species, without additional supervision. We also present a new proximity-aware evaluation metric, suitable for evaluating species distribution models, which addresses critical shortcomings of traditional metrics. We evaluated our model for species range prediction, zero-shot prediction, and geo-feature regression and found that it outperforms several state-of-the-art models.

AIJul 25, 2025
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Haoran Lu, Luyang Fang, Ruidong Zhang et al.

Due to the remarkable capabilities and growing impact of large language models (LLMs), they have been deeply integrated into many aspects of society. Thus, ensuring their alignment with human values and intentions has emerged as a critical challenge. This survey provides a comprehensive overview of practical alignment techniques, training protocols, and empirical findings in LLM alignment. We analyze the development of alignment methods across diverse paradigms, characterizing the fundamental trade-offs between core alignment objectives. Our analysis shows that while supervised fine-tuning enables basic instruction-following, preference-based methods offer more flexibility for aligning with nuanced human intent. We discuss state-of-the-art techniques, including Direct Preference Optimization (DPO), Constitutional AI, brain-inspired methods, and alignment uncertainty quantification (AUQ), highlighting their approaches to balancing quality and efficiency. We review existing evaluation frameworks and benchmarking datasets, emphasizing limitations such as reward misspecification, distributional robustness, and scalable oversight. We summarize strategies adopted by leading AI labs to illustrate the current state of practice. We conclude by outlining open problems in oversight, value pluralism, robustness, and continuous alignment. This survey aims to inform both researchers and practitioners navigating the evolving landscape of LLM alignment.

MLOct 13, 2020
Neural Gaussian Mirror for Controlled Feature Selection in Neural Networks

Xin Xing, Yu Gui, Chenguang Dai et al.

Deep neural networks (DNNs) have become increasingly popular and achieved outstanding performance in predictive tasks. However, the DNN framework itself cannot inform the user which features are more or less relevant for making the prediction, which limits its applicability in many scientific fields. We introduce neural Gaussian mirrors (NGMs), in which mirrored features are created, via a structured perturbation based on a kernel-based conditional dependence measure, to help evaluate feature importance. We design two modifications of the DNN architecture for incorporating mirrored features and providing mirror statistics to measure feature importance. As shown in simulated and real data examples, the proposed method controls the feature selection error rate at a predefined level and maintains a high selection power even with the presence of highly correlated features.

CVFeb 27, 2020
Joint 2D-3D Breast Cancer Classification

Gongbo Liang, Xiaoqin Wang, Yu Zhang et al.

Breast cancer is the malignant tumor that causes the highest number of cancer deaths in females. Digital mammograms (DM or 2D mammogram) and digital breast tomosynthesis (DBT or 3D mammogram) are the two types of mammography imagery that are used in clinical practice for breast cancer detection and diagnosis. Radiologists usually read both imaging modalities in combination; however, existing computer-aided diagnosis tools are designed using only one imaging modality. Inspired by clinical practice, we propose an innovative convolutional neural network (CNN) architecture for breast cancer classification, which uses both 2D and 3D mammograms, simultaneously. Our experiment shows that the proposed method significantly improves the performance of breast cancer classification. By assembling three CNN classifiers, the proposed model achieves 0.97 AUC, which is 34.72% higher than the methods using only one imaging modality.

CVFeb 27, 2020
2D Convolutional Neural Networks for 3D Digital Breast Tomosynthesis Classification

Yu Zhang, Xiaoqin Wang, Hunter Blanton et al.

Automated methods for breast cancer detection have focused on 2D mammography and have largely ignored 3D digital breast tomosynthesis (DBT), which is frequently used in clinical practice. The two key challenges in developing automated methods for DBT classification are handling the variable number of slices and retaining slice-to-slice changes. We propose a novel deep 2D convolutional neural network (CNN) architecture for DBT classification that simultaneously overcomes both challenges. Our approach operates on the full volume, regardless of the number of slices, and allows the use of pre-trained 2D CNNs for feature extraction, which is important given the limited amount of annotated training data. In an extensive evaluation on a real-world clinical dataset, our approach achieves 0.854 auROC, which is 28.80% higher than approaches based on 3D CNNs. We also find that these improvements are stable across a range of model configurations.

STFeb 24, 2020
Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

Ping Ma, Xinlian Zhang, Xin Xing et al.

The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators. However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking. In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.

MENov 6, 2019
Minimax Nonparametric Two-sample Test under Smoothing

Xin Xing, Zuofeng Shang, Pang Du et al.

We consider the problem of comparing probability densities between two groups. A new probabilistic tensor product smoothing spline framework is developed to model the joint density of two variables. Under such a framework, the probability density comparison is equivalent to testing the presence/absence of interactions. We propose a penalized likelihood ratio test for such interaction testing and show that the test statistic is asymptotically chi-square distributed under the null hypothesis. Furthermore, we derive a sharp minimax testing rate based on the Bernstein width for nonparametric two-sample tests and show that our proposed test statistics is minimax optimal. In addition, a data-adaptive tuning criterion is developed to choose the penalty parameter. Simulations and real applications demonstrate that the proposed test outperforms the conventional approaches under various scenarios.

LGJun 10, 2019
Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction

Wentao Ouyang, Xiuwu Zhang, Li Li et al.

Click-through rate (CTR) prediction is a critical task in online advertising systems. A large body of research considers each ad independently, but ignores its relationship to other ads that may impact the CTR. In this paper, we investigate various types of auxiliary ads for improving the CTR prediction of the target ad. In particular, we explore auxiliary ads from two viewpoints: one is from the spatial domain, where we consider the contextual ads shown above the target ad on the same page; the other is from the temporal domain, where we consider historically clicked and unclicked ads of the user. The intuitions are that ads shown together may influence each other, clicked ads reflect a user's preferences, and unclicked ads may indicate what a user dislikes to certain extent. In order to effectively utilize these auxiliary data, we propose the Deep Spatio-Temporal neural Networks (DSTNs) for CTR prediction. Our model is able to learn the interactions between each type of auxiliary data and the target ad, to emphasize more important hidden information, and to fuse heterogeneous data in a unified framework. Offline experiments on one public dataset and two industrial datasets show that DSTNs outperform several state-of-the-art methods for CTR prediction. We have deployed the best-performing DSTN in Shenma Search, which is the second largest search engine in China. The A/B test results show that the online CTR is also significantly improved compared to our last serving model.