Yaxin Li

CV
h-index16
22papers
1,248citations
Novelty41%
AI Score49

22 Papers

IRMay 29
UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

Hanyu Li, Yi-Ping Hsu, Aditya Mantha et al. · stanford

Modern recommendation systems predominantly train retrieval and ranking as separate models despite both increasingly relying on large transformers encoding the same user behavior data, duplicating parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations that branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) Blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) Cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs versus serving two independent models. Deployed in the Pinterest core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and lifting QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training and serving, deployed in a production recommendation system.

AIOct 10, 2023Code
Exploring Memorization in Fine-tuned Language Models

Shenglai Zeng, Yaxin Li, Jie Ren et al.

Large language models (LLMs) have shown great capabilities in various tasks but also exhibited memorization of training data, raising tremendous privacy and copyright concerns. While prior works have studied memorization during pre-training, the exploration of memorization during fine-tuning is rather limited. Compared to pre-training, fine-tuning typically involves more sensitive data and diverse objectives, thus may bring distinct privacy risks and unique memorization behaviors. In this work, we conduct the first comprehensive analysis to explore language models' (LMs) memorization during fine-tuning across tasks. Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks. We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.

CVJun 22, 2023Code
3D Reconstruction of Spherical Images based on Incremental Structure from Motion

San Jiang, Kan You, Yaxin Li et al.

3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to the frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively exploited due to the capability of recording surrounding environments from one camera exposure. Classical 3D reconstruction pipelines, however, cannot be used for spherical images. Besides, there exist few software packages for 3D reconstruction of spherical images. Based on the imaging geometry of spherical cameras, this study investigates the algorithms for the relative orientation using spherical correspondences, absolute orientation using 3D correspondences between scene and spherical points, and the cost functions for BA (bundle adjustment) optimization. In addition, an incremental SfM (Structure from Motion) workflow has been proposed for spherical images using the above-mentioned algorithms. The proposed solution is finally verified by using three spherical datasets captured by both consumer-grade and professional spherical cameras. The results demonstrate that the proposed SfM workflow can achieve the successful 3D reconstruction of complex scenes and provide useful clues for the implementation in open-source software packages. The source code of the designed SfM workflow would be made publicly available.

NEApr 12, 2023
Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation

Qi Xu, Yaxin Li, Jiangrong Shen et al.

Spiking neural networks (SNNs) are well known as the brain-inspired models with high computing efficiency, due to a key component that they utilize spikes as information units, close to the biological neural systems. Although spiking based models are energy efficient by taking advantage of discrete spike signals, their performance is limited by current network structures and their training methods. As discrete signals, typical SNNs cannot apply the gradient descent rules directly into parameters adjustment as artificial neural networks (ANNs). Aiming at this limitation, here we propose a novel method of constructing deep SNN models with knowledge distillation (KD) that uses ANN as teacher model and SNN as student model. Through ANN-SNN joint training algorithm, the student SNN model can learn rich feature information from the teacher ANN model through the KD method, yet it avoids training SNN from scratch when communicating with non-differentiable spikes. Our method can not only build a more efficient deep spiking structure feasibly and reasonably, but use few time steps to train whole model compared to direct training or ANN to SNN methods. More importantly, it has a superb ability of noise immunity for various types of artificial noises and natural signals. The proposed novel method provides efficient ways to improve the performance of SNN through constructing deeper structures in a high-throughput fashion, with potential usage for light and efficient brain-inspired computing of practical scenarios.

NEApr 19, 2023
Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks

Qi Xu, Yaxin Li, Xuanye Fang et al.

Spiking neural networks (SNNs) have superb characteristics in sensory information recognition tasks due to their biological plausibility. However, the performance of some current spiking-based models is limited by their structures which means either fully connected or too-deep structures bring too much redundancy. This redundancy from both connection and neurons is one of the key factors hindering the practical application of SNNs. Although Some pruning methods were proposed to tackle this problem, they normally ignored the fact the neural topology in the human brain could be adjusted dynamically. Inspired by this, this paper proposed an evolutionary-based structure construction method for constructing more reasonable SNNs. By integrating the knowledge distillation and connection pruning method, the synaptic connections in SNNs can be optimized dynamically to reach an optimal state. As a result, the structure of SNNs could not only absorb knowledge from the teacher model but also search for deep but sparse network topology. Experimental results on CIFAR100 and DVS-Gesture show that the proposed structure learning method can get pretty well performance while reducing the connection redundancy. The proposed method explores a novel dynamical way for structure learning from scratch in SNNs which could build a bridge to close the gap between deep learning and bio-inspired neural dynamics.

CVMay 2, 2022
Enhancing Adversarial Training with Feature Separability

Yaxin Li, Xiaorui Liu, Han Xu et al.

Deep Neural Network (DNN) are vulnerable to adversarial attacks. As a countermeasure, adversarial training aims to achieve robustness based on the min-max optimization problem and it has shown to be one of the most effective defense strategies. However, in this work, we found that compared with natural training, adversarial training fails to learn better feature representations for either clean or adversarial samples, which can be one reason why adversarial training tends to have severe overfitting issues and less satisfied generalize performance. Specifically, we observe two major shortcomings of the features learned by existing adversarial training methods:(1) low intra-class feature similarity; and (2) conservative inter-classes feature variance. To overcome these shortcomings, we introduce a new concept of adversarial training graph (ATG) with which the proposed adversarial training with feature separability (ATFS) enables to coherently boost the intra-class feature similarity and increase inter-class feature variance. Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.

CVFeb 9, 2023
3D reconstruction from spherical images: A review of techniques, applications, and prospects

San Jiang, Yaxin Li, Duojie Weng et al.

3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively used due to the capability of recording surrounding environments from one camera exposure. In contrast to perspective images with limited FOV (Field of View), spherical images can cover the whole scene with full horizontal and vertical FOV and facilitate camera tracking and data acquisition in these complex scenes. With the rapid evolution and extensive use of professional and consumer-grade spherical cameras, spherical images show great potential for the 3D modeling of urban and indoor scenes. Classical 3D reconstruction pipelines, however, cannot be directly used for spherical images. Besides, there exist few software packages that are designed for the 3D reconstruction of spherical images. As a result, this research provides a thorough survey of the state-of-the-art for 3D reconstruction of spherical images in terms of data acquisition, feature detection and matching, image orientation, and dense matching as well as presenting promising applications and discussing potential prospects. We anticipate that this study offers insightful clues to direct future research.

CVMar 17, 2024Code
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

Recent advancements in text-to-image diffusion models have demonstrated their remarkable capability to generate high-quality images from textual prompts. However, increasing research indicates that these models memorize and replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks. In our study, we provide a novel perspective to understand this memorization phenomenon by examining its relationship with cross-attention mechanisms. We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens. The diffusion model is overfitted to these token embeddings, memorizing corresponding training images. To elucidate this phenomenon, we further identify and discuss various intrinsic findings of cross-attention that contribute to memorization. Building on these insights, we introduce an innovative approach to detect and mitigate memorization in diffusion models. The advantage of our proposed method is that it will not compromise the speed of either the training or the inference processes in these models while preserving the quality of generated images. Our code is available at https://github.com/renjie3/MemAttn .

LGDec 17, 2025
Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Huiyan Xue, Xuming Ran, Yaxin Li et al.

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.

IVAug 13, 2024
Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction

Yu Guo, Caiying Wu, Yaxin Li et al.

Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algorithm. Furthermore, we leverage deep learning to initialize the conjugate gradient method, resulting in a deep unrolling network with theoretical guarantees. Our extensive numerical experiments demonstrate that our proposed algorithm surpasses existing methods, particularly excelling in fewer scanned views and complex noise conditions.

LGJul 5, 2021Code
Elastic Graph Neural Networks

Xiaorui Liu, Wei Jin, Yao Ma et al.

While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$ and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme into GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly robust to graph adversarial attacks. The implementation of Elastic GNNs is available at \url{https://github.com/lxiaorui/ElasticGNN}.

LGMay 13, 2020Code
DeepRobust: A PyTorch Library for Adversarial Attacks and Defenses

Yaxin Li, Wei Jin, Han Xu et al.

DeepRobust is a PyTorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. It currently contains more than 10 attack algorithms and 8 defense algorithms in image domain and 9 attack algorithms and 4 defense algorithms in graph domain, under a variety of deep learning architectures. In this manual, we introduce the main contents of DeepRobust with detailed instructions. The library is kept updated and can be found at https://github.com/DSE-MSU/DeepRobust.

LGMar 2, 2020Code
Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies

Wei Jin, Yaxin Li, Han Xu et al.

Deep neural networks (DNNs) have achieved significant performance in various tasks. However, recent studies have shown that DNNs can be easily fooled by small perturbation on the input, called adversarial attacks. As the extensions of DNNs to graphs, Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability. Adversary can mislead GNNs to give wrong predictions by modifying the graph structure such as manipulating a few edges. This vulnerability has arisen tremendous concerns for adapting GNNs in safety-critical applications and has attracted increasing research attention in recent years. Thus, it is necessary and timely to provide a comprehensive overview of existing graph adversarial attacks and the countermeasures. In this survey, we categorize existing attacks and defenses, and review the corresponding state-of-the-art methods. Furthermore, we have developed a repository with representative algorithms (https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph). The repository enables us to conduct empirical studies to deepen our understandings on attacks and defenses on graphs.

CLApr 3, 2025
Legal Mathematical Reasoning with LLMs: Procedural Alignment through Two-Stage Reinforcement Learning

Kepu Zhang, Guofu Xie, Weijie Yu et al.

Legal mathematical reasoning is essential for applying large language models (LLMs) in high-stakes legal contexts, where outputs must be both mathematically accurate and procedurally compliant. However, existing legal LLMs lack structured numerical reasoning, and open-domain models, though capable of calculations, often overlook mandatory legal steps. To address this, we present LexNum, the first Chinese legal mathematical reasoning benchmark, covering three representative scenarios where each instance reflects legally grounded procedural flows. We further propose LexPam, a two-stage reinforcement learning framework for efficient legal reasoning training. Leveraging curriculum learning, we use a stronger teacher model to partition data into basic and challenging subsets. A lightweight 1.5B student model is then fine-tuned with Group Relative Policy Optimization, which avoids costly value networks and enables stable training from sparse, end-of-sequence rewards. The first stage improves accuracy and format; the second introduces a novel reward to guide procedural alignment via task-specific legal elements. Experiments show that existing models perform poorly on LexNum, while LexPam enhances both mathematical accuracy and legal coherence, and generalizes effectively across tasks and domains.

NEJun 3, 2024
Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning

Yaxin Li, Qi Xu, Jiangrong Shen et al.

The emergence of deep and large-scale spiking neural networks (SNNs) exhibiting high performance across diverse complex datasets has led to a need for compressing network models due to the presence of a significant number of redundant structural units, aiming to more effectively leverage their low-power consumption and biological interpretability advantages. Currently, most model compression techniques for SNNs are based on unstructured pruning of individual connections, which requires specific hardware support. Hence, we propose a structured pruning approach based on the activity levels of convolutional kernels named Spiking Channel Activity-based (SCA) network pruning framework. Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task. While maintaining model performance, this approach refines the network architecture, ultimately reducing computational load and accelerating the inference process. This indicates that structured dynamic sparse learning methods can better facilitate the application of deep SNNs in low-power and high-efficiency scenarios.

CVAug 2, 2021
Wood-leaf classification of tree point cloud based on intensity and geometrical information

Jingqian Sun, Pei Wang, Zhiyong Gao et al.

Terrestrial laser scanning (TLS) can obtain tree point cloud with high precision and high density. Efficient classification of wood points and leaf points is essential to study tree structural parameters and ecological characteristics. By using both the intensity and spatial information, a three-step classification and verification method was proposed to achieve automated wood-leaf classification. Tree point cloud was classified into wood points and leaf points by using intensity threshold, neighborhood density and voxelization successively. Experiment was carried in Haidian Park, Beijing, and 24 trees were scanned by using the RIEGL VZ-400 scanner. The tree point clouds were processed by using the proposed method, whose classification results were compared with the manual classification results which were used as standard results. To evaluate the classification accuracy, three indicators were used in the experiment, which are Overall Accuracy (OA), Kappa coefficient (Kappa) and Matthews correlation coefficient (MCC). The ranges of OA, Kappa and MCC of the proposed method are from 0.9167 to 0.9872, from 0.7276 to 0.9191, and from 0.7544 to 0.9211 respectively. The average values of OA, Kappa and MCC are 0.9550, 0.8547 and 0.8627 respectively. Time cost of wood-leaf classification was also recorded to evaluate the algorithm efficiency. The average processing time are 1.4 seconds per million points. The results showed that the proposed method performed well automatically and quickly on wood-leaf classification based on the experimental dataset.

LGJul 28, 2021
Imbalanced Adversarial Training with Reweighting

Wentao Wang, Han Xu, Xiaorui Liu et al.

Adversarial training has been empirically proven to be one of the most effective and reliable defense methods against adversarial attacks. However, almost all existing studies about adversarial training are focused on balanced datasets, where each class has an equal amount of training examples. Research on adversarial training with imbalanced training datasets is rather limited. As the initial effort to investigate this problem, we reveal the facts that adversarially trained models present two distinguished behaviors from naturally trained models in imbalanced datasets: (1) Compared to natural training, adversarially trained models can suffer much worse performance on under-represented classes, when the training dataset is extremely imbalanced. (2) Traditional reweighting strategies may lose efficacy to deal with the imbalance issue for adversarial training. For example, upweighting the under-represented classes will drastically hurt the model's performance on well-represented classes, and as a result, finding an optimal reweighting value can be tremendously challenging. In this paper, to further understand our observations, we theoretically show that the poor data separability is one key reason causing this strong tension between under-represented and well-represented classes. Motivated by this finding, we propose Separable Reweighted Adversarial Training (SRAT) to facilitate adversarial training under imbalanced scenarios, by learning more separable features for different classes. Extensive experiments on various datasets verify the effectiveness of the proposed framework.

AIJul 12, 2021
Trustworthy AI: A Computational Perspective

Haochen Liu, Yiqi Wang, Wenqi Fan et al.

In the past few decades, artificial intelligence (AI) technology has experienced swift developments, changing everyone's daily life and profoundly altering the course of human society. The intention of developing AI is to benefit humans, by reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, such as making unreliable decisions in safety-critical scenarios or undermining fairness by inadvertently discriminating against one group. Thus, trustworthy AI has attracted immense attention recently, which requires careful consideration to avoid the adverse effects that AI may bring to humans, so that humans can fully trust and live in harmony with AI technologies. Recent years have witnessed a tremendous amount of research on trustworthy AI. In this survey, we present a comprehensive survey of trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving trustworthy AI. Trustworthy AI is a large and complex area, involving various dimensions. In this work, we focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the accordant and conflicting interactions among different dimensions and discuss potential aspects for trustworthy AI to investigate in the future.

CVDec 6, 2020
Automatic sampling and training method for wood-leaf classification based on tree terrestrial point cloud

Zichu Liu, Qing Zhang, Pei Wang et al.

Terrestrial laser scanning technology provides an efficient and accuracy solution for acquiring three-dimensional information of plants. The leaf-wood classification of plant point cloud data is a fundamental step for some forestry and biological research. An automatic sampling and training method for classification was proposed based on tree point cloud data. The plane fitting method was used for selecting leaf sample points and wood sample points automatically, then two local features were calculated for training and classification by using support vector machine (SVM) algorithm. The point cloud data of ten trees were tested by using the proposed method and a manual selection method. The average correct classification rate and kappa coefficient are 0.9305 and 0.7904, respectively. The results show that the proposed method had better efficiency and accuracy comparing to the manual selection method.

LGOct 13, 2020
To be Robust or to be Fair: Towards Fairness in Adversarial Training

Han Xu, Xiaorui Liu, Yaxin Li et al.

Adversarial training algorithms have been proved to be reliable to improve machine learning models' robustness against adversarial examples. However, we find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data. For instance, a PGD adversarially trained ResNet18 model on CIFAR-10 has 93% clean accuracy and 67% PGD l-infty-8 robust accuracy on the class "automobile" but only 65% and 17% on the class "cat". This phenomenon happens in balanced datasets and does not exist in naturally trained models when only using clean samples. In this work, we empirically and theoretically show that this phenomenon can happen under general adversarial training algorithms which minimize DNN models' robust errors. Motivated by these findings, we propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses. Experimental results validate the effectiveness of FRL.

LGSep 2, 2020
Yet Meta Learning Can Adapt Fast, It Can Also Break Easily

Han Xu, Yaxin Li, Xiaorui Liu et al.

Meta learning algorithms have been widely applied in many tasks for efficient learning, such as few-shot image classification and fast reinforcement learning. During meta training, the meta learner develops a common learning strategy, or experience, from a variety of learning tasks. Therefore, during meta test, the meta learner can use the learned strategy to quickly adapt to new tasks even with a few training samples. However, there is still a dark side about meta learning in terms of reliability and robustness. In particular, is meta learning vulnerable to adversarial attacks? In other words, would a well-trained meta learner utilize its learned experience to build wrong or likely useless knowledge, if an adversary unnoticeably manipulates the given training set? Without the understanding of this problem, it is extremely risky to apply meta learning in safety-critical applications. Thus, in this paper, we perform the initial study about adversarial attacks on meta learning under the few-shot classification problem. In particular, we formally define key elements of adversarial attacks unique to meta learning and propose the first attacking algorithm against meta learning under various settings. We evaluate the effectiveness of the proposed attacking strategy as well as the robustness of several representative meta learning algorithms. Experimental results demonstrate that the proposed attacking strategy can easily break the meta learner and meta learning is vulnerable to adversarial attacks. The implementation of the proposed framework will be released upon the acceptance of this paper.

CVJan 30, 2020
Automatic marker-free registration of tree point-cloud data based on rotating projection

Xiuxian Xu, Pei Wang, Xiaozheng Gan et al.

Point-cloud data acquired using a terrestrial laser scanner (TLS) play an important role in digital forestry research. Multiple scans are generally used to overcome occlusion effects and obtain complete tree structural information. However, it is time-consuming and difficult to place artificial reflectors in a forest with complex terrain for marker-based registration, a process that reduces registration automation and efficiency. In this study, we propose an automatic coarse-to-fine method for the registration of point-cloud data from multiple scans of a single tree. In coarse registration, point clouds produced by each scan are projected onto a spherical surface to generate a series of two-dimensional (2D) images, which are used to estimate the initial positions of multiple scans. Corresponding feature-point pairs are then extracted from these series of 2D images. In fine registration, point-cloud data slicing and fitting methods are used to extract corresponding central stem and branch centers for use as tie points to calculate fine transformation parameters. To evaluate the accuracy of registration results, we propose a model of error evaluation via calculating the distances between center points from corresponding branches in adjacent scans. For accurate evaluation, we conducted experiments on two simulated trees and a real-world tree. Average registration errors of the proposed method were 0.26m around on simulated tree point clouds, and 0.05m around on real-world tree point cloud.