LG Mar 20
Revisit, Extend, and Enhance Hessian-Free Influence Functions
Ziao Yang, Han Yue, Jian Chen et al.
Influence functions serve as crucial tools for assessing sample influence in model interpretation, training subset selection, noisy label detection, and more. By employing a first-order Taylor expansion, influence functions can estimate sample influence without the need for expensive model retraining. However, applying influence functions directly to deep models presents challenges, primarily due to the non-convex nature of the loss function and the large number of model parameters. These properties make the inverse of the Hessian matrix costly to compute and, in some cases, nonexistent. Various approaches, including matrix decomposition, have been explored to speed up and approximate the inversion of the Hessian matrix, with the aim of making influence functions applicable to deep models. In this paper, we revisit a naive yet effective approximation method known as TracIn, which substitutes the inverse of the Hessian matrix with an identity matrix. We provide deeper insights into why this simple approximation performs well. Furthermore, we extend its applications beyond measuring model utility to include considerations of fairness and robustness. Finally, we enhance TracIn through an ensemble strategy. To validate its effectiveness, we conduct experiments on synthetic data and extensive evaluations on noisy label detection, sample selection for large language model fine-tuning, and defense against adversarial attacks.
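As a rough illustration of the identity-for-inverse-Hessian substitution described in the abstract, the sketch below scores a training sample by the inner product of its loss gradient with the summed validation gradient. It is not the authors' implementation: the logistic-regression model, the single set of parameters `theta`, the helper names, and the sign convention (a positive score means a gradient step on the training sample lowers validation loss to first order) are all assumptions made for this example.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_grad(theta, x, y):
    """Gradient of the logistic loss for one sample (x, y), with y in {0, 1}."""
    p = sigmoid(sum(w * xi for w, xi in zip(theta, x)))
    return [(p - y) * xi for xi in x]

def hessian_free_influence(theta, train_sample, val_samples):
    """Gradient inner product: the inverse Hessian is replaced by the identity."""
    g_train = logistic_grad(theta, *train_sample)
    g_val = [0.0] * len(theta)
    for x, y in val_samples:
        for j, gj in enumerate(logistic_grad(theta, x, y)):
            g_val[j] += gj
    return sum(a * b for a, b in zip(g_train, g_val))

# Toy data (hypothetical): one validation point and two training points
# that share its features, one with the same label and one with a flipped label.
theta = [0.0, 0.0]
val = [([1.0, 0.0], 1)]
clean = ([1.0, 0.0], 1)
noisy = ([1.0, 0.0], 0)

score_clean = hessian_free_influence(theta, clean, val)
score_noisy = hessian_free_influence(theta, noisy, val)
print(score_clean, score_noisy)
```

Under this convention the sample whose label agrees with the validation point receives a positive score and the flipped-label sample a negative one, which is the kind of signal noisy label detection relies on.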
LG Oct 14, 2025
Layer-Aware Influence for Online Data Valuation Estimation
Ziao Yang, Longbo Huang, Hongfu Liu
Data-centric learning emphasizes curating high-quality training samples to boost performance rather than designing new architectures. A central problem is to estimate the influence of each training sample efficiently. Prior studies largely focus on static influence measured on a converged model, overlooking how sample influence changes during optimization, especially in deep models. To address the computational burden of frequent influence estimation, we develop a layer-aware online estimator that requires only loss-to-output gradients. This design avoids parameter-level and full-network gradients while preserving ranking fidelity. Extensive experiments across LLM pretraining, fine-tuning, and image classification show that our method improves accuracy with substantially lower time and memory cost, making dynamic data curation efficient and scalable in practice.
CV Dec 2, 2021
PTCT: Patches with 3D-Temporal Convolutional Transformer Network for Precipitation Nowcasting
Ziao Yang, Xiangrui Yang, Qifeng Lin
Precipitation nowcasting aims to predict future rainfall intensity over a short period of time, and mainly relies on the prediction of radar echo sequences. Though convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are widely used to generate radar echo frames, they suffer from inductive bias (i.e., translation invariance and locality) and seriality, respectively. Recently, Transformer-based methods have also gained much attention due to the great potential of the Transformer structure, but they ignore short-term dependencies and autoregressive characteristics. In this paper, we propose a variant of the Transformer named patches with 3D-temporal convolutional Transformer network (PTCT), where original frames are split into multiple patches to remove the constraint of inductive bias and 3D-temporal convolution is employed to capture short-term dependencies efficiently. After training, the inference of PTCT is performed in an autoregressive way to ensure the quality of generated radar echo frames. To validate our algorithm, we conduct experiments on two radar echo datasets: Radar Echo Guangzhou and HKO-7. The experimental results show that PTCT achieves state-of-the-art (SOTA) performance compared with existing methods.
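The patch-splitting step mentioned in the abstract can be pictured with a minimal sketch that cuts a single 2D frame into non-overlapping patches and flattens each one into a token. The abstract does not give the patch size, overlap, or flattening order used by PTCT, so the function name `split_into_patches`, the 2x2 patch size, and the row-major layout here are assumptions for illustration only.

```python
def split_into_patches(frame, patch_h, patch_w):
    """Split a 2D frame (list of rows) into flattened, non-overlapping patches."""
    height, width = len(frame), len(frame[0])
    if height % patch_h or width % patch_w:
        raise ValueError("frame size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            # Flatten the patch row by row into a single token vector.
            patch = [
                value
                for row in frame[top:top + patch_h]
                for value in row[left:left + patch_w]
            ]
            patches.append(patch)
    return patches

# Toy 4x4 "radar echo" frame with values 0..15 (hypothetical data).
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(frame, 2, 2)
print(patches)
```

Each flattened patch would then serve as one input token, so a sequence of frames becomes a sequence of patch tokens rather than whole images.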