LG · AI · Oct 14, 2025

Layer-Aware Influence for Online Data Valuation Estimation

arXiv:2510.16007v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the computational burden of dynamic data valuation for practitioners in machine learning, enabling more efficient and scalable data curation.

The paper addresses the problem of efficiently estimating how the influence of training samples changes during optimization in data-centric learning. It proposes a layer-aware online estimator that improves accuracy at substantially lower time and memory cost across various tasks.

Data-centric learning emphasizes curating high-quality training samples to boost performance rather than designing new architectures. A central problem is to estimate the influence of each training sample efficiently. Prior studies largely focus on static influence measured on a converged model, overlooking how a sample's value changes during optimization, especially in deep models. To address the computational burden of frequent influence estimation, we develop a layer-aware online estimator that requires only loss-to-output gradients. This design avoids parameter-level and full-network gradients while preserving ranking fidelity. Extensive experiments across LLM pretraining, fine-tuning, and image classification show that our method improves accuracy with substantially lower time and memory cost, making dynamic data curation efficient and scalable in practice.
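To make the phrase "loss-to-output gradients" concrete, here is a minimal sketch of scoring training samples from output-level gradients alone. It is not the paper's estimator: the abstract gives no formula, so the scoring rule (dot product with the mean validation gradient), the softmax cross-entropy loss, and the names `output_gradient` and `output_level_scores` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_gradient(logits, label):
    # Gradient of softmax cross-entropy with respect to the logits:
    # softmax(z) - onehot(y). No parameter-level gradient is needed.
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

def output_level_scores(train_batch, val_batch):
    # Hypothetical proxy score (an assumption, not the paper's rule):
    # alignment between each training sample's output gradient and the
    # mean output gradient over a validation batch.
    val_grads = [output_gradient(z, y) for z, y in val_batch]
    n_classes = len(val_grads[0])
    mean_val = [sum(g[k] for g in val_grads) / len(val_grads)
                for k in range(n_classes)]
    scores = []
    for z, y in train_batch:
        g = output_gradient(z, y)
        scores.append(sum(a * b for a, b in zip(g, mean_val)))
    return scores

# Each entry is (logits, label). The first training sample shares the
# validation sample's error; the second is already classified correctly.
train = [([2.0, 0.0], 1), ([2.0, 0.0], 0)]
val = [([2.0, 0.0], 1)]
scores = output_level_scores(train, val)
```

Because each gradient has only as many entries as there are output classes, the per-sample cost here does not depend on the number of model parameters, which is the kind of saving the abstract attributes to avoiding parameter-level and full-network gradients.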
