Active Timepoint Selection for Learning Measure-Valued Trajectories

arXiv:2605.3062570.5h-index: 8

Predicted impact top 24% in LG · last 90 daysOriginality Incremental advance

AI Analysis

This work is significant for researchers in single-cell biology and other fields dealing with measure-valued trajectories, as it offers a method to reduce data acquisition costs while improving the fidelity of inferred paths. It's an incremental step in active learning, extending it to a new problem space.

The paper addresses the challenge of inferring continuous probability paths from sparse snapshots, particularly in single-cell biology where data acquisition is costly. It proposes an active learning framework that uses Linearized Optimal Transport to map distributional snapshots into a tangent space, enabling Gaussian Process modeling to construct a probabilistic surrogate for the underlying path. This framework allows for iterative selection of measurement times to minimize uncertainty, outperforming uncertainty-agnostic baselines on synthetic and real-world datasets.

Inferring continuous probability paths from sparse snapshots is a fundamental challenge in domains like single-cell biology, where high-fidelity data acquisition is often destructive and constrained by prohibitive sequencing costs. This motivates the need for active learning strategies to strategically select optimal measurement times. However, designing active learning policies for this setting remains an open problem: the target objects reside on the infinite dimensional Wasserstein space where standard Euclidean metrics are ill-defined, and current interpolation methods lack epistemic uncertainty quantification. We introduce a framework which extends active experimentation to the space of measures. By leveraging Linearized Optimal Transport (LOT), we map distributional snapshots into a tangent space amenable to Gaussian Process modeling, allowing us to construct a tractable probabilistic surrogate for the underlying probability path. This yields an acquisition policy that iteratively selects measurement times to minimize uncertainty. Empirical results demonstrate that our strategy outperforms uncertainty-agnostic baselines on both synthetic and real-world datasets.

View on arXiv PDF

Similar