LG · ML · May 6

Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

arXiv:2605.05511 · 46.7 · h-index: 15
AI Analysis

This work addresses the problem of costly feature acquisition in prediction tasks, offering a more effective approach for sequential decision-making under uncertainty.

The paper introduces NM-PPG, a new method for active feature acquisition that uses pathwise policy gradients to optimize a non-myopic acquisition policy, achieving superior performance over state-of-the-art baselines on synthetic and real-world datasets.

Active feature acquisition (AFA) considers prediction problems in which features are costly to obtain and the learner adaptively decides which feature values to acquire for each instance and when to stop and predict. AFA can be formulated as a partially observable Markov decision process (POMDP), which naturally admits a sequential decision-making perspective. In this paper, we present non-myopic pathwise policy gradients (NM-PPG), a new AFA method built around this formulation. We introduce a continuous relaxation of the acquisition process that enables pathwise gradients through the full acquisition trajectory, avoiding the high variance of standard score-function policy gradients while allowing end-to-end optimization of a non-myopic acquisition policy. To better align training with deployment, we further develop a straight-through rollout scheme that follows hard feature acquisitions in the forward pass while backpropagating through the corresponding soft relaxation in the backward pass. We stabilize optimization with entropy regularization and staged temperature sharpening. Experiments on both synthetic and real-world datasets demonstrate that NM-PPG yields superior performance relative to state-of-the-art AFA baselines.
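
To make the straight-through rollout idea concrete, here is a minimal sketch of a single acquisition step. It is not the authors' implementation: the abstract does not specify the exact relaxation, so a temperature-scaled softmax over not-yet-acquired features is assumed here as a stand-in, and the function names (`relaxed_selection`, `straight_through_step`, `backward_through_soft`) are hypothetical. The forward pass commits to a hard one-hot acquisition, while the backward pass uses the gradient of the soft relaxation.

```python
import math

def relaxed_selection(logits, acquired, temperature):
    """Soft selection weights over features (assumed softmax relaxation).

    Already-acquired features are masked out with -inf so they receive zero
    weight. At least one feature must still be unacquired.
    """
    masked = [(-math.inf if a else l) for l, a in zip(logits, acquired)]
    m = max(masked)
    exps = [math.exp((l - m) / temperature) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

def straight_through_step(logits, acquired, temperature):
    """Forward pass: return the hard one-hot choice and its soft relaxation."""
    soft = relaxed_selection(logits, acquired, temperature)
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft

def backward_through_soft(soft, grad_selection, temperature):
    """Backward pass: treat the hard choice as if it were the soft weights.

    Applies the softmax Jacobian, d soft_i / d logit_j =
    soft_i * (delta_ij - soft_j) / temperature, to the incoming gradient.
    """
    dot = sum(g * s for g, s in zip(grad_selection, soft))
    return [s * (g - dot) / temperature for s, g in zip(soft, grad_selection)]

# Three features; the third has already been acquired.
logits = [0.2, 1.5, -0.3]
acquired = [False, False, True]
hard, soft = straight_through_step(logits, acquired, temperature=0.5)

# Hypothetical downstream gradient: the loss would drop if feature 0 were chosen.
grad_logits = backward_through_soft(soft, [1.0, 0.0, 0.0], temperature=0.5)
```

In an autodiff framework the same effect is usually obtained by writing the selection as `hard + soft - stop_gradient(soft)`. Lowering `temperature` over training pulls `soft` toward `hard`, which is one plausible reading of the staged temperature sharpening the abstract mentions.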

