ML | LG | Nov 5, 2024

Online Data Collection for Efficient Semiparametric Inference

arXiv:2411.03195v1 | h-index: 58 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficient data collection for semiparametric inference. The advance is incremental: it builds on existing statistical data fusion methods by introducing online decision-making.

The paper tackles the problem of sequentially collecting data from multiple sources under budget constraints to efficiently estimate a target parameter. It proves that the proposed online policies achieve zero regret in asymptotic mean squared error relative to an oracle policy, and shows empirically that they outperform fixed allocation methods on synthetic and real-world causal effect estimation tasks.

While many works have studied statistical data fusion, they typically assume that the various datasets are given in advance. However, in practice, estimation requires difficult data collection decisions like determining the available data sources, their costs, and how many samples to collect from each source. Moreover, this process is often sequential because the data collected at a given time can improve collection decisions in the future. In our setup, given access to multiple data sources and budget constraints, the agent must sequentially decide which data source to query to efficiently estimate a target parameter. We formalize this task using Online Moment Selection, a semiparametric framework that applies to any parameter identified by a set of moment conditions. Interestingly, the optimal budget allocation depends on the (unknown) true parameters. We present two online data collection policies, Explore-then-Commit and Explore-then-Greedy, that use the parameter estimates at a given time to optimally allocate the remaining budget in the future steps. We prove that both policies achieve zero regret (assessed by asymptotic MSE) relative to an oracle policy. We empirically validate our methods on both synthetic and real-world causal effect estimation tasks, demonstrating that the online data collection policies outperform their fixed counterparts.
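
The explore-then-commit idea in the abstract can be illustrated with a deliberately simple toy problem. The sketch below is not the paper's Online Moment Selection framework or its released code. It assumes two sources with unit cost per sample, a target parameter equal to the sum of the two source means, and sample means as estimators. Under those assumptions the asymptotic variance for a budget fraction k given to source 1 is sigma1^2/k + sigma2^2/(1-k), which is minimized at k = sigma1/(sigma1 + sigma2), so the best allocation depends on unknown parameters. All function names and the `explore_frac` parameter are hypothetical.

```python
import random
import statistics


def oracle_fraction(sigma1, sigma2):
    """Budget fraction for source 1 that minimizes sigma1^2/k + sigma2^2/(1-k)."""
    total = sigma1 + sigma2
    return 0.5 if total == 0 else sigma1 / total


def explore_then_commit(draw1, draw2, budget, explore_frac=0.2):
    """Toy explore-then-commit allocation (illustrative assumption, not the paper's policy).

    draw1 and draw2 each return one sample from their source at unit cost.
    Returns the estimate of mean1 + mean2 and the sample counts per source.
    """
    if budget < 4:
        raise ValueError("budget must allow at least two samples per source")

    # Explore: split the exploration budget evenly between the two sources.
    n_explore = max(2, int(explore_frac * budget) // 2)
    n_explore = min(n_explore, budget // 2)
    x1 = [draw1() for _ in range(n_explore)]
    x2 = [draw2() for _ in range(n_explore)]

    # Plug the estimated standard deviations into the oracle allocation rule.
    kappa_hat = oracle_fraction(statistics.stdev(x1), statistics.stdev(x2))

    # Commit: spend the remaining budget so the totals approach the estimated split.
    remaining = budget - 2 * n_explore
    n1_target = round(kappa_hat * budget)
    n1_extra = min(max(n1_target - n_explore, 0), remaining)
    n2_extra = remaining - n1_extra
    x1 += [draw1() for _ in range(n1_extra)]
    x2 += [draw2() for _ in range(n2_extra)]

    theta_hat = statistics.fmean(x1) + statistics.fmean(x2)
    return theta_hat, len(x1), len(x2)


# Usage: source 2 is noisier, so it should receive most of the budget.
rng = random.Random(0)
theta_hat, n1, n2 = explore_then_commit(
    lambda: rng.gauss(0.5, 1.0),  # source 1: mean 0.5, std 1
    lambda: rng.gauss(2.0, 3.0),  # source 2: mean 2.0, std 3
    budget=1000,
)
```

A fixed policy would split the budget evenly regardless of the noise levels. The sketch instead shifts samples toward the noisier source once exploration has produced rough variance estimates, which is the intuition behind the online policies described above.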

Code Implementations: 1 repo
