ME · LG · Jun 11, 2020

The Limits to Learning a Diffusion Model

Jackie Baek, Vivek F. Farias, Andreea Georgescu, Retsef Levi, Tianyi Peng, Deeksha Sinha, Joshua Wilde, Andrew Zheng

arXiv:2006.06373v34.33 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of forecasting in domains such as consumer adoption and epidemics, highlighting the limits of model learning and the need for additional data sources. It is rated incremental because it formalizes known practical difficulties with theoretical bounds.

This paper establishes the first sample complexity lower bounds for estimating simple diffusion models such as Bass and SIR. It shows that accurate prediction of final outcomes (e.g., total customer adoptions or infections) is impossible until late in the diffusion process: not before roughly two-thirds of the way to the time at which the rate of new adoptions or infections peaks.

This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models, including the Bass model (used in modeling consumer adoption) and the SIR model (used in modeling epidemics). We show that one cannot hope to learn such models until quite late in the diffusion. Specifically, we show that the time required to collect a number of observations that exceeds our sample complexity lower bounds is large. For Bass models with low innovation rates, our results imply that one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate has peaked. This lower bound in estimation further translates into a lower bound in regret for decision-making in epidemic interventions. Our results formalize the challenge of accurate forecasting and highlight the importance of incorporating additional data sources. To this end, we analyze the benefit of a seroprevalence study in an epidemic, where we characterize the size of the study needed to improve SIR model estimation. Extensive empirical analyses on product adoption and epidemic data support our theoretical findings.
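To give a feel for the timescale in the Bass result, here is a minimal sketch using the standard deterministic Bass curve, F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), whose adoption rate peaks at t* = ln(q/p) / (p+q) when q > p. This is not the paper's stochastic model or its lower bound. It only computes what fraction of eventual adopters has been observed two-thirds of the way to the peak. The function names and the parameter values (p = 0.01, q = 0.4) are illustrative assumptions, not taken from the paper.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Fraction of eventual adopters who have adopted by time t
    under the deterministic Bass curve (p: innovation, q: imitation)."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_peak_time(p, q):
    """Time at which the rate of new adoptions peaks (requires q > p > 0)."""
    if not (q > p > 0):
        raise ValueError("a positive peak time requires q > p > 0")
    return math.log(q / p) / (p + q)


# Hypothetical parameters with a low innovation rate (assumed, not from the paper).
p, q = 0.01, 0.4

t_peak = bass_peak_time(p, q)
t_two_thirds = 2.0 * t_peak / 3.0

frac_at_two_thirds = bass_cumulative_fraction(t_two_thirds, p, q)
frac_at_peak = bass_cumulative_fraction(t_peak, p, q)

print(f"peak time: {t_peak:.2f}")
print(f"adopted by 2/3 of peak time: {frac_at_two_thirds:.3f}")
print(f"adopted by peak time: {frac_at_peak:.3f}")
```

With these assumed values, roughly a fifth of eventual adopters have adopted by two-thirds of the peak time, compared with just under half at the peak itself. In this deterministic picture, a substantial share of the diffusion has already happened by the earliest point at which the abstract says the final total could be predicted.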
