Duncan J. Watts

h-index 70

5 papers

205 citations

Novelty 51%

AI Score 44

Ranked #50,246 of 194,257 authors (top 26%)

#46 in SI (top 14%)

5 Papers

6.1 SI May 29

The Effect of Mobility Trajectory Sparsity on Epidemic Modeling Outcomes

Federico Delussu, Francisco Barreras, Yuan Liao et al.

GPS mobility data are increasingly used in epidemic modeling, allowing the construction of co-location networks or population flows. These trajectories typically exhibit high temporal sparsity because data collection is opportunistic and tied to phone use. Despite growing awareness of this limitation, the analysis and treatment of biases derived from it have been largely overlooked in existing epidemic modeling studies, raising concerns about the robustness of downstream inferences. We introduce a principled framework to quantify the impact of trajectory sparsity on key epidemic modeling outcomes across different levels of missingness. Our approach leverages a highly-complete dataset that exhibits both near-complete and sparse GPS trajectories. Near-complete trajectories provide baseline epidemic outcomes, while sparse trajectories provide realistic missingness patterns that we impose on the baseline to measure bias. In this way, we show how missing records can result in substantial underestimation of key measures of epidemic intensity, explained not only by the amount of missing data, but by more complex features of data missingness that should be taken into account when designing correction methods. Finally, we propose and evaluate a correction based on inverse probability weighting of network edges before epidemic model calibration, which is shown to reduce bias and parameter misspecification. We also demonstrate this correction on a separate anonymized sample from a commercial GPS mobility dataset and report on its effect. Together, our findings provide a first rigorous quantification of trajectory-sparsity bias in epidemic modeling, offering initial guidance on the treatment of this issue.
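The abstract above mentions a correction based on inverse probability weighting of network edges. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes, hypothetically, that each user has a known coverage value (the fraction of time slots with a GPS record), that two users' records are missing independently, and that a co-location is observed only when both users are recorded. The names `ipw_edge_weights`, `coverage`, and `min_prob` are illustrative and do not come from the paper.

```python
def ipw_edge_weights(edges, coverage, min_prob=0.05):
    """Reweight observed co-location counts by inverse observation probability.

    edges:    dict mapping (u, v) -> observed co-location count
    coverage: dict mapping user -> fraction of time slots with a GPS record
    min_prob: floor on the joint observation probability, so that very
              sparse pairs do not receive extreme weights (an assumption)
    """
    weighted = {}
    for (u, v), count in edges.items():
        # Assumed: both users must be observed, and missingness is independent.
        joint_prob = max(coverage[u] * coverage[v], min_prob)
        weighted[(u, v)] = count / joint_prob
    return weighted


if __name__ == "__main__":
    edges = {("a", "b"): 2, ("c", "d"): 1}
    coverage = {"a": 0.5, "b": 0.5, "c": 0.1, "d": 0.1}
    for edge, weight in ipw_edge_weights(edges, coverage).items():
        print(edge, weight)
```

The floor on the joint probability trades some bias for lower variance. The abstract notes that bias depends on more complex features of missingness than its amount alone, so an independence assumption like the one used here is likely too simple in practice.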

8.8 LG Nov 30, 2023

Pre-registration for Predictive Modeling

Jake M. Hofman, Angelos Chatzimparmpas, Amit Sharma et al.

Amid rising concerns of reproducibility and generalizability in predictive modeling, we explore the possibility and potential benefits of introducing pre-registration to the field. Despite notable advancements in predictive modeling, spanning core machine learning tasks to various scientific applications, challenges such as overlooked contextual factors, data-dependent decision-making, and unintentional re-use of test data have raised questions about the integrity of results. To address these issues, we propose adapting pre-registration practices from explanatory modeling to predictive modeling. We discuss current best practices in predictive modeling and their limitations, introduce a lightweight pre-registration template, and present a qualitative study with machine learning researchers to gain insight into the effectiveness of pre-registration in preventing biased estimates and promoting more reliable research outcomes. We conclude by exploring the scope of problems that pre-registration can address in predictive modeling and acknowledging its limitations within this context.

9.6 AI May 15, 2025

Empirically evaluating commonsense intelligence in large language models with large-scale human judgments

Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting

Commonsense intelligence in machines is often assessed by static benchmarks that compare a model's output against human-prescribed correct labels. An important, albeit implicit, assumption of these labels is that they accurately capture what any human would think, effectively treating human common sense as homogeneous. However, recent empirical work has shown that humans vary enormously in what they consider commonsensical; thus what appears self-evident to one benchmark designer may not be so to another. Here, we propose a method for evaluating common sense in artificial intelligence (AI), specifically in large language models (LLMs), that incorporates empirically observed heterogeneity among humans by measuring the correspondence between a model's judgment and that of a human population. We first find that, when treated as independent survey respondents, most LLMs remain below the human median in their individual commonsense competence. Second, when used as simulators of a hypothetical population, LLMs correlate with real humans only modestly in the extent to which they agree on the same set of statements. In both cases, smaller, open-weight models are surprisingly more competitive than larger, proprietary frontier models. Our evaluation framework, which ties commonsense intelligence to its cultural basis, contributes to the growing call for adapting AI models to human collectivities that possess different, often incompatible, social stocks of knowledge.

13.4 SI Nov 25, 2020

Examining the consumption of radical content on YouTube

Homa Hosseinmardi, Amir Ghasemian, Aaron Clauset et al.

Although it is under-studied relative to other social media platforms, YouTube is arguably the largest and most engaging online media consumption platform in the world. Recently, YouTube's scale has fueled concerns that YouTube users are being radicalized via a combination of biased recommendations and ostensibly apolitical anti-woke channels, both of which have been claimed to direct attention to radical political content. Here we test this hypothesis using a representative panel of more than 300,000 Americans and their individual-level browsing behavior, on and off YouTube, from January 2016 through December 2019. Using a labeled set of political news channels, we find that news consumption on YouTube is dominated by mainstream and largely centrist sources. Consumers of far-right content, while more engaged than average, represent a small and stable percentage of news consumers. However, consumption of anti-woke content, defined in terms of its opposition to progressive intellectual and political agendas, grew steadily in popularity and is correlated with consumption of far-right content off-platform. We find no evidence that engagement with far-right content is caused by YouTube recommendations systematically, nor do we find clear evidence that anti-woke channels serve as a gateway to the far right. Rather, consumption of political content on YouTube appears to reflect individual preferences that extend across the web as a whole.

6.6 ME Nov 28, 2016 Code

Split-door criterion: Identification of causal effects through auxiliary outcomes

Amit Sharma, Jake M. Hofman, Duncan J. Watts

We present a method for estimating causal effects in time series data when fine-grained information about the outcome of interest is available. Specifically, we examine what we call the split-door setting, where the outcome variable can be split into two parts: one that is potentially affected by the cause being studied and another that is independent of it, with both parts sharing the same (unobserved) confounders. We show that under these conditions, the problem of identification reduces to that of testing for independence among observed variables, and present a method that uses this approach to automatically find subsets of the data that are causally identified. We demonstrate the method by estimating the causal impact of Amazon's recommender system on traffic to product pages, finding thousands of examples within the dataset that satisfy the split-door criterion. Unlike past studies based on natural experiments that were limited to a single product category, our method applies to a large and representative sample of products viewed on the site. In line with previous work, we find that the widely used click-through rate (CTR) metric overestimates the causal impact of recommender systems; depending on the product category, we estimate that 50-80% of the traffic attributed to recommender systems would have happened even without any recommendations. We conclude with guidelines for using the split-door criterion as well as a discussion of other contexts where the method can be applied.
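The abstract above reduces identification to testing independence among observed variables. The sketch below shows one way such a check could look, and it is not the authors' code. It assumes two aligned series for a single time window: `x`, the cause (for example, visits to a focal product), and `y_direct`, the part of the outcome that the cause cannot affect. The permutation test on Pearson correlation is a swapped-in, simple independence test. It ignores autocorrelation in time series and detects only linear dependence, and the function names and the `alpha` threshold are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of zero correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def passes_split_door(x, y_direct, alpha=0.05):
    """Keep a window only if independence of x and y_direct is not rejected."""
    return permutation_pvalue(x, y_direct) > alpha


if __name__ == "__main__":
    x = list(range(1, 21))
    print(passes_split_door(x, [5] * 20))  # direct traffic unrelated to x
    print(passes_split_door(x, x))         # direct traffic moves with x
```

Failing to reject independence is weaker than establishing it, so a filter of this kind would probably need a more careful test and a multiple-testing adjustment when applied across thousands of windows.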