LG | AI | IR | May 10

Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics

arXiv:2605.11017
AI Analysis

For researchers and practitioners modeling user behavior from aggregate data, this work reveals a systematic distortion that invalidates standard parametric fits, with implications for recommendation, advertising, and clinical dosing.

Aggregation in behavioral curve modeling introduces Simpson's paradox, distorting parametric estimates of user engagement dynamics. On Goodreads, individual users peak at ~11 exposures while the aggregate peaks at ~34, a 3x gap driven by survival bias; Amazon Electronics shows a 5.3x distortion.

Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion: Simpson's paradox in behavioral curves. On Goodreads (3.3M users, 9 genres), individual users peak at n* ≈ 11 exposures while the aggregate peaks at n* ≈ 34 -- a 3x gap driven by survival bias. Amazon Electronics (18M reviews) shows a 5.3x distortion. MovieLens-25M (D ≈ 1) serves as a negative control, confirming that survival bias -- not aggregation per se -- is the operative mechanism. The distortion is robust to category granularity, engagement operationalization, and classifier calibration. We develop Synthetic Null Calibration to address a 32% false positive rate in per-user classification. Our findings apply wherever individual behavioral parameters are estimated from aggregate curves under differential attrition.
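The mechanism described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the paper's method or data: the curve shape, the population weights, and the names `individual_curve`, `aggregate_peak`, and `gamma` are hypothetical choices made for illustration. Every simulated user follows the same hump-shaped curve peaking at 11 exposures, and the only thing that varies is whether long-lived users also have a higher engagement baseline (differential attrition).

```python
import numpy as np

N_STAR = 11    # assumed per-user peak, in exposures (illustrative)
MAX_N = 120    # longest lifetime in the toy population (assumption)


def individual_curve(n, n_star=N_STAR):
    """Unit-height hump that peaks exactly at n = n_star."""
    x = n / n_star
    return x * np.exp(1.0 - x)


def aggregate_peak(gamma):
    """Exposure count at which the survivor-averaged curve peaks.

    gamma links a user's engagement baseline to their lifetime:
    gamma = 0 means attrition is unrelated to engagement,
    gamma > 0 means highly engaged users stay longer (survival bias).
    """
    n = np.arange(1, MAX_N + 1)
    lifetime = np.arange(1, MAX_N + 1)   # one user type per lifetime
    baseline = np.exp(gamma * lifetime)  # engagement level of each type
    weight = np.exp(-0.10 * lifetime)    # long-lived types are rarer

    # alive[i, j] is True when user type i is still present at exposure n[j].
    alive = lifetime[:, None] >= n[None, :]
    curves = baseline[:, None] * individual_curve(n)[None, :]

    # Weighted mean engagement over the users who survive to each exposure.
    num = (weight[:, None] * curves * alive).sum(axis=0)
    den = (weight[:, None] * alive).sum(axis=0)
    return int(n[np.argmax(num / den)])


exposures = np.arange(1, MAX_N + 1)
individual_peak = int(exposures[np.argmax(individual_curve(exposures))])
biased_peak = aggregate_peak(gamma=0.06)   # differential attrition
control_peak = aggregate_peak(gamma=0.0)   # no differential attrition

print("individual peak:", individual_peak)
print("aggregate peak with survival bias:", biased_peak)
print("aggregate peak without survival bias:", control_peak)
```

With `gamma = 0` the aggregate curve is just the shared individual curve, so aggregation alone leaves the peak in place, mirroring the role of the negative control. With `gamma > 0` the surviving population shifts toward higher-baseline users as exposure grows, which pushes the aggregate peak well past the per-user peak. The numbers here are not fit to any dataset.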

