How the cascade inference problem distorts information diffusion
For researchers studying information diffusion on social media, this work highlights that standard platform-provided data and naive reconstruction methods can lead to severely biased conclusions.
The paper shows that ignoring the cascade inference problem distorts analyses of social influence, community detection, and information diffusion, using case studies on Twitter and Bluesky. An analysis of over 40,000 true and false news stories on Twitter reveals that the assumptions made during reconstruction drastically distort both microscopic and macroscopic cascade properties.
To analyze the flow of information online, researchers often rely on platform-provided data from social media companies, which typically attribute all resharing actions to the original poster. This obscures the true dynamics of how information spreads online, as users can be exposed to content in various ways. While most researchers analyze data as it is provided by the platform and overlook this issue, some attempt to infer the structure of information cascades. However, the absence of ground truth about actual diffusion cascades makes it impossible to verify the efficacy of these efforts. We propose a novel parametric reconstruction approach and use it to investigate how overlooking cascade reconstruction distorts analyses of social influence, community detection, and information diffusion. Two case studies involving data from Twitter and Bluesky reveal that cascade inference significantly impacts the identification of both influential users and communities, and therefore affects downstream analyses more generally. Analysis of the diffusion of over 40,000 true and false news stories on Twitter reveals that the assumptions made during the reconstruction procedure drastically distort both microscopic and macroscopic properties of cascade networks. This work highlights the challenges of studying information spreading processes on complex networks and has significant implications for the broader study of digital platforms.
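To make the distinction concrete, the following minimal Python sketch contrasts platform-style attribution, where every reshare points to the original poster, with one possible reconstruction of the same reshare log. It is not the parametric method proposed in the paper: the function names, the `gamma` parameter, the recency-based weighting, and the toy reshare log are all assumptions made purely for illustration.

```python
import random
from collections import Counter

# Hypothetical reshare log: (user, timestamp), first entry is the original post.
# Timestamps are strictly increasing so every time gap is positive.
reshares = [("alice", 0), ("bob", 1), ("carol", 2), ("dave", 3), ("erin", 4)]


def star_cascade(log):
    """Platform view: attach every resharer directly to the original poster."""
    root = log[0][0]
    return [(root, user) for user, _ in log[1:]]


def reconstruct_cascade(log, gamma, rng):
    """Illustrative reconstruction (an assumption, not the paper's method).

    Each resharer is assigned a parent drawn from all earlier participants,
    with weight (time gap) ** -gamma. gamma = 0 gives a uniform choice;
    larger gamma favours the most recent earlier participant.
    """
    edges = []
    for i in range(1, len(log)):
        user, t = log[i]
        earlier = log[:i]
        weights = [float(t - t_prev) ** -gamma for _, t_prev in earlier]
        parent = rng.choices([u for u, _ in earlier], weights=weights, k=1)[0]
        edges.append((parent, user))
    return edges


def cascade_depth(edges, root):
    """Length of the longest path from the root to any resharer."""
    parent_of = {child: parent for parent, child in edges}
    deepest = 0
    for node in parent_of:
        hops = 0
        while node != root:
            node = parent_of[node]
            hops += 1
        deepest = max(deepest, hops)
    return deepest


def influence(edges):
    """Number of direct reshares attributed to each user."""
    return Counter(parent for parent, _ in edges)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
star = star_cascade(reshares)
inferred = reconstruct_cascade(reshares, gamma=1.0, rng=rng)

print("star depth:", cascade_depth(star, "alice"), influence(star))
print("inferred depth:", cascade_depth(inferred, "alice"), influence(inferred))
```

Under the star attribution the cascade always has depth 1 and the original poster receives all the credit, whereas any reconstruction redistributes that credit according to its assumptions. Varying `gamma` in this toy changes both who appears influential and how deep the cascade looks, which is the kind of sensitivity the abstract describes.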