Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling
This work addresses causal inference about decisions made in the research process, a problem relevant to scholars and policymakers, though the contribution is incremental in that it adapts existing twin-based methods to a new context.
The authors tackle the problem of estimating the effects of decisions in the research process, such as the choice of paper title or publication venue. They introduce a framework that treats pairs of papers that cite each other as 'twins' to approximate counterfactual outcomes, so that causal effects can be estimated from existing data without observing the counterfactual itself.
The research process includes many decisions, e.g., how to title the paper and where to publish it. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty is that estimating these effects requires counterfactual outcomes, which are not observable in reality. The key insight of our framework comes from existing counterfactual analyses using twins, in which researchers regard twins as counterfactual units for each other. The proposed framework regards a pair of papers that cite each other as twins. Such papers tend to be parallel works on similar topics from similar communities. We investigate twin papers that adopted different decisions, observe how the research impact of each paper evolves, and estimate the effect of a decision by the difference in impact between the twins. We release our code and data, which we believe are highly beneficial given the scarcity of datasets for counterfactual studies.
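The twin-paper idea can be illustrated with a minimal sketch in Python. It is not the authors' released code. It assumes a binary decision (for example, whether a paper adopted a particular title style), uses a single citation count per paper as a stand-in for research impact, and averages the within-pair differences. The function names `find_twin_pairs` and `estimate_effect` and all of the toy data are hypothetical.

```python
from statistics import mean


def find_twin_pairs(citations):
    """Return pairs of papers that cite each other, each pair once as a sorted tuple.

    `citations` is an iterable of (citing_paper, cited_paper) edges.
    """
    edges = set(citations)
    pairs = {tuple(sorted((a, b))) for a, b in edges if a != b and (b, a) in edges}
    return sorted(pairs)


def estimate_effect(citations, treated, impact):
    """Average impact difference (treated minus untreated) over discordant twins.

    Only twin pairs whose members made different decisions are used.
    Returns (estimated_effect, number_of_pairs_used).
    """
    diffs = []
    for a, b in find_twin_pairs(citations):
        if treated[a] == treated[b]:
            continue  # both papers made the same decision: no contrast to measure
        t, c = (a, b) if treated[a] else (b, a)
        diffs.append(impact[t] - impact[c])
    return (mean(diffs) if diffs else None), len(diffs)


# Toy, made-up data for illustration only.
citations = [
    ("p1", "p2"), ("p2", "p1"),  # mutual citation: twins
    ("p3", "p4"), ("p4", "p3"),  # mutual citation: twins
    ("p5", "p6"),                # one-way citation: not twins
    ("p1", "p3"),                # one-way citation: not twins
    ("p7", "p8"), ("p8", "p7"),  # twins, but same decision (skipped)
]
treated = {"p1": True, "p2": False, "p3": False, "p4": True,
           "p5": True, "p6": False, "p7": True, "p8": True}
impact = {"p1": 30, "p2": 20, "p3": 12, "p4": 18,
          "p5": 50, "p6": 5, "p7": 9, "p8": 11}

effect, n_pairs = estimate_effect(citations, treated, impact)
print(f"estimated effect: {effect} citations over {n_pairs} twin pairs")
```

In this toy example only the pairs (p1, p2) and (p3, p4) contribute, because p5 and p6 do not cite each other and p7 and p8 made the same decision. A real analysis would also need to handle impact measured over time and the statistical uncertainty of the estimate, neither of which this sketch attempts.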