Doing Data Right: How Lessons Learned Working with Conventional Data should Inform the Future of Synthetic Data for Recommender Systems
The paper addresses the problem of improving synthetic data practices for researchers and practitioners in recommender systems. Its contribution is incremental: it builds on existing lessons rather than introducing new methods.
The paper argues that the emerging field of synthetic data for recommender systems should prioritize 'doing data right' by avoiding past mistakes like dataset bias and exploring new opportunities such as reproducibility and data minimization.
We present a case that the newly emerging field of synthetic data in the area of recommender systems should prioritize 'doing data right'. We consider this catchphrase to have two aspects: first, we should not repeat the mistakes of the past, and, second, we should explore the full scope of opportunities presented by synthetic data as we move into the future. We argue that explicit attention to dataset design and description will help to avoid past mistakes with dataset bias and evaluation. To fully exploit the opportunities of synthetic data, we point out that researchers can investigate new areas, such as using data synthesis to support reproducibility by making data open as well as FAIR, and to push forward our understanding of data minimization.