Reply to "Emergent LLM behaviors are observationally equivalent to data leakage"
This work clarifies methodological issues for researchers studying multi-agent LLM systems, but it is incremental: it responds to a critique without introducing new findings.
The paper addresses concerns about data contamination in large language model (LLM) population simulations, arguing that such contamination does not prevent the study of genuinely emergent dynamics such as self-organization, and noting that these dynamics have been empirically observed in the specific case of social conventions.
A potential concern when simulating populations of large language models (LLMs) is data contamination, i.e. the possibility that training data may shape outcomes in unintended ways. While this concern is important and may hinder certain experiments with multi-agent models, it does not preclude the study of genuinely emergent dynamics in LLM populations. The recent critique by Barrie and Törnberg [1] of the results of Flint Ashery et al. [2] offers an opportunity to clarify that self-organisation and model-dependent emergent dynamics can be studied in LLM populations, highlighting how such dynamics have been empirically observed in the specific case of social conventions.