Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
This survey clarifies a key distinction in systematic generalization for ML and AI researchers. Its contribution is incremental, since it builds on prior taxonomies without introducing new models or data.
The paper distinguishes between behavioural and representational systematicity in ML models. It analyzes how existing benchmarks primarily test behavioural systematicity and highlights methods from mechanistic interpretability for assessing representational systematicity.
A core aspect of compositionality, systematicity is a desirable property in ML models because it enables strong generalization to novel contexts. This has led to numerous studies proposing benchmarks to assess systematic generalization, as well as models and training regimes designed to enhance it. Many of these efforts are framed as addressing the challenge posed by Fodor and Pylyshyn (1988). However, whereas Fodor and Pylyshyn argue for the systematicity of representations, existing benchmarks and models focus primarily on the systematicity of behaviour. We argue that this distinction is crucial. Furthermore, building on Hadley's (1994) taxonomy of systematic generalization, we analyze the extent to which key benchmarks in the literature, across language and vision, test behavioural systematicity. Finally, we highlight ways of assessing the systematicity of representations in ML models, as practiced in the field of mechanistic interpretability.
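To make the behavioural side of the distinction concrete, here is a minimal sketch of the kind of train/test split that behavioural benchmarks rely on. It assumes a toy SCAN-like command vocabulary; the vocabulary, the `compositional_split` helper and the held-out pairs are hypothetical illustrations, not taken from the survey or from any specific benchmark. Every primitive and every modifier occurs during training, but certain combinations are reserved for testing.

```python
from itertools import product

# Hypothetical toy vocabulary (an assumption for illustration only,
# loosely modelled on SCAN-style command grammars).
PRIMITIVES = ["walk", "run", "jump", "look"]
MODIFIERS = ["twice", "thrice", "left", "right"]


def compositional_split(primitives, modifiers, held_out):
    """Split all primitive-modifier commands into train and test lists.

    Combinations listed in `held_out` go to the test set and never appear
    in training. A model that handles the test set correctly generalizes
    behaviourally to novel combinations; this check says nothing about
    how its internal representations are structured.
    """
    train, test = [], []
    for prim, mod in product(primitives, modifiers):
        command = f"{prim} {mod}"
        (test if (prim, mod) in held_out else train).append(command)
    return train, test


def parts_seen(commands, position):
    """Return the set of words occurring at `position` (0 = primitive, 1 = modifier)."""
    return {c.split()[position] for c in commands}


# Hold out two "jump" combinations; "jump" itself still occurs in training
# (with "thrice" and "right"), so only the combinations are novel at test time.
train, test = compositional_split(
    PRIMITIVES, MODIFIERS, held_out={("jump", "twice"), ("jump", "left")}
)

print(test)  # the novel combinations a model must handle at test time
```

Passing such a test set is evidence of behavioural systematicity only. Whether the model represents `jump twice` as a composition of `jump` and `twice` is a separate question, and it is the one that representational analyses, such as those from mechanistic interpretability, aim to answer.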