Emergence of order in random languages
This work addresses theoretical questions at the interface of language modeling and statistical physics, but it appears incremental: it builds on an existing model (the Random Language Model) and an established method (the replica-symmetric ansatz).
The paper studies the behavior of large texts generated by weighted context-free grammars, and by ensembles of such grammars like the Random Language Model, and shows that replica symmetry must be broken in the information-carrying phase.
We consider languages generated by weighted context-free grammars. We show that the behaviour of large texts is controlled by saddle-point equations for an appropriate generating function. We then consider ensembles of grammars, in particular the Random Language Model of E. DeGiuli, Phys. Rev. Lett. 122, 128301 (2019). This model is solved within the replica-symmetric ansatz, which is valid in the high-temperature, disordered phase. We show that in the phase in which languages carry information, the replica symmetry must be broken.
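To make the setting concrete, here is a minimal Python sketch of sampling a text from a randomly drawn weighted context-free grammar. It is an illustration under stated assumptions, not the paper's construction: the grammar is taken to be in Chomsky normal form (each hidden symbol either branches into two hidden symbols or emits one observable symbol), the rule weights are drawn log-normally with a width `sigma`, and the names `random_grammar`, `sample_text`, `p_branch`, and `max_len` are all hypothetical. With `sigma = 0` every rule has equal weight; larger `sigma` makes the weights more uneven.

```python
import math
import random


def random_grammar(n_hidden, n_observable, sigma, rng):
    """Draw normalized rule weights with log-normal disorder (an assumption)."""

    def normalized(k):
        w = [math.exp(sigma * rng.gauss(0.0, 1.0)) for _ in range(k)]
        total = sum(w)
        return [x / total for x in w]

    # branch[a] is a distribution over ordered pairs (b, c) of hidden symbols,
    # stored flat with index b * n_hidden + c.
    branch = [normalized(n_hidden * n_hidden) for _ in range(n_hidden)]
    # emit[a] is a distribution over observable symbols.
    emit = [normalized(n_observable) for _ in range(n_hidden)]
    return branch, emit


def sample_text(branch, emit, p_branch, rng, start=0, max_len=10_000):
    """Expand the start symbol left to right; return the observable symbols."""
    n_hidden = len(branch)
    n_obs = len(emit[0])
    stack = [start]
    text = []
    while stack:
        a = stack.pop()
        # Branch with probability p_branch, unless doing so could push the
        # final text past max_len (a hypothetical guard to force termination).
        can_branch = len(text) + len(stack) + 2 <= max_len
        if can_branch and rng.random() < p_branch:
            idx = rng.choices(range(n_hidden * n_hidden), weights=branch[a])[0]
            b, c = divmod(idx, n_hidden)
            stack.append(c)
            stack.append(b)  # b is popped first, so it is expanded first
        else:
            text.append(rng.choices(range(n_obs), weights=emit[a])[0])
    return text


if __name__ == "__main__":
    rng = random.Random(0)
    branch, emit = random_grammar(n_hidden=3, n_observable=4, sigma=1.0, rng=rng)
    print(sample_text(branch, emit, p_branch=0.45, rng=rng, max_len=200))
```

Keeping `p_branch` below 1/2 makes the branching process subcritical, so typical texts are short; the `max_len` guard only matters for rare long derivations or when `p_branch` is set higher to obtain large texts.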