Language-independence of DisCoCirc's Text Circuits: English and Urdu
This work addresses language independence in computational linguistics. It is incremental: it builds on prior DisCoCirc developments and is restricted to fragments of the two languages.
The paper tackles grammatical differences between languages by applying the DisCoCirc framework to English and Urdu, and shows that differences in word and phrase ordering vanish in the resulting circuits.
DisCoCirc is a newly proposed framework for representing the grammar and semantics of texts using compositional, generative circuits. While it is a development of the Categorical Distributional Compositional (DisCoCat) framework, it exposes radically new features. In particular, [14] suggested that DisCoCirc goes some way toward eliminating grammatical differences between languages. In this paper we sketch an argument that this is indeed the case for restricted fragments of English and Urdu. We first develop DisCoCirc for a fragment of Urdu, as was done for English in [14]. There is a simple translation from English grammar to Urdu grammar, and vice versa. We then show that differences in grammatical structure between English and Urdu, primarily relating to the ordering of words and phrases, vanish when passing to DisCoCirc circuits.
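The claim that word-order differences vanish can be illustrated with a toy sketch. It is not the paper's formalism: the `Gate` class, the two parsers, and the `shape` helper are hypothetical names introduced here, and the romanised Urdu example drops the case marker and auxiliary for brevity. It assumes only that a transitive verb becomes a gate acting on the wires of its subject and object, so that an English subject-verb-object sentence and an Urdu subject-object-verb sentence produce the same wiring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A verb acting on noun wires (hypothetical toy representation)."""
    word: str     # surface word, language specific
    wires: tuple  # noun wires the gate acts on, as (subject, object)


def parse_svo(tokens):
    """Toy English parser: subject, verb, object."""
    subj, verb, obj = tokens
    return Gate(verb, (subj, obj))


def parse_sov(tokens):
    """Toy Urdu parser: subject, object, verb."""
    subj, obj, verb = tokens
    return Gate(verb, (subj, obj))


def shape(gate):
    """Wiring of the gate, ignoring the language-specific word label."""
    return (len(gate.wires), gate.wires)


# "Alice sees Bob" and a simplified "Alice Bob (ko) dekhti (hai)".
english = parse_svo(["Alice", "sees", "Bob"])
urdu = parse_sov(["Alice", "Bob", "dekhti"])

# The word labels differ, but both gates act on the same wires.
same_shape = shape(english) == shape(urdu)
```

In this sketch the linear position of the verb is used only inside each parser; once the gate is built, nothing in its wiring records whether the source sentence was verb-medial or verb-final.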