Towards a continuous modeling of natural language domains
This work addresses the challenge of defining and modeling overlapping domains in natural language for researchers in computational linguistics and AI.
The paper argues for modeling natural language domains as continuous rather than discrete. It proposes representation learning-based models that adapt to continuous domains and uses dialogue modeling as a test bed to investigate language variation.
Humans continuously adapt their style and language to a variety of domains. However, a reliable definition of 'domain' has eluded researchers thus far. Additionally, the notion of discrete domains stands in contrast to the multiplicity of heterogeneous domains that humans navigate, many of which overlap. To better understand change and variation in human language, we draw on research in domain adaptation and extend the notion of discrete domains to a continuous spectrum. We propose representation learning-based models that can adapt to continuous domains and detail how these can be used to investigate variation in language. To this end, we propose dialogue modeling as a test bed because of its proximity to language modeling and its social component.