Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
This work addresses text-based geolocation and lexical dialect analysis, with applications in social media analysis and linguistics. The contribution is incremental: it builds on existing neural network and mixture density methods.
The authors embed two-dimensional geographic locations in a continuous vector space for geolocation and lexical dialectology, using a neural network that incorporates mixtures of Gaussian distributions (a mixture density network). On Twitter data, the model outperforms conventional regression-based geolocation and provides better uncertainty estimates.
We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset.
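To make the mixture density idea concrete, here is a minimal sketch of the negative log-likelihood that such a model could minimise when predicting a two-dimensional location. It is an illustration under stated assumptions, not the authors' implementation: it assumes K Gaussian components with diagonal covariance, and the names `mdn_nll`, `pi_logits`, `mu` and `log_sigma` are hypothetical. In a real model these three arrays would be produced by the output layer of the network from the text features.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def mdn_nll(pi_logits, mu, log_sigma, y):
    """Mean negative log-likelihood of 2-D targets under a Gaussian mixture.

    Assumption (not from the paper): diagonal covariance per component.

    pi_logits: (N, K)     unnormalised mixing weights
    mu:        (N, K, 2)  component means
    log_sigma: (N, K, 2)  log standard deviations per dimension
    y:         (N, 2)     observed coordinates
    """
    log_pi = log_softmax(pi_logits)                    # (N, K)
    z = (y[:, None, :] - mu) / np.exp(log_sigma)       # standardised residuals
    # Log density of a 2-D diagonal Gaussian for every component.
    log_norm = (-0.5 * (z ** 2).sum(axis=-1)
                - log_sigma.sum(axis=-1)
                - np.log(2.0 * np.pi))                 # (N, K)
    joint = log_pi + log_norm
    # Log-sum-exp over components, shifted by the max for stability.
    m = joint.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(joint - m).sum(axis=1))
    return -log_lik.mean()


# Usage: one sample, two identical standard-normal components.
pi_logits = np.zeros((1, 2))
mu = np.zeros((1, 2, 2))
log_sigma = np.zeros((1, 2, 2))
y = np.zeros((1, 2))
loss = mdn_nll(pi_logits, mu, log_sigma, y)
```

Because the model outputs a full density rather than a single point, the spread of the predicted mixture can serve as an uncertainty estimate, which a plain regression output does not offer.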