Natural language processing for achieving sustainable development: the case of neural labelling to enhance community profiling
The work addresses data gaps in sustainable development for communities in developing countries, although its contribution is incremental: it applies existing methods to a new domain.
The paper tackles the lack of NLP applications in sustainable development by proposing automatic UPV classification for community profiling in developing countries. It releases the Stories2Insights dataset together with neural baselines, and finds that the problem is challenging, leaving room for future research.
In recent years, there has been increasing interest in applying Artificial Intelligence, and especially Machine Learning, to the field of Sustainable Development (SD). However, until now, NLP has not been applied in this context. In this paper, we show the high potential of NLP applications to enhance the sustainability of projects. In particular, we focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists. In this context, NLP could help overcome the cost and time barriers of structuring qualitative data, which currently prevent its widespread use and the benefits that come with it. We propose the new task of Automatic User-Perceived Value (UPV) classification, an extreme multi-class, multi-label classification problem. We release Stories2Insights, an expert-annotated dataset, provide a detailed corpus analysis, and implement a number of strong neural baselines to address the task. Experimental results show that the problem is challenging and leaves plenty of room for future research at the intersection of NLP and SD.
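To make the multi-label framing concrete: each label is an independent yes/no decision, so a single text can receive zero, one, or several labels at once. The sketch below illustrates this with a per-label sigmoid, a numerically stable binary cross-entropy computed from logits, and micro-averaged F1. It is a minimal, hypothetical illustration that assumes a model emitting one logit per label; the label names, the 0.5 threshold, and the function names are placeholders, not the authors' baselines or the Stories2Insights label set.

```python
import math
from typing import Dict, List, Set


def sigmoid(z: float) -> float:
    """Map a logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep every label whose probability clears the (assumed) threshold.

    Unlike softmax-based single-label prediction, labels do not compete,
    so the result may contain any number of labels, including none.
    """
    return {label for label, z in logits.items() if sigmoid(z) >= threshold}


def multilabel_bce(logits: Dict[str, float], gold: Set[str]) -> float:
    """Mean binary cross-entropy over all labels, computed from logits.

    Uses the stable form max(z, 0) - z * y + log(1 + exp(-|z|)),
    which avoids taking log(0) for very large or very small logits.
    """
    total = 0.0
    for label, z in logits.items():
        y = 1.0 if label in gold else 0.0
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool label decisions across all examples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical logits for one text over three placeholder labels.
    logits = {"label_a": 2.0, "label_b": -1.0, "label_c": 0.3}
    predicted = predict_labels(logits)
    print(sorted(predicted))
    print(micro_f1([{"label_a"}], [predicted]))
```

With these placeholder logits, two of the three labels clear the threshold, which is exactly the behaviour a single-label classifier cannot express. In an "extreme" setting with many labels, most gold sets are sparse, which is one reason micro-averaged metrics are a common (though not the only) choice for reporting results.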