Adaptations of AI models for querying the LandMatrix database in natural language
This work addresses the challenge of making complex land acquisition data more accessible to policymakers and stakeholders in low- and middle-income countries. Its contribution is incremental: it applies existing AI techniques to a specific domain.
The paper addresses the underutilization of the Land Matrix database in public policy, which stems from the technical complexity of accessing it. It adapts Large Language Models (LLMs) with Prompt Engineering, RAG, and Agents to enable natural language querying. This simplifies access to data exposed through GraphQL and REST systems, as demonstrated in reproducible experiments.
The Land Matrix initiative (https://landmatrix.org) and its global observatory aim to provide reliable data on large-scale land acquisitions to inform debates and actions in sectors such as agriculture, extraction, or energy in low- and middle-income countries. Although these data are recognized in the academic world, they remain underutilized in public policy, mainly because accessing and exploiting them is complex and requires technical expertise and a good understanding of the database schema. The objective of this work is to simplify access to data from different database systems. The methods proposed in this article are evaluated using data from the Land Matrix. This work compares several Large Language Models (LLMs) as well as combinations of LLM adaptations (Prompt Engineering, Retrieval-Augmented Generation (RAG), Agents) for querying different database systems (GraphQL and REST queries). The experiments are reproducible, and a demonstration is available online: https://github.com/tetis-nlp/landmatrix-graphql-python.
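
To make the combination of Prompt Engineering and RAG more concrete, the following minimal sketch shows one way a natural language question could be turned into a prompt for a GraphQL query. It is an illustration under stated assumptions, not the method implemented in the repository: the schema fragments in `SCHEMA_DOCS`, the word-overlap retriever, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and the real Land Matrix schema is not reproduced here.

```python
import re
from typing import Callable

# Hypothetical schema fragments (assumption): illustrative field descriptions only,
# not the actual Land Matrix GraphQL schema.
SCHEMA_DOCS = {
    "country": "country: name of the target country of the deal",
    "deal_size": "deal_size: size of the deal in hectares",
    "intention": "intention: intended use of the land, e.g. agriculture or mining",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Toy retrieval step: rank schema fragments by word overlap with the question."""
    words = tokenize(question)
    ranked = sorted(docs.values(), key=lambda d: len(words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Prompt engineering step: instruction, retrieved schema context, then the question."""
    lines = ["Translate the question into a GraphQL query.", "Relevant schema fields:"]
    lines += [f"- {fragment}" for fragment in context]
    lines += [f"Question: {question}", "GraphQL:"]
    return "\n".join(lines)


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Retrieve context, build the prompt, and pass it to any text-in/text-out model."""
    return llm(build_prompt(question, retrieve(question, SCHEMA_DOCS)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): always returns a fixed query."""
    return "{ deals { country deal_size } }"


if __name__ == "__main__":
    question = "What is the size of deals in each country?"
    print(build_prompt(question, retrieve(question, SCHEMA_DOCS)))
    print(answer(question, fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; a production retriever would typically replace the word-overlap ranking with embedding-based similarity.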