Methodology for identifying study sites in scientific corpus
This project addresses the challenge of analyzing heterogeneous scientific data for researchers and institutions, but it appears incremental as it builds on existing NLP and text mining methods.
The TERRE-ISTEX project tackles the problem of identifying study sites, disciplinary crossings, and research methods from scientific corpora by developing a web-based geographical information retrieval tool that integrates spatial, thematic, and temporal dimensions, with experiments conducted on a corpus of electronic theses and articles from ISTEX and CIRAD.
The TERRE-ISTEX project aims to identify how research work has evolved in relation to study areas, disciplinary crossings, and concrete research methods, based on the heterogeneous digital content available in scientific corpora. The project is divided into three main actions: (1) to identify the periods and places that have been the subject of empirical studies, as reflected in the publications of the analyzed corpus, (2) to identify the themes addressed in these works, and (3) to develop a web-based geographical information retrieval (GIR) tool. The first two actions combine natural language processing patterns with text mining methods. By crossing the three dimensions (spatial, thematic, and temporal) in a GIR engine, it will be possible to understand what research has been carried out on which territories and at what time. In the project, the experiments are carried out on a heterogeneous corpus of electronic theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
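To make the idea of crossing the three dimensions concrete, here is a minimal Python sketch of a filter over already-annotated documents. It is not the project's GIR engine: the `Record` type, the `search` function, and the toy corpus (document identifiers, places, themes, years) are all hypothetical and invented for illustration, and the sketch assumes that places, themes, and years have already been extracted by the first two actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One annotated document (hypothetical structure, not the project's schema)."""
    doc_id: str
    places: frozenset   # spatial dimension: study sites mentioned
    themes: frozenset   # thematic dimension: topics addressed
    year: int           # temporal dimension: year of the study


def search(records, place=None, theme=None, years=None):
    """Return the records that satisfy every dimension that was specified.

    `years` is an inclusive (start, end) pair; a dimension left as None
    is not used as a filter.
    """
    hits = []
    for rec in records:
        if place is not None and place not in rec.places:
            continue
        if theme is not None and theme not in rec.themes:
            continue
        if years is not None and not (years[0] <= rec.year <= years[1]):
            continue
        hits.append(rec)
    return hits


# Invented toy data, for illustration only.
CORPUS = [
    Record("doc-a", frozenset({"Senegal"}), frozenset({"irrigation"}), 2004),
    Record("doc-b", frozenset({"Senegal", "Mali"}), frozenset({"land use"}), 2011),
    Record("doc-c", frozenset({"Madagascar"}), frozenset({"irrigation"}), 2011),
]

# Which documents studied Senegal between 2010 and 2015?
result = search(CORPUS, place="Senegal", years=(2010, 2015))
print([rec.doc_id for rec in result])  # prints ['doc-b']
```

A real engine would presumably rely on a spatial index and a full-text index rather than a linear scan; the sketch only shows how the three filters combine with a logical AND.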