Automatic Identification of Research Fields in Scientific Papers
This work addresses the need for researchers and institutions to efficiently categorize and retrieve geographically relevant scientific papers, though it appears incremental as it combines existing NLP and text mining methods.
The TERRE-ISTEX project tackled the problem of identifying scientific research related to specific geographical territories by analyzing heterogeneous digital content from papers. It developed a web-based geographical information retrieval tool that integrates spatial, thematic, and temporal dimensions to give a clearer picture of research topics and their coverage.
The TERRE-ISTEX project aims to identify scientific research dealing with specific geographical territories, based on heterogeneous digital content available in scientific papers. The project is divided into three main work packages: (1) identification of the periods and places of the empirical studies reported in the publications, based on the analyzed text samples, (2) identification of the themes which appear in these documents, and (3) development of a web-based geographical information retrieval (GIR) tool. The first two actions combine Natural Language Processing patterns with text mining methods. The integration of the spatial, thematic and temporal dimensions in a GIR tool contributes to a better understanding of what kind of research has been carried out, of its topics and of its geographical and historical coverage. Another original aspect of the TERRE-ISTEX project is the heterogeneous character of the corpus, which includes PhD theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
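The pattern-based identification of periods and places described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the toy gazetteer, the year pattern, the record layout, and the function names `extract_periods`, `extract_places` and `matches` are all assumptions made for illustration only.

```python
import re

# Hypothetical toy gazetteer; a real system would rely on a full place-name resource.
GAZETTEER = {"Madagascar", "Senegal", "Montpellier"}

# Assumed pattern: a year from 1800 to 2099, optionally followed by a hyphen
# or en dash and a second year, forming a range such as "1998-2004".
PERIOD_RE = re.compile(
    r"\b((?:18|19|20)\d{2})(?:\s*[-\u2013]\s*((?:18|19|20)\d{2}))?\b"
)


def extract_periods(text):
    """Return (start_year, end_year) tuples; a single year gives start == end."""
    periods = []
    for m in PERIOD_RE.finditer(text):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        periods.append((start, end))
    return periods


def extract_places(text, gazetteer=GAZETTEER):
    """Return the gazetteer entries found as whole words in the text, sorted."""
    return sorted(
        place for place in gazetteer
        if re.search(r"\b" + re.escape(place) + r"\b", text)
    )


def matches(record, place, year):
    """Spatial and temporal filter: the record mentions the place and covers the year."""
    return place in record["places"] and any(
        start <= year <= end for start, end in record["periods"]
    )


sample = "Surveys were conducted in Madagascar in 1998-2004 and in Senegal in 2010."
record = {
    "places": extract_places(sample),
    "periods": extract_periods(sample),
}
print(record)
print(matches(record, "Madagascar", 2001))
```

A retrieval tool of the kind described would combine such spatial and temporal annotations with thematic ones; the `matches` helper only hints at how a query over place and year could filter annotated documents.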