CLME · Apr 10, 2025

Geological Inference from Textual Data using Word Embeddings

arXiv:2504.07490v1 · 1 citation · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of locating resources for geologists and the mining industry. It is incremental, however: it builds on existing NLP methods with only minor enhancements.

This research tackles the problem of locating geological resources such as industrial minerals. It applies NLP and word embeddings to geological texts and uses dimensional reduction techniques to improve the semantic relations. However, accuracy remains limited: the results fall only within the same region as the expected locations.

This research explores the use of Natural Language Processing (NLP) techniques to locate geological resources, with a specific focus on industrial minerals. Using word embeddings trained with the GloVe model, we extract semantic relationships between target keywords and a corpus of geological texts. The text is filtered to retain only words with geographical significance, such as city names, which are then ranked by their cosine similarity to the target keyword. Dimensional reduction techniques, including Principal Component Analysis (PCA), Autoencoder, Variational Autoencoder (VAE), and VAE with Long Short-Term Memory (VAE-LSTM), are applied to enhance feature extraction and improve the accuracy of the semantic relations. For benchmarking, we use the haversine formula to calculate the distance between the ten cities most semantically related to the target keyword and the identified mine locations. The results demonstrate that combining NLP with dimensional reduction techniques provides meaningful insights into the spatial distribution of natural resources. Although the results fall in the same region as the expected locations, their accuracy leaves room for improvement.
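The filtering and ranking step described in the abstract can be sketched in a few lines. This is a minimal sketch under several assumptions. The GloVe vectors are assumed to be already loaded into a plain dict that maps words to lists of floats. The hypothetical set `city_names` stands in for whatever geographic filter the authors used. The toy 3-dimensional vectors and the names `cityA`, `cityB` and `cityC` are invented for illustration and are not data from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_u * norm_v)

def rank_cities(embeddings, keyword, city_names, top_k=10):
    """Keep only words found in the (hypothetical) gazetteer `city_names`,
    then rank them by cosine similarity to the target keyword, highest first."""
    target = embeddings[keyword]
    candidates = [w for w in embeddings if w in city_names and w != keyword]
    scored = [(w, cosine_similarity(embeddings[w], target)) for w in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors for illustration only (not real GloVe output).
embeddings = {
    "kaolin": [0.9, 0.1, 0.0],
    "cityA": [0.8, 0.2, 0.1],
    "cityB": [0.1, 0.9, 0.2],
    "cityC": [0.7, 0.0, 0.3],
    "granite": [0.6, 0.4, 0.0],  # not a place name, so it is filtered out
}
city_names = {"cityA", "cityB", "cityC"}
print(rank_cities(embeddings, "kaolin", city_names, top_k=2))
```

The default `top_k=10` mirrors the ten cities used for benchmarking in the abstract. Non-geographic words such as `granite` never reach the ranking because the filter is applied before any similarity is computed.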
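Of the dimensional reduction techniques listed in the abstract, PCA is the simplest to illustrate. The sketch below assumes NumPy is available and projects mean-centred embedding rows onto their top principal directions, obtained through a singular value decomposition. It does not cover the Autoencoder, VAE, or VAE-LSTM variants. The helper name `pca_reduce` and the toy matrix are hypothetical.

```python
import numpy as np

def pca_reduce(vectors, n_components):
    """Project row vectors onto their top `n_components` principal directions."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T

# Hypothetical 4-dimensional vectors, reduced to 2 dimensions.
words = ["kaolin", "cityA", "cityB", "cityC"]
matrix = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.1, 0.9, 0.2, 0.4],
    [0.7, 0.0, 0.3, 0.2],
]
reduced = dict(zip(words, pca_reduce(matrix, 2).tolist()))
print(reduced)
```

Under this sketch, the reduced vectors would replace the raw GloVe vectors when cosine similarity to the target keyword is computed. The sign of each principal component is arbitrary, but cosine similarity between projected vectors does not depend on it.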
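The haversine benchmark can be sketched with the standard great-circle formula. This is a minimal sketch that makes three assumptions. It uses a mean Earth radius of 6371 km. It scores each ranked city by its distance to the nearest known mine, although the abstract does not specify the exact aggregation. Finally, every coordinate in the example is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing `a` slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def nearest_mine_distances(cities, mines):
    """For each ranked city, the distance (km) to the closest known mine.
    Both arguments map names to (lat, lon) tuples in degrees."""
    return {
        name: min(haversine_km(lat, lon, m_lat, m_lon) for m_lat, m_lon in mines.values())
        for name, (lat, lon) in cities.items()
    }

# Hypothetical coordinates, for illustration only.
top_cities = {"cityA": (-20.0, -44.0), "cityC": (-19.5, -43.5)}
known_mines = {"mine1": (-19.8, -43.9)}
print(nearest_mine_distances(top_cities, known_mines))
```

As a sanity check, one degree of latitude comes out at roughly 111.19 km with this radius.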

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it rather than by global fame.
