Deep Learning and Natural Language Processing in the Field of Construction
This work addresses terminology and relationship extraction for construction professionals, but its contribution is incremental: it applies existing NLP methods to a new domain.
The authors tackled the problem of extracting hypernym relationships in construction documents by first extracting domain terminology using statistical and linguistic methods, then detecting hypernyms using word embeddings. They reported relevant and promising results, with terminology evaluated by 6 experts and hypernym identification tested on multiple datasets.
This article presents a complete process for extracting hypernym relationships in the field of construction in two main steps: terminology extraction, then detection of hypernyms among the extracted terms. First, we describe the corpus analysis method used to extract terminology from a collection of technical specifications in the field of construction. Using statistics and word n-gram analysis, we extract the domain's terminology and then prune it with linguistic patterns and internet queries to improve the quality of the final terminology. Second, we present a machine-learning approach, based on various word embedding models and combinations of them, for detecting hypernyms in the extracted terminology. The extracted terminology is evaluated manually by 6 experts in the domain, and the hypernym identification method is evaluated on different datasets. The overall approach provides relevant and promising results.
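As a rough illustration of the two steps described above, the following Python sketch counts word n-grams to propose candidate terms, prunes candidates that start or end with a stop word, and builds a feature vector for a candidate (hyponym, hypernym) pair from two word vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify which statistics, linguistic patterns, or embedding combinations are used, so the stop-word list, the frequency threshold, the concatenation-plus-difference features, and the function names are all hypothetical. The internet-query pruning step and the classifier itself are left out.

```python
from collections import Counter

# Hypothetical stop-word list standing in for the paper's linguistic patterns.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "be", "is", "with", "for", "shall"}


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_terms(sentences, max_n=3, min_count=2):
    """Count 1..max_n word n-grams and keep those seen at least min_count times.

    Assumption: a candidate term may not start or end with a stop word.
    This is a simple stand-in for pattern-based pruning.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {term: count for term, count in counts.items() if count >= min_count}


def pair_features(hyponym_vec, hypernym_vec):
    """Combine two word vectors into one feature vector for a classifier.

    Assumption: concatenation plus element-wise difference, one common way
    to combine embeddings for relation detection.
    """
    difference = [h - y for h, y in zip(hypernym_vec, hyponym_vec)]
    return list(hyponym_vec) + list(hypernym_vec) + difference


# Toy sentences invented for illustration only.
sentences = [
    "the concrete slab shall be reinforced",
    "the concrete slab is cast in place",
    "a steel beam supports the concrete slab",
]
terms = candidate_terms(sentences)
features = pair_features([1.0, 2.0], [3.0, 5.0])
```

In a fuller pipeline, the vectors passed to `pair_features` would come from a trained word embedding model, and the resulting features would be fed to a supervised classifier that labels each term pair as a hypernym relation or not.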