CL · DB · IR · Jul 13, 2023

Retrieval Augmented Generation using Engineering Design Knowledge

arXiv:2307.06985v10 · 10 citations · h-index: 34
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for structured knowledge retrieval in engineering design, though it is incremental as it applies existing RAG methods to a new domain-specific dataset.

The paper tackles the problem of extracting explicit engineering design facts from patent descriptions to support Retrieval Augmented Generation (RAG) in the design process. It achieves up to 99.7% accuracy in relation identification and populates a knowledge base of over 2.93 million facts from 4,870 fan system patents.

Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit engineering design facts of the form {head entity :: relationship :: tail entity} from patented artefact descriptions. Given a sentence with a pair of entities (based on noun phrases) marked in a unique manner, our method extracts the relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification) and elicitation (sequence-to-sequence). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when queried for knowledge retrieval tasks in the design process.
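
To make the fact format in the abstract concrete, here is a minimal Python sketch of a {head entity :: relationship :: tail entity} triple, an entity-pair marking step, and a toy fact lookup that feeds a prompt. It is an illustration under stated assumptions, not the authors' implementation: the `<h>`/`<t>` marker tokens, the class and function names, the prompt layout, and the example sentence and facts are all hypothetical, and the fine-tuned language models that actually identify the relationship are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One explicit design fact: head entity, relationship, tail entity."""
    head: str
    relationship: str
    tail: str

    def __str__(self) -> str:
        return f"{self.head} :: {self.relationship} :: {self.tail}"


def mark_entity_pair(sentence: str, head: str, tail: str) -> str:
    """Wrap the first occurrence of each entity in marker tokens.

    The <h>/<t> markers are an assumption made for this sketch; the abstract
    only says the pair is "marked in a unique manner". Entities are assumed
    not to overlap or contain one another.
    """
    marked = sentence.replace(head, f"<h>{head}</h>", 1)
    return marked.replace(tail, f"<t>{tail}</t>", 1)


class KnowledgeBase:
    """Toy in-memory store that indexes facts by the entities they mention."""

    def __init__(self) -> None:
        self._by_entity: dict[str, list[Fact]] = defaultdict(list)

    def add(self, fact: Fact) -> None:
        # Index under both entities (once if head and tail coincide).
        for entity in {fact.head.lower(), fact.tail.lower()}:
            self._by_entity[entity].append(fact)

    def retrieve(self, entity: str) -> list[Fact]:
        # Return facts in insertion order; empty list for unknown entities.
        return list(self._by_entity.get(entity.lower(), []))


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Place retrieved facts ahead of the question (hypothetical layout)."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return f"Facts:\n{fact_lines}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Invented example sentence and facts, for illustration only.
    print(mark_entity_pair("The motor drives the impeller.", "motor", "impeller"))
    kb = KnowledgeBase()
    kb.add(Fact("motor", "drives", "impeller"))
    kb.add(Fact("impeller", "mounted on", "shaft"))
    print(build_prompt("What drives the impeller?", kb.retrieve("impeller")))
```

In this sketch the relationship is supplied by hand. In the paper's pipeline that slot would be filled by the relation identification or elicitation model applied to the marked sentence, and retrieval would run over the populated knowledge base, not a dictionary lookup.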

Code Implementations: 2 repos