CL · Jul 13, 2023
Retrieval Augmented Generation using Engineering Design Knowledge
L. Siddharth, Jianxi Luo
Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit engineering design facts of the form {head entity :: relationship :: tail entity} from patented artefact descriptions. Given a sentence with a pair of entities (based on noun phrases) marked in a unique manner, our method extracts the relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification) and elicitation (sequence-to-sequence). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when queried for knowledge retrieval tasks in the design process.
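The input and output of the relation identification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the marker strings `[HEAD]`/`[TAIL]`, the binary token labels, the helper names, and the example sentence are all assumptions made for illustration, and the labels are written by hand where a fine-tuned token classifier would normally supply them.

```python
from typing import NamedTuple, Sequence


class Fact(NamedTuple):
    """One design fact: {head entity :: relationship :: tail entity}."""
    head: str
    relationship: str
    tail: str


def mark_pair(sentence: str, head: str, tail: str) -> str:
    """Wrap the first occurrence of each entity in hypothetical markers.

    Assumes the two entities do not overlap in the sentence.
    """
    if head not in sentence or tail not in sentence:
        raise ValueError("both entities must occur in the sentence")
    marked = sentence.replace(head, f"[HEAD] {head} [/HEAD]", 1)
    return marked.replace(tail, f"[TAIL] {tail} [/TAIL]", 1)


def decode_relationship(tokens: Sequence[str], labels: Sequence[int]) -> str:
    """Join, in order, the tokens labelled 1 (part of the relationship)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return " ".join(tok for tok, lab in zip(tokens, labels) if lab == 1)


# Illustrative sentence; not taken from the dataset.
sentence = "The fan includes a motor mounted on the housing."
marked = mark_pair(sentence, head="motor", tail="housing")

# Hand-written labels standing in for a classifier's per-token predictions.
tokens = sentence.rstrip(".").split()
labels = [0, 0, 0, 0, 0, 1, 1, 0, 0]

fact = Fact("motor", decode_relationship(tokens, labels), "housing")
print(marked)
print(" :: ".join(fact))
```

In this framing, the sequence-to-sequence variant would generate the relationship string directly from the marked sentence rather than labelling individual tokens.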
CL · Dec 11, 2023
Linguistic and Structural Basis of Engineering Design Knowledge
L. Siddharth, Jianxi Luo
Natural language artefact descriptions are primary carriers of engineering design knowledge, whose retrieval, representation, and reuse are fundamental to supporting knowledge-intensive tasks in the design process. In this paper, we explicate design knowledge from patented artefact descriptions as knowledge graphs and examine these to understand their linguistic and structural basis. The purpose of our work is to advance the traditional and ontological perspectives of design knowledge and to guide Large Language Models (LLMs) on how to articulate natural language responses that reflect knowledge that is valuable in a design environment. We populate 33,881 knowledge graphs from a sample of patents stratified according to technology classes. For the linguistic basis, we conduct Zipf distribution analyses on the frequencies of unique entities and relationships to identify 64 and 37 generalisable linguistic syntaxes respectively. The relationships largely represent attributes ('of'), structure ('in', 'with'), purpose ('to', 'for'), hierarchy ('include'), exemplification ('such as'), and behaviour ('to', 'from'). For the structural basis, we draw inspiration from various studies on biological/ecological networks and discover motifs from patent knowledge graphs. We identify four 3-node and four 4-node subgraph patterns that could be converged and simplified into sequence [->...->], aggregation [->...<-], and hierarchy [<-...->]. Based on these results, we suggest concretisation strategies for entities and relationships and for explicating hierarchical structures, potentially aiding the construction and modularisation of design knowledge.
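The three simplified patterns named above can be illustrated on the smallest case, two directed edges that share one node. The sketch below is an assumption about how the bracket notation maps onto edge directions, not the authors' motif discovery procedure; the function name and the example entities are hypothetical.

```python
from typing import Optional, Tuple

Edge = Tuple[str, str]  # (source entity, target entity)


def classify_pattern(e1: Edge, e2: Edge) -> Optional[str]:
    """Classify two directed edges sharing one node.

    sequence    a -> b -> c   (shared node is a target, then a source)
    aggregation a -> b <- c   (shared node is the target of both edges)
    hierarchy   a <- b -> c   (shared node is the source of both edges)
    Returns None when the edges do not form one of these patterns.
    """
    (s1, d1), (s2, d2) = e1, e2
    if d1 == d2 and s1 != s2:
        return "aggregation"
    if s1 == s2 and d1 != d2:
        return "hierarchy"
    if (d1 == s2 and s1 != d2) or (d2 == s1 and s2 != d1):
        return "sequence"
    return None


# Hypothetical entity names, for illustration only.
print(classify_pattern(("motor", "shaft"), ("shaft", "blade")))     # sequence
print(classify_pattern(("motor", "housing"), ("blade", "housing"))) # aggregation
print(classify_pattern(("fan", "motor"), ("fan", "blade")))         # hierarchy
```

Longer chains of the same shape, which the `...` in the notation suggests, would need a path-based check rather than this two-edge comparison.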