DL AI IR

Dec 11, 2024

Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field

Tanay Aggarwal, Angelo Salatino, Francesco Osborne, Enrico Motta

arXiv:2412.08258v2, 7.39 citations, h-index: 26, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient, up-to-date ontologies in scientific fields like engineering, though it is incremental as it builds on existing LLM capabilities for a specific task.

This paper addresses the automation of scholarly ontology generation by evaluating how well large language models (LLMs) identify semantic relationships between research topics. The best model, Claude 3 Sonnet, reaches an F1-score of 0.967, and smaller quantised models optimised through prompt engineering perform comparably to much larger proprietary ones.

Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. This paper offers a comprehensive analysis of the ability of large language models (LLMs) to identify semantic relationships between different research topics, which is a critical step in the development of such ontologies. To this end, we developed a gold standard based on the IEEE Thesaurus to evaluate the task of identifying four types of relationships between pairs of topics: broader, narrower, same-as, and other. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models have achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can deliver performance comparable to much larger proprietary models, while requiring significantly fewer computational resources.
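The abstract reports F1-scores for a four-way classification of topic pairs (broader, narrower, same-as, other) but does not say here how the scores are averaged across classes. As a rough illustration only, the sketch below computes per-label F1 and an unweighted macro average. The macro averaging, the function names, and the toy labels are assumptions made for this example, not the authors' evaluation code or data.

```python
from collections import Counter

# The four relationship types named in the abstract.
LABELS = ("broader", "narrower", "same-as", "other")


def per_label_f1(gold, predicted):
    """Return {label: F1} for paired lists of gold and predicted labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in LABELS:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    scores = per_label_f1(gold, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not taken from the paper's gold standard.
gold = ["broader", "narrower", "same-as", "other", "broader", "other"]
predicted = ["broader", "narrower", "other", "other", "narrower", "other"]

scores = per_label_f1(gold, predicted)
overall = macro_f1(gold, predicted)
```

A macro average weights all four relationship types equally, which matters when one class such as "other" dominates the data; a micro or weighted average would give different numbers on the same predictions.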
