CL AI Jul 29, 2024

Do LLMs Really Adapt to Domains? An Ontology Learning Perspective

Huu Tan Mai, Cuong Xuan Chu, Heiko Paulheim

arXiv:2407.19998v16.630 citations, h-index: 4, Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a critical gap in understanding how well LLMs adapt to new domains in ontology learning, a question that matters to researchers and practitioners in NLP and knowledge extraction. Its contribution is incremental, however, because it tests existing methods rather than proposing new ones.

The paper investigates whether large language models (LLMs) truly adapt to domain-specific data by reasoning over semantic relationships or merely rely on learned lexical senses, using a controlled experiment with WordNet and gibberish corpora. Results show that off-the-shelf LLMs fail to consistently reason in domain-specific contexts, but fine-tuning improves performance on lexical semantic tasks even with arbitrary terms.

Large Language Models (LLMs) have demonstrated unprecedented prowess across natural language processing tasks in a wide range of application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). However, it has not been verified whether their success is due to their ability to reason over unstructured or semi-structured data, or to their effective learning of linguistic patterns and senses alone. This unresolved question is particularly crucial when dealing with domain-specific data, where the lexical senses and their meaning can completely differ from what an LLM has learned during its training stage. This paper investigates the following question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning? To answer this question, we devise a controlled experimental setup that uses WordNet to synthesize parallel corpora, with English and gibberish terms. We examine the differences in the outputs of LLMs for each corpus in two OL tasks: relation extraction and taxonomy discovery. Empirical results show that, while adapting to the gibberish corpora, off-the-shelf LLMs do not consistently reason over semantic relationships between concepts, and instead leverage senses and their frame. However, fine-tuning improves the performance of LLMs on lexical semantic tasks even when the domain-specific terms are arbitrary and unseen during pre-training, hinting at the applicability of pre-trained LLMs for OL.
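
The idea of a parallel English/gibberish corpus can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy taxonomy, the glosses, and the helper names (`make_gibberish`, `build_mapping`, `translate`) are assumptions made for illustration, and a real setup would draw terms and definitions from WordNet itself. The sketch only shows the core property of such a corpus: every term is consistently replaced by an unseen pseudo-word, so the relational structure is preserved while the lexical senses are removed.

```python
import random
import re
import string


def make_gibberish(rng, length=6):
    """Return a random lowercase pseudo-word (hypothetical generator)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_mapping(terms, seed=0):
    """Assign each real term one unique gibberish replacement."""
    rng = random.Random(seed)
    used = set(terms)  # never reuse a real term as a pseudo-word
    mapping = {}
    for term in sorted(terms):  # sorted for reproducibility
        candidate = make_gibberish(rng)
        while candidate in used:
            candidate = make_gibberish(rng)
        used.add(candidate)
        mapping[term] = candidate
    return mapping


def translate(text, mapping):
    """Replace whole-word occurrences of mapped terms in a single pass."""
    if not mapping:
        return text
    alternatives = "|".join(
        re.escape(t) for t in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Toy (hyponym, hypernym) pairs and glosses: illustrative data only,
# not taken from the paper or from WordNet.
TAXONOMY = [("dog", "canine"), ("canine", "mammal")]
GLOSSES = {
    "dog": "a dog is a domesticated canine",
    "canine": "a canine is a carnivorous mammal",
}

terms = {t for pair in TAXONOMY for t in pair}
mapping = build_mapping(terms, seed=0)

# Parallel corpus: same structure, different surface forms.
english_corpus = list(GLOSSES.values())
gibberish_corpus = [translate(g, mapping) for g in english_corpus]
gibberish_taxonomy = [(mapping[a], mapping[b]) for a, b in TAXONOMY]
```

Because the mapping is one-to-one, any relation that holds between two English terms holds between their gibberish counterparts, so a model's outputs on the two corpora can be compared directly.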
