LG | Apr 23, 2025

Exploring How LLMs Capture and Represent Domain-Specific Knowledge

Mirian Hipolito Garcia, Camille Couturier, Daniel Madrigal Diaz, Ankur Mallick, Anastasios Kyrillidis, Robert Sim, Victor Ruhle, Saravan Rajmohan

arXiv:2504.16871v29.43 citations | h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses how LLMs represent domain-specific knowledge and how those representations can be put to use, a question relevant to both researchers and practitioners. The advance is incremental: it builds on existing probing and model selection methods.

The study investigates whether Large Language Models (LLMs) inherently capture domain-specific nuances by testing how well their hidden states distinguish queries from different domains. It reveals latent domain-related trajectories and shows that fine-tuned models are not always the most accurate.

We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains. We also study the robustness of these domain representations to variations in prompt styles and sources. Our approach leverages these representations for model selection, mapping the LLM that best matches the domain trace of the input query (i.e., the model with the highest performance on similar traces). Our findings show that LLMs can differentiate queries for related domains, and that the fine-tuned model is not always the most accurate. Unlike previous work, our interpretations apply to both closed and open-ended generative tasks.
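
The selection idea in the abstract (match a query's domain trace to similar traces, then pick the model that performed best on them) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a simple nearest-centroid classifier, uses synthetic arrays in place of real prefill hidden states, and every name here (domain_trace, fit_centroids, select_model, model_a, model_b) and every accuracy figure is a hypothetical placeholder invented for illustration.

```python
import numpy as np

def domain_trace(hidden_states):
    """Collapse hidden states of shape (layers, tokens, dim) into one vector:
    the per-layer mean over tokens, flattened across layers."""
    return hidden_states.mean(axis=1).ravel()

def fit_centroids(traces, labels):
    """Average the traces of each domain to get one centroid per domain."""
    return {
        d: np.mean([t for t, lab in zip(traces, labels) if lab == d], axis=0)
        for d in sorted(set(labels))
    }

def nearest_domain(trace, centroids):
    """Return the domain whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda d: np.linalg.norm(trace - centroids[d]))

def select_model(trace, centroids, accuracy_by_domain):
    """Pick the model with the best recorded accuracy on the matched domain."""
    domain = nearest_domain(trace, centroids)
    scores = accuracy_by_domain[domain]
    return domain, max(scores, key=scores.get)

# Synthetic stand-in for prefill hidden states: 4 layers, 6 tokens, 8 dims.
# Each made-up domain is just a different mean offset plus Gaussian noise.
rng = np.random.default_rng(0)
OFFSETS = {"medical": 2.0, "legal": -2.0}

def fake_hidden_states(domain):
    return OFFSETS[domain] + rng.normal(size=(4, 6, 8))

labels = ["medical"] * 20 + ["legal"] * 20
traces = [domain_trace(fake_hidden_states(d)) for d in labels]
centroids = fit_centroids(traces, labels)

# Placeholder accuracies, NOT results from the paper.
ACCURACY = {
    "medical": {"model_a": 0.71, "model_b": 0.64},
    "legal": {"model_a": 0.58, "model_b": 0.66},
}

query = domain_trace(fake_hidden_states("medical"))
print(select_model(query, centroids, ACCURACY))
```

In a real setting the synthetic arrays would be replaced by hidden states captured from an actual model, and the accuracy table by measured per-domain performance; the paper's own probing and matching procedure may differ from the nearest-centroid rule assumed here.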
