Generalization Boundaries of Fine-Tuned Small Language Models for Graph Structural Inference
For researchers using LLMs for graph reasoning, this work provides empirical boundaries on generalization: fine-tuned small models rank graphs reliably by structural properties, but how they degrade beyond the training range depends on the architecture.
This work investigates the generalization of fine-tuned small language models for graph structural inference, finding that they maintain strong ordinal consistency across graph families and graph sizes outside the training distribution, with degradation profiles that differ by architecture.
Small language models fine-tuned for graph property estimation have demonstrated strong in-distribution performance, yet their generalization capabilities beyond training conditions remain poorly understood. In this work, we systematically investigate the boundaries of structural inference in fine-tuned small language models along two generalization axes (graph size and graph family distribution) and assess domain-learning capability on real-world graph benchmarks. Using a controlled experimental setup with three instruction-tuned models in the 3-4B parameter class and two graph serialization formats, we evaluate performance on graphs substantially larger than the training range and across held-out random graph families. Our results show that fine-tuned models maintain strong ordinal consistency across structurally distinct graph families and continue to rank graphs by structural properties on inputs substantially larger than those seen during training, with distinct architecture-specific degradation profiles. These findings delineate where fine-tuned small language models generalize reliably, providing empirical grounding for their use in graph-based reasoning tasks.
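To make "ordinal consistency" concrete, the sketch below scores how well a set of predicted property values preserves the ranking of the true values. The abstract does not name the metric used, so the choice of Kendall's tau (the tau-a variant, which ignores ties) is an assumption for illustration, and the function name `kendall_tau` and the density values are hypothetical, not results from this work.

```python
from itertools import combinations


def kendall_tau(true_values, predicted_values):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumed metric for illustration; tied pairs count as neither
    concordant nor discordant.
    """
    if len(true_values) != len(predicted_values):
        raise ValueError("inputs must have equal length")
    n = len(true_values)
    if n < 2:
        raise ValueError("need at least two graphs to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both sequences order it the same way.
        product = (true_values[i] - true_values[j]) * (
            predicted_values[i] - predicted_values[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical illustration data: predictions are numerically off,
# yet the ordering of the four graphs is fully preserved.
true_density = [0.10, 0.25, 0.40, 0.55]
predicted_density = [0.30, 0.45, 0.50, 0.90]
tau = kendall_tau(true_density, predicted_density)
```

In this toy case every prediction overshoots the true value, so an absolute-error metric would penalize the model, while the rank-based score stays at its maximum. That distinction is why a model can rank graphs reliably even when its numeric estimates drift on larger or unfamiliar inputs.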