SE | AI | Oct 27, 2025

Evaluating the effectiveness of LLM-based interoperability

Rodrigo Falcão, Stefan Schweitzer, Julien Siebert, Emily Calvet, Frank Elberzhager

arXiv:2510.23893v1 | 3 citations | h-index: 14 | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the economic and technical challenges of interoperability in dynamic systems, but its contribution is incremental because it applies existing LLMs to a specific domain.

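The study evaluated LLM-based strategies for enabling autonomous system interoperability at runtime. qwen2.5-coder:32b was the most effective model, with an average pass@1 of at least 0.99 using the DIRECT strategy and at least 0.89 using CODEGEN in three of the four dataset versions. In the fourth version, which included a unit conversion, all models failed with DIRECT, while qwen2.5-coder:32b reached an average pass@1 of 0.75 with CODEGEN.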

Background: Systems of systems are becoming increasingly dynamic and heterogeneous, which adds pressure on the long-standing challenge of interoperability. Besides its technical aspect, interoperability also has an economic side, as development-time effort is required to build the interoperability artifacts.

Objectives: Given the recent advances in large language models (LLMs), we aim to analyze the effectiveness of LLM-based strategies for making systems interoperate autonomously, at runtime, without human intervention.

Method: We selected 13 open-source LLMs and curated four versions of a dataset for an agricultural interoperability use case. We performed three runs of each model with each version of the dataset, using two different strategies. We then compared the effectiveness of the models and the consistency of their results across the runs.

Results: qwen2.5-coder:32b was the most effective model with both strategies, DIRECT (average pass@1 >= 0.99) and CODEGEN (average pass@1 >= 0.89), in three out of four dataset versions. In the fourth dataset version, which included a unit conversion, all models using the DIRECT strategy failed, whereas qwen2.5-coder:32b using CODEGEN succeeded with an average pass@1 = 0.75.

Conclusion: Some LLMs can make systems interoperate autonomously. Further evaluation in different domains is recommended, and further research on reliability strategies should be conducted.
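The results above are reported as average pass@1 over three runs, with consistency compared across runs. The following is a minimal sketch of how such figures could be computed, assuming each run yields one pass/fail outcome per dataset item. The function names, the use of population standard deviation as a consistency measure, and the sample outcomes are all hypothetical; the abstract does not describe the authors' actual evaluation code.

```python
from statistics import mean, pstdev


def pass_at_1(outcomes):
    """Fraction of items whose single generated answer passed its check."""
    if not outcomes:
        raise ValueError("a run needs at least one outcome")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def summarize_runs(runs):
    """Per-run pass@1, its average, and its spread across runs.

    The spread (population standard deviation) is an assumed proxy for
    consistency, not a measure taken from the paper.
    """
    scores = [pass_at_1(run) for run in runs]
    return {"scores": scores, "average": mean(scores), "spread": pstdev(scores)}


# Hypothetical outcomes: three runs of one model on a four-item dataset.
runs = [
    [True, True, True, False],
    [True, True, False, False],
    [True, True, True, True],
]
summary = summarize_runs(runs)
print(summary)
```

A low spread with a high average would indicate a model that is both effective and consistent; a high average with a large spread would suggest the result depends on the particular run.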
