LG · Sep 29, 2025

Model Correlation Detection via Random Selection Probing

Ruibo Chen, Sheng Zhang, Yihan Wu, Tong Zheng, Peihua Mai, Heng Huang

arXiv:2509.24171v1 · 2 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for reliable model correlation detection in LLMs and VLMs and offers a principled method for transparent decisions. The advance is incremental, since it builds on existing similarity-based approaches.

The paper tackles the problem of detecting whether one model is fine-tuned from, or identical to, another. It introduces Random Selection Probing (RSP), a statistical framework that produces rigorous p-values. Experiments show that RSP yields small p-values for related models and high p-values for unrelated ones.

The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP optimizes textual or visual prefixes on a reference model for a random selection task and evaluates their transferability to a target model, producing rigorous p-values that quantify evidence of correlation. To mitigate false positives, RSP incorporates an unrelated baseline model to filter out generic, transferable features. We evaluate RSP across both LLMs and VLMs under diverse access conditions for reference models and test models. Experiments on fine-tuned and open-source models show that RSP consistently yields small p-values for related models while maintaining high p-values for unrelated ones. Extensive ablation studies further demonstrate the robustness of RSP. These results establish RSP as the first principled and general statistical framework for model correlation detection, enabling transparent and interpretable decisions in modern machine learning ecosystems.
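To make the hypothesis-testing idea concrete, here is a minimal sketch of how a p-value could be computed from transfer results. It is not the paper's actual test statistic, which the abstract does not specify. It assumes that each optimized prefix steers the reference model toward one intended option out of `num_options`, that trials where the unrelated baseline model also picks the intended option are discarded as generically transferable, and that under the null hypothesis (no correlation) the target model hits the intended option with probability `1 / num_options`. The function names and the exact binomial tail test are illustrative choices.

```python
from math import comb


def binomial_tail_p_value(successes: int, trials: int, chance: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def rsp_style_p_value(target_choices, intended_choices, baseline_choices, num_options):
    """Hypothetical RSP-style test (an assumption, not the authors' procedure).

    Each list holds one option index per optimized prefix:
      intended_choices: the option the prefix was optimized to elicit
      target_choices:   the option the target model actually selected
      baseline_choices: the option the unrelated baseline model selected
    """
    # Assumed filtering rule: drop prefixes that also steer the unrelated
    # baseline, since their effect is likely generic rather than inherited.
    kept = [
        (target, intended)
        for target, intended, baseline in zip(
            target_choices, intended_choices, baseline_choices
        )
        if baseline != intended
    ]
    hits = sum(target == intended for target, intended in kept)
    # Assumed null: an unrelated target matches the intended option by chance.
    return binomial_tail_p_value(hits, len(kept), 1.0 / num_options)


# Toy data with 4 options and 8 prefixes (made up for illustration).
intended = [0, 1, 2, 3, 0, 1, 2, 3]
baseline = [1, 1, 0, 0, 2, 2, 3, 0]            # follows only one prefix
related_target = [0, 1, 2, 3, 0, 1, 2, 3]      # follows every prefix
unrelated_target = [1, 0, 0, 0, 2, 2, 3, 0]    # follows none of the kept prefixes

p_related = rsp_style_p_value(related_target, intended, baseline, num_options=4)
p_unrelated = rsp_style_p_value(unrelated_target, intended, baseline, num_options=4)
```

In this toy setup the related target yields a very small p-value and the unrelated target a p-value near 1, mirroring the qualitative behavior the abstract reports. A real implementation would need the paper's own statistic and filtering rule.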
