CL · Nov 13, 2025

Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction

arXiv:2511.10441v14.91 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of sample-efficient linguistic rule induction for lightweight models, offering an input-design method that requires less training data than conventional approaches.

The paper tackles the problem of enabling lightweight models to learn linguistic rules from minimal data by organizing training inputs as analogical paradigms. A lightweight model trained on only 100 examples reaches an F1 score of 0.95, outperforming zero-shot GPT-o3 (F1 = 0.87).

Large language models achieve strong performance through training on vast datasets. Can analogical paradigm organization enable lightweight models to match this performance with minimal data? We develop a computational approach implementing three cognitive-inspired principles: analogical structure, contrastive learning, and minimal contextual cues. We test this approach with structured completion tasks where models identify correct sentence completions from analogical patterns with contrastive alternatives. Training lightweight models (BERT+CNN, 0.5M parameters) on only one hundred structured examples of English causative/inchoative alternations achieves F1 = 0.95, outperforming zero-shot GPT-o3 (F1 = 0.87). Ablation studies confirm that analogical organization and contrastive structure improve performance, consistently surpassing randomly shuffled baselines across architectures. Cross-phenomenon validation using unspecified object alternations replicates these efficiency gains, confirming approach robustness. Our results show that analogical paradigm organization enables competitive linguistic rule learning with orders of magnitude less data than conventional approaches require.
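The abstract does not spell out the input format, so the following is only a minimal sketch of what one structured completion item might look like, assuming an analogical example pair, a single cue sentence, and a shuffled candidate list with contrastive distractors. The names `CompletionItem` and `build_item`, the field layout, and the example sentences are all hypothetical illustrations, not the authors' data or code.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionItem:
    """Hypothetical layout for one structured completion item."""
    analogy: tuple[str, str]      # worked example: (causative, inchoative)
    cue: str                      # minimal context: causative form of the target verb
    candidates: tuple[str, ...]   # correct completion mixed with distractors
    answer_index: int             # position of the correct completion


def build_item(
    analogy: tuple[str, str],
    target: tuple[str, str],
    distractors: list[str],
    rng: random.Random,
) -> CompletionItem:
    """Combine an analogical pair, a cue, and contrastive distractors."""
    cue, correct = target
    if correct in distractors:
        raise ValueError("distractors must differ from the correct completion")
    candidates = [correct, *distractors]
    rng.shuffle(candidates)  # avoid a fixed position for the answer
    return CompletionItem(
        analogy=analogy,
        cue=cue,
        candidates=tuple(candidates),
        answer_index=candidates.index(correct),
    )


# Illustrative sentences (assumed, not taken from the paper's dataset).
item = build_item(
    analogy=("The boy broke the window.", "The window broke."),
    target=("The chef melted the butter.", "The butter melted."),
    distractors=["The chef melted.", "The butter melted the chef."],
    rng=random.Random(0),
)
print(item.cue, "->", item.candidates[item.answer_index])
```

In this sketch the distractors contrast with the correct inchoative form by promoting the wrong argument or reversing the roles, which is one plausible reading of "contrastive alternatives"; the paper itself should be consulted for the actual construction.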

