CL · AI · Dec 8, 2025

Metric-Fair Prompting: Treating Similar Samples Similarly

arXiv:2512.07608v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses fairness and accuracy issues in high-stakes clinical applications, but it is incremental because it builds on existing prompting and fairness methods.

The paper tackles the problem of ensuring individual fairness in large language models (LLMs) for medical question answering by introducing Metric-Fair Prompting, which enforces consistent decisions for similar questions and improves accuracy on the MedQA (US) benchmark.

We introduce Metric-Fair Prompting, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In the application of multiple-choice medical question answering, each (question, option) pair is treated as a binary instance with label $+1$ (correct) or $-1$ (incorrect). To promote individual fairness (treating similar instances similarly), we compute question similarity using NLP embeddings and solve items in joint pairs of similar questions rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each (question, option) pair to a score $f(x)$ that acts as confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and, hence, consistent outputs. Evaluated on the MedQA (US) benchmark, Metric-Fair Prompting is shown to improve performance over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
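
The pairing and consistency steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the embedding model, the distance, the pairing procedure, or the Lipschitz constant, so the cosine distance, the greedy closest-first pairing, the constant `lipschitz_const`, and the helper names `cosine_distance`, `pair_similar`, and `lipschitz_violations` are all assumptions made for illustration. For simplicity the sketch also uses one score per item, whereas the paper scores each (question, option) pair.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumed here as the similarity metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def pair_similar(embeddings):
    """Greedily pair items, closest pairs first (a hypothetical pairing rule).

    Returns a list of (i, j, distance) tuples; each index appears at most once.
    """
    candidates = sorted(
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    )
    used, pairs = set(), []
    for dist, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j, dist))
            used.update((i, j))
    return pairs


def lipschitz_violations(scores, pairs, lipschitz_const=1.0):
    """Return index pairs where |f(x_i) - f(x_j)| > L * d(x_i, x_j)."""
    return [
        (i, j)
        for i, j, dist in pairs
        if abs(scores[i] - scores[j]) > lipschitz_const * dist
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for question embeddings (made-up values).
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Toy confidence scores f(x) in [-1, 1], one per item (made-up values).
    scores = [0.9, 0.88, -0.7, 0.6]

    pairs = pair_similar(embeddings)
    print(pairs)
    print(lipschitz_violations(scores, pairs, lipschitz_const=5.0))
```

In this toy setup, a pair of near-identical items whose scores differ widely is flagged as a violation, which is the kind of inconsistency a Lipschitz-style constraint is meant to discourage. In the paper the constraint is imposed through the prompt rather than checked after the fact, so a check like this would at most serve as a diagnostic.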
