IR · Apr 7

Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

Wei Huang, Keping Bi, Yinqiong Cai, Wei Chen, Jiafeng Guo, Xueqi Cheng

arXiv:2604.06163 · 66.3

Predicted impact: top 42% in IR · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses fairness and reliability concerns in information access systems by identifying and mitigating a data-driven bias, though it is an incremental refinement of existing understanding.

The paper tackles the problem of neural retrievers favoring LLM-generated texts over human-written ones. It shows that this bias stems from supervision in training datasets rather than from model flaws, and proposes methods that substantially reduce it.

Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than from the models themselves. We find that non-semantic differences, such as fluency and term specificity, exist between positive and negative documents, mirroring the differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preference for LLM texts. To mitigate the effect, we propose two approaches: 1) reducing artifact differences in training data and 2) adjusting LLM text vectors by removing their projection on the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.
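The second mitigation, removing the projection of a text vector on the bias vector, can be illustrated with a minimal sketch. The abstract does not say how the bias vector is estimated, so the estimator below (the normalized difference between the mean LLM embedding and the mean human embedding) is an assumption, as are the function names and the toy three-dimensional vectors. It is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def estimate_bias_direction(
    llm_vecs: Sequence[Sequence[float]],
    human_vecs: Sequence[Sequence[float]],
) -> Vector:
    """Unit vector pointing from the human mean to the LLM mean.

    ASSUMPTION: this mean-difference estimator is illustrative only;
    the abstract does not specify how the bias vector is obtained.
    """
    llm_mean = mean_vector(llm_vecs)
    human_mean = mean_vector(human_vecs)
    diff = [l - h for l, h in zip(llm_mean, human_mean)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("LLM and human means coincide; no direction to remove")
    return [d / norm for d in diff]


def remove_projection(vec: Sequence[float], direction: Sequence[float]) -> Vector:
    """Subtract the component of `vec` along the unit vector `direction`."""
    coeff = dot(vec, direction)
    return [v - coeff * d for v, d in zip(vec, direction)]


# Toy data (hypothetical): LLM vectors equal the human ones plus a
# constant shift along the third axis, standing in for the artifact.
human_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
llm_vecs = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]

bias = estimate_bias_direction(llm_vecs, human_vecs)
debiased = [remove_projection(v, bias) for v in llm_vecs]
```

In this toy setup the estimated direction is the third axis, and each adjusted LLM vector has no remaining component along it, so a dot-product retriever can no longer score it higher on that axis alone. Real embeddings would not separate this cleanly, and the amount of semantic signal lost with the projection would need to be measured.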
