EqualizeIR: Mitigating Linguistic Biases in Retrieval Models
This work addresses fairness and accuracy issues in IR systems for users whose queries vary in linguistic complexity, though the contribution is incremental because it builds on existing bias mitigation methods.
The study tackled linguistic biases in information retrieval models, whose performance varied with the linguistic complexity of the query. It proposed EqualizeIR, a framework that regularizes a robust model with a biased weak learner, which reduced performance disparities and improved overall retrieval.
This study finds that existing information retrieval (IR) models show significant biases based on the linguistic complexity of input queries: a model performs well on queries at one end of the complexity range (linguistically simpler or more complex) while underperforming on queries at the other end. To address this issue, we propose EqualizeIR, a framework to mitigate linguistic biases in IR models. EqualizeIR uses a linguistically biased weak learner to capture linguistic biases in IR datasets and then trains a robust model by regularizing and refining its predictions using the biased weak learner. This approach effectively prevents the robust model from overfitting to specific linguistic patterns in data. We propose four approaches for developing linguistically biased models. Extensive experiments on several datasets show that our method reduces performance disparities across linguistically simple and complex queries, while improving overall retrieval performance.
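The abstract does not spell out how the biased weak learner regularizes the robust model. One common way to train against a biased model is a product-of-experts loss, and the sketch below assumes that formulation purely for illustration; it is not necessarily the loss EqualizeIR uses. The function names (`log_softmax`, `cross_entropy`, `poe_loss`) and the toy scores are hypothetical. The biased model's scores are treated as fixed, and only the robust model would receive gradients in a real training loop.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over the last axis (candidate documents)."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(scores, positive_idx):
    """Mean negative log-likelihood of the relevant document for each query."""
    rows = np.arange(len(positive_idx))
    return -log_softmax(scores)[rows, positive_idx].mean()

def poe_loss(robust_scores, biased_scores, positive_idx):
    """Product-of-experts loss (an assumed formulation, see the text above).

    The two models' log-probabilities are summed and renormalized. When the
    biased weak learner already ranks the relevant document confidently, the
    combined loss is small, so the robust model gets little training signal
    from examples that the linguistic shortcut alone can solve.
    """
    combined = log_softmax(robust_scores) + log_softmax(biased_scores)
    return cross_entropy(combined, positive_idx)

# Toy batch: 2 queries, 3 candidate documents each; document 0 is relevant.
robust = np.array([[2.0, 0.0, 0.0], [1.0, 0.5, 0.0]])
biased = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]])  # confident on query 1 only
positives = np.array([0, 0])

print("plain loss:", cross_entropy(robust, positives))
print("PoE loss:  ", poe_loss(robust, biased, positives))
```

In this sketch a biased model with uniform scores leaves the loss unchanged, while a biased model that is confidently correct shrinks it, which is the down-weighting effect the regularization relies on.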