IRNov 5, 2025Code
CLAX: Fast and Flexible Neural Click Models in JAXPhilipp Hager, Onno Zoeter, Maarten de Rijke
CLAX is a JAX-based library that implements classic click models using modern gradient-based optimization. While neural click models have emerged over the past decade, complex click models based on probabilistic graphical models (PGMs) have not systematically adopted gradient-based optimization, preventing practitioners from leveraging modern deep learning frameworks while preserving the interpretability of classic models. CLAX addresses this gap by replacing EM-based optimization with direct gradient-based optimization in a numerically stable manner. The framework's modular design enables the integration of any component, from embeddings and deep networks to custom modules, into classic click models for end-to-end optimization. We demonstrate CLAX's efficiency by running experiments on the full Baidu-ULTR dataset comprising over a billion user sessions in $\approx$ 2 hours on a single GPU, orders of magnitude faster than traditional EM approaches. CLAX implements ten classic click models, serving both industry practitioners seeking to understand user behavior and improve ranking performance at scale and researchers developing new click models. CLAX is available at: https://github.com/philipphager/clax
IRApr 3, 2024
Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search DatasetPhilipp Hager, Romain Deffayet, Jean-Michel Renders et al.
Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The Baidu-ULTR dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques. In this work, we revisit and extend the available experiments on the Baidu-ULTR dataset. We find that standard unbiased learning-to-rank techniques robustly improve click predictions but struggle to consistently improve ranking performance, especially considering the stark differences obtained by choice of ranking loss and query-document features. Our experiments reveal that gains in click prediction do not necessarily translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.
AIJan 12, 2024
Foundations of Structural Causal Models with Latent SelectionLeihao Chen, Onno Zoeter, Joris M. Mooij
Three distinct phenomena complicate statistical causal analysis: latent common causes, causal cycles, and latent selection. Foundational works on Structural Causal Models (SCMs), e.g., Bongers et al. (2021, Ann. Stat., 49(5): 2885-2915), treat cycles and latent variables, while an analogous account of latent selection is missing. The goal of this article is to develop a theoretical foundation for modeling latent selection with SCMs. To achieve that, we introduce a conditioning operation for SCMs: it maps an SCM with explicit selection mechanisms to one without them while preserving the causal semantics of the selected subpopulation. Graphically, in Directed Mixed Graphs we extend bidirected edge--beyond latent common cause--to also encode latent selection. We prove that the conditioning operation preserves simplicity, acyclicity, and linearity of SCMs, and interacts well with marginalization, conditioning, and interventions. These properties make those three operations valuable tools for causal modeling, reasoning, and learning after abstracting away latent details (latent common causes and selection). Examples show how this abstraction streamlines analysis and clarifies when standard tools (e.g., adjustment, causal calculus, instrumental variables) remain valid under selection bias. We hope that these results deepen the SCM-based understanding of selection bias and become part of the standard causal modeling toolbox to build more reliable causal analysis.
LGMar 1, 2024
Evaluating and Correcting Performative Effects of Decision Support Systems via Causal Domain ShiftPhilip Boeken, Onno Zoeter, Joris M. Mooij
When predicting a target variable $Y$ from features $X$, the prediction $\hat{Y}$ can be performative: an agent might act on this prediction, affecting the value of $Y$ that we eventually observe. Performative predictions are deliberately prevalent in algorithmic decision support, where a Decision Support System (DSS) provides a prediction for an agent to affect the value of the target variable. When deploying a DSS in high-stakes settings (e.g. healthcare, law, predictive policing, or child welfare screening) it is imperative to carefully assess the performative effects of the DSS. In the case that the DSS serves as an alarm for a predicted negative outcome, naive retraining of the prediction model is bound to result in a model that underestimates the risk, due to effective workings of the previous model. In this work, we propose to model the deployment of a DSS as causal domain shift and provide novel cross-domain identification results for the conditional expectation $E[Y | X]$, allowing for pre- and post-hoc assessment of the deployment of the DSS, and for retraining of a model that assesses the risk under a baseline policy where the DSS is not deployed. Using a running example, we empirically show that a repeated regression procedure provides a practical framework for estimating these quantities, even when the data is affected by sample selection bias and selective labelling, offering for a practical, unified solution for multiple forms of target variable bias.
MLFeb 19, 2024
When Do Off-Policy and On-Policy Policy Gradient Methods Align?Davide Mambelli, Stephan Bongers, Onno Zoeter et al.
Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods succeeded in many application domains, however, because of their notorious sample inefficiency their use remains limited to problems where fast and accurate simulations are available. A common way to improve sample efficiency is to modify their objective function to be computable from off-policy samples without importance sampling. A well-established off-policy objective is the excursion objective. This work studies the difference between the excursion objective and the traditional on-policy objective, which we refer to as the on-off gap. We provide the first theoretical analysis showing conditions to reduce the on-off gap while establishing empirical evidence of shortfalls arising when these conditions are not met.
IRJun 25, 2025
Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to RankPhilipp Hager, Onno Zoeter, Maarten de Rijke
Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
GTJun 27, 2014
An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality AssuranceShweta Jain, Sujit Gujar, Satyanath Bhat et al.
Consider a requester who wishes to crowdsource a series of identical binary labeling tasks to a pool of workers so as to achieve an assured accuracy for each task, in a cost optimal way. The workers are heterogeneous with unknown but fixed qualities and their costs are private. The problem is to select for each task an optimal subset of workers so that the outcome obtained from the selected workers guarantees a target accuracy level. The problem is a challenging one even in a non strategic setting since the accuracy of aggregated label depends on unknown qualities. We develop a novel multi-armed bandit (MAB) mechanism for solving this problem. First, we propose a framework, Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained Confidence Bound for a Non Strategic setting (CCB-NS). We derive an upper bound on the number of time steps the algorithm chooses a sub-optimal set that depends on the target accuracy level and true qualities. A more challenging situation arises when the requester not only has to learn the qualities of the workers but also elicit their true costs. We modify the CCB-NS algorithm to obtain an adaptive exploration separated algorithm which we call { \em Constrained Confidence Bound for a Strategic setting (CCB-S)}. CCB-S algorithm produces an ex-post monotone allocation rule and thus can be transformed into an ex-post incentive compatible and ex-post individually rational mechanism that learns the qualities of the workers and guarantees a given target accuracy level in a cost optimal way. We provide a lower bound on the number of times any algorithm should select a sub-optimal set and we see that the lower bound matches our upper bound upto a constant factor. We provide insights on the practical implementation of this framework through an illustrative example and we show the efficacy of our algorithms through simulations.
GTFeb 14, 2012
Dynamic Mechanism Design for Markets with Strategic ResourcesSwaprava Nath, Onno Zoeter, Yadati Narahari et al.
The assignment of tasks to multiple resources becomes an interesting game theoretic problem, when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of an efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner is not simply a function of the characteristics of the task itself and the allocation, but also of the state of the resources. We introduce a dynamic extension of Mezzetti's two phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within period ex-post incentive compatible, and within period ex-post individually rational.