Krikamol Muandet

LG
h-index66
65papers
4,150citations
Novelty55%
AI Score59

65 Papers

LGDec 13, 2022
On the Relationship Between Explanation and Prediction: A Causal View

Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith et al. · eth-zurich

Being able to provide explanations for a model's decision has become a central requirement for the development, deployment, and adoption of machine learning models. However, we are yet to understand what explanation methods can and cannot do. How do upstream factors such as data, model prediction, hyperparameters, and random initialization influence downstream explanations? While previous work raised concerns that explanations (E) may have little relationship with the prediction (Y), there is a lack of conclusive study to quantify this relationship. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on hyperparameters and inputs used to generate saliency-based Es or Ys. Our results suggest that the relationships between E and Y is far from ideal. In fact, the gap between 'ideal' case only increase in higher-performing models -- models that are likely to be deployed. Our work is a promising first step towards providing a quantitative measure of the relationship between E and Y, which could also inform the future development of methods for E with a quantitative metric.

LGOct 26, 2023Code
Looping in the Human Collaborative and Explainable Bayesian Optimization

Masaki Adachi, Brady Planden, David A. Howey et al. · oxford

Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborative and Explainable Bayesian Optimization (CoExBO) framework. Instead of explicitly requiring a user to provide a knowledge model, CoExBO employs preference learning to seamlessly integrate human insights into the optimization, resulting in algorithmic suggestions that resonate with user preference. CoExBO explains its candidate selection every iteration to foster trust, empowering users with a clearer grasp of the optimization. Furthermore, CoExBO offers a no-harm guarantee, allowing users to make mistakes; even with extreme adversarial interventions, the algorithm converges asymptotically to a vanilla Bayesian optimization. We validate CoExBO's efficacy through human-AI teaming experiments in lithium-ion battery design, highlighting substantial improvements over conventional methods. Code is available https://github.com/ma921/CoExBO.

LGJun 24, 2022
Gated Domain Units for Multi-source Domain Generalization

Simon Föll, Alina Dubatovka, Eugen Ernst et al. · oxford

The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary Distributions (I.E.D) across different domains. This assumption implies an invariant structure in the solution space that enables knowledge transfer to unseen domains. To exploit this property for domain generalization, we introduce a modular neural network layer consisting of Gated Domain Units (GDUs) that learn a representation for each latent elementary distribution. During inference, a weighted ensemble of learning machines can be created by comparing new observations with the representations of each elementary distribution. Our flexible framework also accommodates scenarios where explicit domain information is not present. Extensive experiments on image, text, and graph data show consistent performance improvement on out-of-training target domains. These findings support the practicality of the I.E.D assumption and the effectiveness of GDUs for domain generalisation.

LGJun 17, 2022
AutoML Two-Sample Test

Jonas M. Kübler, Vincent Stimper, Simon Buchholz et al.

Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard supervised learning frameworks, whose usage can require specialized knowledge about two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advancements in AutoML. Without any user input about the problems at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML two-sample test in the Python package autotst.

LGApr 18Code
Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration

Chongsheng Zhang, Hao Wang, Zelong Yu et al.

Imbalanced data is commonly present in real-world applications. While data synthesis can effectively mitigate the data scarcity problem of rare-classes, and LLMs have revolutionized text generation, the application of LLMs to relational/structured tabular data synthesis remains underexplored. Moreover, existing approaches lack an effective feedback mechanism that can guide LLMs towards continuously optimizing the quality of the generated data throughout the synthesis process. In this work, we propose RDDG, Relational Data generator with Dynamic Guidance, which is a unified in-context learning framework that employs progressive chain-of-thought (CoT) steps to generate tabular data for enhancing downstream imbalanced classification performance. RDDG first uses core set selection to identify representative samples from the original data, then utilizes in-context learning to discover the inherent patterns and correlations among attributes within the core set, and subsequently generates tabular data while preserving the aforementioned constraints. More importantly, it incorporates a self-reinforcing feedback mechanism that provides automatic assessments on the quality of the generated data, enabling continuous quality optimization throughout the generation process. Experimental results on multiple real and synthetic datasets demonstrate that RDDG outperforms existing approaches in both data fidelity and downstream imbalanced classification performance. We make our code available at https://github.com/cszhangLMU/RDDG.

LGJul 11, 2022
Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions

Heiner Kremer, Jia-Jie Zhu, Krikamol Muandet et al.

Important problems in causal inference, economics, and, more generally, robust machine learning can be expressed as conditional moment restrictions, but estimation becomes challenging as it requires solving a continuum of unconditional moment restrictions. Previous works addressed this problem by extending the generalized method of moments (GMM) to continuum moment restrictions. In contrast, generalized empirical likelihood (GEL) provides a more general framework and has been shown to enjoy favorable small-sample properties compared to GMM-based estimators. To benefit from recent developments in machine learning, we provide a functional reformulation of GEL in which arbitrary models can be leveraged. Motivated by a dual formulation of the resulting infinite dimensional optimization problem, we devise a practical method and explore its asymptotic properties. Finally, we provide kernel- and neural network-based implementations of the estimator, which achieve state-of-the-art empirical performance on two conditional moment restriction problems.

AIAug 30, 2023
Causal Strategic Learning with Competitive Selection

Kiet Q. H. Vo, Muneeb Aadil, Siu Lun Chau et al. · oxford

We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much of prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of selection procedure by which agents are not only evaluated, but also selected. When each decision maker unilaterally selects agents by maximising their own utility, we show that the optimal selection rule is a trade-off between selecting the best agents and providing incentives to maximise the agents' improvement. Furthermore, this optimal selection rule relies on incorrect predictions of agents' outcomes. Hence, we study the conditions under which a decision maker's optimal selection rule will not lead to deterioration of agents' outcome nor cause unjust reduction in agents' selection chance. To that end, we provide an analytical form of the optimal selection rule and a mechanism to retrieve the causal parameters from observational data, under certain assumptions on agents' behaviour. Secondly, when there are multiple decision makers, the interference between selection rules introduces another source of biases in estimating the underlying causal parameters. To address this problem, we provide a cooperative protocol which all decision makers must collectively adopt to recover the true causal parameters. Lastly, we complement our theoretical results with simulation studies. Our results highlight not only the importance of causal modeling as a strategy to mitigate the effect of gaming, as suggested by previous work, but also the need of a benevolent regulator to enable it.

AIMar 28
Quantification of Credal Uncertainty: A Distance-Based Approach

Xabier Gonzalez-Garcia, Siu Lun Chau, Julian Rodemann et al. · oxford

Credal sets, i.e., closed convex sets of probability measures, provide a natural framework to represent aleatoric and epistemic uncertainty in machine learning. Yet how to quantify these two types of uncertainty for a given credal set, particularly in multiclass classification, remains underexplored. In this paper, we propose a distance-based approach to quantify total, aleatoric, and epistemic uncertainty for credal sets. Concretely, we introduce a family of such measures within the framework of Integral Probability Metrics (IPMs). The resulting quantities admit clear semantic interpretations, satisfy natural theoretical desiderata, and remain computationally tractable for common choices of IPMs. We instantiate the framework with the total variation distance and obtain simple, efficient uncertainty measures for multiclass classification. In the binary case, this choice recovers established uncertainty measures, for which a principled multiclass generalization has so far been missing. Empirical results confirm practical usefulness, with favorable performance at low computational cost.

MLMar 2
Instrumental and Proximal Causal Inference with Gaussian Processes

Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic et al. · oxford

Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide reliable epistemic uncertainty (EU) quantification. We address this gap through a Deconditional Gaussian Process (DGP) framework for uncertainty-aware causal learning. Our formulation recovers popular kernel estimators as the posterior mean, ensuring predictive precision, while the posterior variance yields principled and well-calibrated EU. Moreover, the probabilistic structure enables systematic model selection via marginal log-likelihood optimization. Empirical results demonstrate strong predictive performance alongside informative EU quantification, evaluated via empirical coverage frequencies and decision-aware accuracy rejection curves. Together, our approach provides a unified, practical solution for causal inference under unobserved confounding with reliable uncertainty.

LGJul 20, 2022
Learning Counterfactually Invariant Predictors

Francesco Quinzan, Cecilia Casolo, Krikamol Muandet et al.

Notions of counterfactual invariance (CI) have proven essential for predictors that are fair, robust, and generalizable in the real world. We propose graphical criteria that yield a sufficient condition for a predictor to be counterfactually invariant in terms of a conditional independence in the observational distribution. In order to learn such predictors, we propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP), building on the Hilbert-Schmidt Conditional Independence Criterion (HSCIC), a kernel-based conditional dependence measure. Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets including scalar and multi-variate settings.

LGJun 5, 2022
(Im)possibility of Collective Intelligence

Krikamol Muandet

Modern applications of AI involve training and deploying machine learning models across heterogeneous and potentially massive environments. Emerging diversity of data not only brings about new possibilities to advance AI systems, but also restricts the extent to which information can be shared across environments due to pressing concerns such as privacy, security, and equity. Based on a novel characterization of learning algorithms as choice correspondences on a hypothesis space, this work provides a minimum requirement in terms of intuitive and reasonable axioms under which the only rational learning algorithm in heterogeneous environments is an empirical risk minimization (ERM) that unilaterally learns from a single environment without information sharing across environments. Our (im)possibility result underscores the fundamental trade-off that any algorithms will face in order to achieve Collective Intelligence (CI), i.e., the ability to learn across heterogeneous environments. Ultimately, collective learning in heterogeneous environments are inherently hard because, in critical areas of machine learning such as out-of-distribution generalization, federated/collaborative learning, algorithmic fairness, and multi-modal learning, it can be infeasible to make meaningful comparisons of model predictive performance across environments.

LGJul 21, 2023
Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections

Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet et al.

Test-time defenses are used to improve the robustness of deep neural networks to adversarial examples during inference. However, existing methods either require an additional trained classifier to detect and correct the adversarial samples, or perform additional complex optimization on the model parameters or the input to adapt to the adversarial samples at test-time, resulting in a significant increase in the inference time compared to the base model. In this work, we propose a novel test-time defense strategy called Robust Feature Inference (RFI) that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models to the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically characterize the subspace of the eigenspectrum of the feature covariance that is the most robust for a generalized additive model. Our extensive experiments on CIFAR-10, CIFAR-100, tiny ImageNet and ImageNet datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench show that RFI improves robustness across adaptive and transfer attacks consistently. We also compare RFI with adaptive test-time defenses to demonstrate the effectiveness of our proposed approach.

LGOct 24, 2022
Sufficient Invariant Learning for Distribution Shift

Taero Kim, Subeen Park, Sungjun Lim et al.

Learning robust models under distribution shifts between training and test datasets is a fundamental challenge in machine learning. While learning invariant features across environments is a popular approach, it often assumes that these features are fully observed in both training and test sets, a condition frequently violated in practice. When models rely on invariant features absent in the test set, their robustness in new environments can deteriorate. To tackle this problem, we introduce a novel learning principle called the Sufficient Invariant Learning (SIL) framework, which focuses on learning a sufficient subset of invariant features rather than relying on a single feature. After demonstrating the limitation of existing invariant learning methods, we propose a new algorithm, Adaptive Sharpness-aware Group Distributionally Robust Optimization (ASGDRO), to learn diverse invariant features by seeking common flat minima across the environments. We theoretically demonstrate that finding a common flat minima enables robust predictions based on diverse invariant features. Empirical evaluations on multiple datasets, including our new benchmark, confirm ASGDRO's robustness against distribution shifts, highlighting the limitations of existing methods.

LGMay 7
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature

Majid Mohammadi, Grigory Reznikov, Pavel Sinitcyn et al.

We study the efficient computation of Shapley values for \emph{product games} -- cooperative games in which the coalition value factorizes as a product of per-player terms. Such games arise in machine learning explainability whenever the value function inherits a multiplicative structure from the underlying model, as in kernel methods with product kernels and tree-based models. Our key result is that the Shapley value of each player in a product game admits an exact one-dimensional integral representation: the weighted sum over exponentially many feature coalitions collapses to the integral of a degree-$(d-1)$ polynomial over $[0,1]$, where $d$ is the total number of features. This yields a Gauss--Legendre quadrature scheme that is \emph{provably exact} whenever the number of nodes satisfies $m_q \geq \lceil d/2 \rceil$, and otherwise provides a \emph{near-exact} approximation with error provably decaying geometrically in $m_q$. In practice, a few hundred nodes can achieve highly precise estimates even with thousands of features. Building on this formulation, we derive a numerically stable implementation via log-space evaluation, together with an efficient parallel implementation based on associative scan primitives that achieves $O(d\,m_q)$ total work and $O(\log d)$ parallel time. Experiments show that \textsc{QuadraSHAP} is the fastest numerically stable method across all tested configurations.

MLFeb 4
Performative Learning Theory

Julian Rodemann, Unai Fischer-Abaigar, James Bailie et al.

Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.

AIMar 11
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Anita Yang, Krikamol Muandet, Michele Caprio et al.

Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertainty framework. This mismatch leads to systematic failure modes, particularly in settings that involve ambiguous question-answering, in-context learning, and self-reflection. To address this, we propose novel prompt-based uncertainty elicitation techniques grounded in \emph{imprecise probabilities}, a principled framework for repesenting and eliciting higher-order uncertainty. Here, first-order uncertainty captures uncertainty over possible responses to a prompt, while second-order uncertainty (uncertainty about uncertainty) quantifies indeterminacy in the underlying probability model itself. We introduce general-purpose prompting and post-processing procedures to directly elicit and quantify both orders of uncertainty, and demonstrate their effectiveness across diverse settings. Our approach enables more faithful uncertainty reporting from LLMs, improving credibility and supporting downstream decision-making.

LGApr 6, 2024
Domain Generalisation via Imprecise Learning

Anurag Singh, Siu Lun Chau, Shahine Bouabid et al. · oxford

Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator like medical doctors, this information might not always be available at training time. The institutional separation between machine learners and model operators leads to arbitrary commitments to specific generalisation strategies by machine learners due to these deployment uncertainties. We introduce the Imprecise Domain Generalisation framework to mitigate this, featuring an imprecise risk optimisation that allows learners to stay imprecise by optimising against a continuous spectrum of generalisation strategies during training, and a model framework that allows operators to specify their generalisation preference at deployment. Supported by both theoretical and empirical evidence, our work showcases the benefits of integrating imprecision into domain generalisation.

MLOct 16, 2024
Credal Two-Sample Tests of Epistemic Uncertainty

Siu Lun Chau, Antonin Schrab, Arthur Gretton et al. · oxford

We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets -- convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty that arises from the modeller's partial ignorance. Compared to classical two-sample tests, which focus on comparing precise distributions, the proposed framework provides a broader and more versatile set of hypotheses. This approach enables the direct integration of epistemic uncertainty, effectively addressing the challenges arising from partial ignorance in hypothesis testing. By generalising two-sample test to compare credal sets, our framework enables reasoning for equality, inclusion, intersection, and mutual exclusivity, each offering unique insights into the modeller's epistemic beliefs. As the first work on nonparametric hypothesis testing for comparing credal sets, we focus on finitely generated credal sets derived from i.i.d. samples from multiple distributions -- referred to as credal samples. We formalise these tests as two-sample tests with nuisance parameters and introduce the first permutation-based solution for this class of problems, significantly improving existing methods. Our approach properly incorporates the modeller's epistemic uncertainty into hypothesis testing, leading to more robust and credible conclusions, with kernel-based implementations for real-world applications.

MLMay 22, 2025
Integral Imprecise Probability Metrics

Siu Lun Chau, Michele Caprio, Krikamol Muandet · oxford

Quantifying differences between probability distributions is fundamental to statistics and machine learning, primarily for comparing statistical uncertainty. In contrast, epistemic uncertainty (EU) -- due to incomplete knowledge -- requires richer representations than those offered by classical probability. Imprecise probability (IP) theory offers such models, capturing ambiguity and partial belief. This has driven growing interest in imprecise probabilistic machine learning (IPML), where inference and decision-making rely on broader uncertainty models -- highlighting the need for metrics beyond classical probability. This work introduces the Integral Imprecise Probability Metric (IIPM) framework, a Choquet integral-based generalisation of classical Integral Probability Metric (IPM) to the setting of capacities -- a broad class of IP models encompassing many existing ones, including lower probabilities, probability intervals, belief functions, and more. Theoretically, we establish conditions under which IIPM serves as a valid metric and metrises a form of weak convergence of capacities. Practically, IIPM not only enables comparison across different IP models but also supports the quantification of epistemic uncertainty within a single IP model. In particular, by comparing an IP model with its conjugate, IIPM gives rise to a new class of EU measures -- Maximum Mean Imprecision -- which satisfy key axiomatic properties proposed in the Uncertainty Quantification literature. We validate MMI through selective classification experiments, demonstrating strong empirical performance against established EU measures, and outperforming them when classical methods struggle to scale to a large number of classes. Our work advances both theory and practice in IPML, offering a principled framework for comparing and quantifying epistemic uncertainty under imprecision.

LGMay 22, 2025
Computing Exact Shapley Values in Polynomial Time for Product-Kernel Methods

Majid Mohammadi, Siu Lun Chau, Krikamol Muandet · oxford

Kernel methods are widely used in machine learning due to their flexibility and expressiveness. However, their black-box nature poses significant challenges to interpretability, limiting their adoption in high-stakes applications. Shapley value-based feature attribution techniques, such as SHAP and kernel method-specific adaptation like RKHS-SHAP, offer a promising path toward explainability. Yet, computing exact Shapley values is generally intractable, leading existing methods to rely on approximations and thereby incur unavoidable error. In this work, we introduce PKeX-Shapley, a novel algorithm that utilizes the multiplicative structure of product kernels to enable the exact computation of Shapley values in polynomial time. The core of our approach is a new value function, the functional baseline value function, specifically designed for product-kernel models. This value function removes the influence of a feature subset by setting its functional component to the least informative state. Crucially, it allows a recursive thus efficient computation of Shapley values in polynomial time. As an important additional contribution, we show that our framework extends beyond predictive modeling to statistical inference. In particular, it generalizes to popular kernel-based discrepancy measures such as the Maximum Mean Discrepancy (MMD) and the Hilbert-Schmidt Independence Criterion (HSIC), thereby providing new tools for interpretable statistical inference.

LGMar 20, 2025
Truthful Elicitation of Imprecise Forecasts

Anurag Singh, Siu Lun Chau, Krikamol Muandet · oxford

The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts -- forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing connection to social choice theory and introducing a two-way communication framework where DMs first share their aggregation rules (e.g., averaging or min-max) used in downstream decisions for resolving forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DM to elicit and integrate the forecaster's epistemic uncertainty into their decision-making process, thus improving credibility.

MAFeb 11, 2025
Bayesian Optimization for Building Social-Influence-Free Consensus

Masaki Adachi, Siu Lun Chau, Wenjie Xu et al. · oxford

We introduce Social Bayesian Optimization (SBO), a vote-efficient algorithm for consensus-building in collective decision-making. In contrast to single-agent scenarios, collective decision-making encompasses group dynamics that may distort agents' preference feedback, thereby impeding their capacity to achieve a social-influence-free consensus -- the most preferable decision based on the aggregated agent utilities. We demonstrate that under mild rationality axioms, reaching social-influence-free consensus using noisy feedback alone is impossible. To address this, SBO employs a dual voting system: cheap but noisy public votes (e.g., show of hands in a meeting), and more accurate, though expensive, private votes (e.g., one-to-one interview). We model social influence using an unknown social graph and leverage the dual voting system to efficiently learn this graph. Our theoretical findigns show that social graph estimation converges faster than the black-box estimation of agents' utilities, allowing us to reduce reliance on costly private votes early in the process. This enables efficient consensus-building primarily through noisy public votes, which are debiased based on the estimated social graph to infer social-influence-free feedback. We validate the efficacy of SBO across multiple real-world applications, including thermal comfort, team building, travel negotiation, and energy trading collaboration.

LGAug 20, 2025
Exact Shapley Attributions in Quadratic-time for FANOVA Gaussian Processes

Majid Mohammadi, Krikamol Muandet, Ilaria Tiddi et al. · oxford

Shapley values are widely recognized as a principled method for attributing importance to input features in machine learning. However, the exact computation of Shapley values scales exponentially with the number of features, severely limiting the practical application of this powerful approach. The challenge is further compounded when the predictive model is probabilistic - as in Gaussian processes (GPs) - where the outputs are random variables rather than point estimates, necessitating additional computational effort in modeling higher-order moments. In this work, we demonstrate that for an important class of GPs known as FANOVA GP, which explicitly models all main effects and interactions, *exact* Shapley attributions for both local and global explanations can be computed in *quadratic time*. For local, instance-wise explanations, we define a stochastic cooperative game over function components and compute the exact stochastic Shapley value in quadratic time only, capturing both the expected contribution and uncertainty. For global explanations, we introduce a deterministic, variance-based value function and compute exact Shapley values that quantify each feature's contribution to the model's overall sensitivity. Our methods leverage a closed-form (stochastic) Möbius representation of the FANOVA decomposition and introduce recursive algorithms, inspired by Newton's identities, to efficiently compute the mean and variance of Shapley values. Our work enhances the utility of explainable AI, as demonstrated by empirical studies, by providing more scalable, axiomatically sound, and uncertainty-aware explanations for predictions generated by structured probabilistic models.

LGMay 27, 2025
When Shift Happens - Confounding Is to Blame

Abbavaram Gowtham Reddy, Celia Rubio-Madrigal, Rebekka Burkholz et al.

Distribution shifts introduce uncertainty that undermines the robustness and generalization capabilities of machine learning models. While conventional wisdom suggests that learning causal-invariant representations enhances robustness to such shifts, recent empirical studies present a counterintuitive finding: (i) empirical risk minimization (ERM) can rival or even outperform state-of-the-art out-of-distribution (OOD) generalization methods, and (ii) its OOD generalization performance improves when all available covariates, not just causal ones, are utilized. Drawing on both empirical and theoretical evidence, we attribute this phenomenon to hidden confounding. Shifts in hidden confounding induce changes in data distributions that violate assumptions commonly made by existing OOD generalization approaches. Under such conditions, we prove that effective generalization requires learning environment-specific relationships, rather than relying solely on invariant ones. Furthermore, we show that models augmented with proxies for hidden confounders can mitigate the challenges posed by hidden confounding shifts. These findings offer new theoretical insights and practical guidance for designing robust OOD generalization algorithms and principled covariate selection strategies.

LGMar 5
Incentive Aware AI Regulations: A Credal Characterisation

Anurag Singh, Julian Rodemann, Rajeev Verma et al.

While high-stakes ML applications demand strict regulations, strategic ML providers often evade them to lower development costs. To address this challenge, we cast AI regulation as a mechanism design problem under uncertainty and introduce regulation mechanisms: a framework that maps empirical evidence from models to a license for some market share. The providers can select from a set of licenses, effectively forcing them to bet on their model's ability to fulfil regulation. We aim at regulation mechanisms that achieve perfect market outcome, i.e. (a) drive non-compliant providers to self-exclude, and (b) ensure participation from compliant providers. We prove that a mechanism has perfect market outcome if and only if the set of non-compliant distributions forms a credal set, i.e., a closed, convex set of probability measures. This result connects mechanism design and imprecise probability by establishing a duality between regulation mechanisms and the set of non-compliant distributions. We also demonstrate these mechanisms in practice via experiments on regulating use of spurious features for prediction and fairness. Our framework provides new insights at the intersection of mechanism design and imprecise probability, offering a foundation for development of enforceable AI regulations.

LGOct 29, 2025
An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation

Uzair Akbar, Niki Kilbertus, Hao Shen et al.

The technique of data augmentation (DA) is often used in machine learning for regularization purposes to better generalize under i.i.d. settings. In this work, we present a unifying framework with topics in causal inference to make a case for the use of DA beyond just the i.i.d. setting, but for generalization across interventions as well. Specifically, we argue that when the outcome generating mechanism is invariant to our choice of DA, then such augmentations can effectively be thought of as interventions on the treatment generating mechanism itself. This can potentially help to reduce bias in causal effect estimation arising from hidden confounders. In the presence of such unobserved confounding we typically make use of instrumental variables (IVs) -- sources of treatment randomization that are conditionally independent of the outcome. However, IVs may not be as readily available as DA for many applications, which is the main motivation behind this work. By appropriately regularizing IV based estimators, we introduce the concept of IV-like (IVL) regression for mitigating confounding bias and improving predictive performance across interventions even when certain IV properties are relaxed. Finally, we cast parameterized DA as an IVL regression problem and show that when used in composition can simulate a worst-case application of such DA, further improving performance on causal estimation and generalization tasks beyond what simple DA may offer. This is shown both theoretically for the population case and via simulation experiments for the finite sample case using a simple linear example. We also present real data experiments to support our case.

LGOct 6, 2025
When Do Credal Sets Stabilize? Fixed-Point Theorems for Credal Set Updates

Michele Caprio, Siu Lun Chau, Krikamol Muandet · oxford

Many machine learning algorithms rely on iterative updates of uncertainty representations, ranging from variational inference and expectation-maximization, to reinforcement learning, continual learning, and multi-agent learning. In the presence of imprecision and ambiguity, credal sets -- closed, convex sets of probability distributions -- have emerged as a popular framework for representing imprecise probabilistic beliefs. Under such imprecision, many learning problems in imprecise probabilistic machine learning (IPML) may be viewed as processes involving successive applications of update rules on credal sets. This naturally raises the question of whether this iterative process converges to stable fixed points -- or, more generally, under what conditions on the updating mechanism such fixed points exist, and whether they can be attained. We provide the first analysis of this problem and illustrate our findings using Credal Bayesian Deep Learning as a concrete example. Our work demonstrates that incorporating imprecision into the learning process not only enriches the representation of uncertainty, but also reveals structural conditions under which stability emerges, thereby offering new insights into the dynamics of iterative learning under imprecision.

AIFeb 6, 2025
Explanation Design in Strategic Learning: Sufficient Explanations that Induce Non-harmful Responses

Kiet Q. H. Vo, Siu Lun Chau, Masahiro Kato et al. · oxford

We study explanation design in algorithmic decision making with strategic agents, individuals who may modify their inputs in response to explanations of a decision maker's (DM's) predictive model. As the demand for transparent algorithmic systems continues to grow, most prior work assumes full model disclosure as the default solution. In practice, however, DMs such as financial institutions typically disclose only partial model information via explanations. Such partial disclosure can lead agents to misinterpret the model and take actions that unknowingly harm their utility. A key open question is how DMs can communicate explanations in a way that avoids harming strategic agents, while still supporting their own decision-making goals, e.g., minimising predictive error. In this work, we analyse well-known explanation methods, and establish a necessary condition to prevent explanations from misleading agents into self-harming actions. Moreover, with a conditional homogeneity assumption, we prove that action recommendation-based explanations (ARexes) are sufficient for non-harmful responses, mirroring the revelation principle in information design. To demonstrate how ARexes can be operationalised in practice, we propose a simple learning procedure that jointly optimises the predictive model and explanation policy. Experiments on synthetic and real-world tasks show that ARexes allow the DM to optimise their model's predictive performance while preserving agents' utility, offering a more refined strategy for safe and effective partial model disclosure.

MLMay 24, 2023
Explaining the Uncertain: Stochastic Shapley Values for Gaussian Process Models

Siu Lun Chau, Krikamol Muandet, Dino Sejdinovic

We present a novel approach for explaining Gaussian processes (GPs) that can utilize the full analytical covariance structure present in GPs. Our method is based on the popular solution concept of Shapley values extended to stochastic cooperative games, resulting in explanations that are random variables. The GP explanations generated using our approach satisfy similar favorable axioms to standard Shapley values and possess a tractable covariance function across features and data observations. This covariance allows for quantifying explanation uncertainties and studying the statistical dependencies between explanations. We further extend our framework to the problem of predictive explanation, and propose a Shapley prior over the explanation function to predict Shapley values for new data based on previously computed ones. Our extensive illustrations demonstrate the effectiveness of the proposed approach.

AIMay 19, 2023
A Measure-Theoretic Axiomatisation of Causality

Junhyung Park, Simon Buchholz, Bernhard Schölkopf et al.

Causality is a central concept in a wide range of research areas, yet there is still no universally agreed axiomatisation of causality. We view causality both as an extension of probability theory and as a study of \textit{what happens when one intervenes on a system}, and argue in favour of taking Kolmogorov's measure-theoretic axiomatisation of probability as the starting point towards an axiomatisation of causality. To that end, we propose the notion of a \textit{causal space}, consisting of a probability space along with a collection of transition probability kernels, called \textit{causal kernels}, that encode the causal information of the space. Our proposed framework is not only rigorously grounded in measure theory, but it also sheds light on long-standing limitations of existing frameworks including, for example, cycles, latent variables and stochastic processes.

LGJun 7, 2021
Instrument Space Selection for Kernel Maximum Moment Restriction

Rui Zhang, Krikamol Muandet, Bernhard Schölkopf et al.

Kernel maximum moment restriction (KMMR) recently emerges as a popular framework for instrumental variable (IV) based conditional moment restriction (CMR) models with important applications in conditional moment (CM) testing and parameter estimation for IV regression and proximal causal learning. The effectiveness of this framework, however, depends critically on the choice of a reproducing kernel Hilbert space (RKHS) chosen as a space of instruments. In this work, we presents a systematic way to select the instrument space for parameter estimation based on a principle of the least identifiable instrument space (LIIS) that identifies model parameters with the least space complexity. Our selection criterion combines two distinct objectives to determine such an optimal space: (i) a test criterion to check identifiability; (ii) an information criterion based on the effective dimension of RKHSs as a complexity measure. We analyze the consistency of our method in determining the LIIS, and demonstrate its effectiveness for parameter estimation via simulations.

LGMay 10, 2021
Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction

Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin et al.

We address the problem of causal effect estimation in the presence of unobserved confounding, but where proxies for the latent confounder(s) are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this setting: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. We focus on the proximal causal learning setting, but our methods can be used to solve a wider class of inverse problems characterised by a Fredholm integral equation. In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting. We provide consistency guarantees for each algorithm, and we demonstrate these approaches achieve competitive results on synthetic data and data simulating a real-world task. In particular, our approach outperforms earlier methods that are not suited to leveraging proxy variables.

MLFeb 16, 2021
Conditional Distributional Treatment Effect with Kernel Conditional Mean Embeddings and U-Statistic Regression

Junhyung Park, Uri Shalit, Bernhard Schölkopf et al.

We propose to analyse the conditional distributional treatment effect (CoDiTE), which, in contrast to the more common conditional average treatment effect (CATE), is designed to encode a treatment's distributional aspects beyond the mean. We first introduce a formal definition of the CoDiTE associated with a distance function between probability measures. Then we discuss the CoDiTE associated with the maximum mean discrepancy via kernel conditional mean embeddings, which, coupled with a hypothesis test, tells us whether there is any conditional distributional effect of the treatment. Finally, we investigate what kind of conditional distributional effect the treatment has, both in an exploratory manner via the conditional witness function, and in a quantitative manner via U-statistic regression, generalising the CATE to higher-order moments. Experiments on synthetic, semi-synthetic and real datasets demonstrate the merits of our approach.

LGFeb 10, 2021
A Witness Two-Sample Test

Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf et al.

The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a weighted sum of kernel evaluations on a set of basis points. Typically the kernel is optimized on a training set, and hypothesis testing is performed on a separate test set to avoid overfitting (i.e., control type-I error). That is, the test set is used to simultaneously estimate the expectations and define the basis points, while the training set only serves to select the kernel and is discarded. In this work, we propose to use the training data to also define the weights and the basis points for better data efficiency. We show that 1) the new test is consistent and has a well-controlled type-I error; 2) the optimal witness function is given by a precision-weighted mean in the reproducing kernel Hilbert space associated with the kernel; and 3) the test power of the proposed test is comparable or exceeds that of the MMD and other modern tests, as verified empirically on challenging synthetic and real problems (e.g., Higgs data).

MLOct 21, 2020
Regularised Least-Squares Regression with Infinite-Dimensional Output Space

Junhyunng Park, Krikamol Muandet

This short technical report presents some learning theory results on vector-valued reproducing kernel Hilbert space (RKHS) regression, where the input space is allowed to be non-compact and the output space is a (possibly infinite-dimensional) Hilbert space. Our approach is based on the integral operator technique using spectral theory for non-compact operators. We place a particular emphasis on obtaining results with as few assumptions as possible; as such we only use Chebyshev's inequality, and no effort is made to obtain the best rates or constants.

LGOct 15, 2020
Instrumental Variable Regression via Kernel Maximum Moment Loss

Rui Zhang, Masaaki Imaizumi, Bernhard Schölkopf et al.

We investigate a simple objective for nonlinear instrumental variable (IV) regression based on a kernelized conditional moment restriction (CMR) known as a maximum moment restriction (MMR). The MMR objective is formulated by maximizing the interaction between the residual and the instruments belonging to a unit ball in a reproducing kernel Hilbert space (RKHS). First, it allows us to simplify the IV regression as an empirical risk minimization problem, where the risk functional depends on the reproducing kernel on the instrument and can be estimated by a U-statistic or V-statistic. Second, based on this simplification, we are able to provide the consistency and asymptotic normality results in both parametric and nonparametric settings. Lastly, we provide easy-to-use IV regression algorithms with an efficient hyper-parameter selection procedure. We demonstrate the effectiveness of our algorithms using experiments on both synthetic and real-world data.

CVAug 10, 2020
Grasping Field: Learning Implicit Representations for Human Grasps

Korrawe Karunratanakul, Jinlong Yang, Yan Zhang et al.

Robotic grasping of house-hold objects has made remarkable progress in recent years. Yet, human grasps are still difficult to synthesize realistically. There are several key reasons: (1) the human hand has many degrees of freedom (more than robotic manipulators); (2) the synthesized hand should conform to the surface of the object; and (3) it should interact with the object in a semantically and physically plausible manner. To make progress in this direction, we draw inspiration from the recent progress on learning-based implicit representations for 3D object reconstruction. Specifically, we propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks. Our insight is that every point in a three-dimensional space can be characterized by the signed distances to the surface of the hand and the object, respectively. Consequently, the hand, the object, and the contact area can be represented by implicit surfaces in a common space, in which the proximity between the hand and the object can be modelled explicitly. We name this 3D to 2D mapping as Grasping Field, parameterize it with a deep neural network, and learn it from data. We demonstrate that the proposed grasping field is an effective and expressive representation for human grasp generation. Specifically, our generative model is able to synthesize high-quality human grasps, given only on a 3D object point cloud. The extensive experiments demonstrate that our generative model compares favorably with a strong baseline and approaches the level of natural human grasps. Our method improves the physical plausibility of the hand-object contact reconstruction and achieves comparable performance for 3D hand reconstruction compared to state-of-the-art methods.

LGJun 3, 2020
Learning Kernel Tests Without Data Splitting

Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf et al.

Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to smaller test sample size. Inspired by the selective inference framework, we propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting. Our approach can correctly calibrate the test in the presence of such dependency, and yield a test threshold in closed form. At the same significance level, our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.

STFeb 21, 2020
Kernel Conditional Moment Test via Maximum Moment Restriction

Krikamol Muandet, Wittawat Jitkrittum, Jonas Kübler

We propose a new family of specification tests called kernel conditional moment (KCM) tests. Our tests are built on a novel representation of conditional moment restrictions in a reproducing kernel Hilbert space (RKHS) called conditional moment embedding (CMME). After transforming the conditional moment restrictions into a continuum of unconditional counterparts, the test statistic is defined as the maximum moment restriction (MMR) within the unit ball of the RKHS. We show that the MMR not only fully characterizes the original conditional moment restrictions, leading to consistency in both hypothesis testing and parameter estimation, but also has an analytic expression that is easy to compute as well as closed-form asymptotic distributions. Our empirical studies show that the KCM test has a promising finite-sample performance compared to existing tests.

LGFeb 10, 2020
A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings

Junhyung Park, Krikamol Muandet

We present an operator-free, measure-theoretic approach to the conditional mean embedding (CME) as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of unconditional distributions has been defined rigorously, the existing operator-based approach of the conditional version depends on stringent assumptions that hinder its analysis. We overcome this limitation via a measure-theoretic treatment of CMEs. We derive a natural regression interpretation to obtain empirical estimates, and provide a thorough theoretical analysis thereof, including universal consistency. As natural by-products, we obtain the conditional analogues of the maximum mean discrepancy and Hilbert-Schmidt independence criterion, and demonstrate their behaviour via simulations.

MLNov 25, 2019
A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming

Jia-Jie Zhu, Krikamol Muandet, Moritz Diehl et al.

This work presents the concept of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide applicability as demonstrated in distinct numerical examples. The implication of this new concept is a new mode of thinking about the statistical nature of uncertainty in dynamical systems.

MLOct 29, 2019
Kernel-Guided Training of Implicit Generative Models with Stability Guarantees

Arash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet et al.

Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this work, we propose a theoretically grounded method to guide the training trajectories of GANs by augmenting the GAN loss function with a kernel-based regularization term that controls local and global discrepancies between the model and true distributions. This control signal allows us to inject prior knowledge into the model. We provide theoretical guarantees on the stability of the resulting dynamical system and demonstrate different aspects of it via a wide range of experiments.

MLOct 27, 2019
Dual Instrumental Variable Regression

Krikamol Muandet, Arash Mehrjou, Si Kai Lee et al.

We present a novel algorithm for non-linear instrumental variable (IV) regression, DualIV, which simplifies traditional two-stage methods via a dual formulation. Inspired by problems in stochastic programming, we show that two-stage procedures for non-linear IV regression can be reformulated as a convex-concave saddle-point problem. Our formulation enables us to circumvent the first-stage regression which is a potential bottleneck in real-world applications. We develop a simple kernel-based algorithm with an analytic solution based on this formulation. Empirical results show that we are competitive to existing, more complicated algorithms for non-linear instrumental variable regression.

LGJun 3, 2019
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation

Yan Zhang, Krikamol Muandet, Qianli Ma et al.

Fine-grained action segmentation in long untrimmed videos is an important task for many applications such as surveillance, robotics, and human-computer interaction. To understand subtle and precise actions within a long time period, second-order information (e.g. feature covariance) or higher is reported to be effective in the literature. However, extracting such high-order information is considerably non-trivial. In particular, the dimensionality increases exponentially with the information order, and hence gaining more representation power also increases the computational cost and the risk of overfitting. In this paper, we propose an approach to representing high-order information for temporal action segmentation via a simple yet effective bilinear form. Specifically, our contributions are: (1) From the multilinear perspective, we derive a bilinear form of low complexity, assuming that the three-way tensor has low-rank frontal slices. (2) Rather than learning the tensor entries from data, we sample the entries from different underlying distributions, and prove that the underlying distribution influences the information order. (3) We employed our bilinear form as an intermediate layer in state-of-the-art deep neural networks, enabling to represent high-order information in complex deep models effectively and efficiently. Our experimental results demonstrate that the proposed bilinear form outperforms the previous state-of-the-art methods on the challenging temporal action segmentation task. One can see our project page for data, model and code: \url{https://vlg.inf.ethz.ch/projects/BilinearTCN/}.

QUANT-PHMay 31, 2019
Quantum Mean Embedding of Probability Distributions

Jonas M. Kübler, Krikamol Muandet, Bernhard Schölkopf

The kernel mean embedding of probability distributions is commonly used in machine learning as an injective mapping from distributions to functions in an infinite dimensional Hilbert space. It allows us, for example, to define a distance measure between probability distributions, called maximum mean discrepancy (MMD). In this work, we propose to represent probability distributions in a pure quantum state of a system that is described by an infinite dimensional Hilbert space. This enables us to work with an explicit representation of the mean embedding, whereas classically one can only work implicitly with an infinite dimensional Hilbert space through the use of the kernel trick. We show how this explicit representation can speed up methods that rely on inner products of mean embeddings and discuss the theoretical and experimental challenges that need to be solved in order to achieve these speedups.

LGMay 29, 2019
Privacy-Preserving Causal Inference via Inverse Probability Weighting

Si Kai Lee, Luigi Gresele, Mijung Park et al.

The use of inverse probability weighting (IPW) methods to estimate the causal effect of treatments from observational studies is widespread in econometrics, medicine and social sciences. Although these studies often involve sensitive information, thus far there has been no work on privacy-preserving IPW methods. We address this by providing a novel framework for privacy-preserving IPW (PP-IPW) methods. We include a theoretical analysis of the effects of our proposed privatisation procedure on the estimated average treatment effect, and evaluate our PP-IPW framework on synthetic, semi-synthetic and real datasets. The empirical results are consistent with our theoretical findings.

LGMay 27, 2019
Kernel Conditional Density Operators

Ingmar Schuster, Mattes Mollenhauer, Stefan Klus et al.

We introduce a novel conditional density estimation model termed the conditional density operator (CDO). It naturally captures multivariate, multimodal output densities and shows performance that is competitive with recent neural conditional density models and Gaussian processes. The proposed model is based on a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite sample bounds for the estimation error in a standard density reconstruction scenario, independent of problem dimensionality. Interestingly, when a kernel is used that is also a probability density, the CDO allows us to both evaluate and sample the output density efficiently. We demonstrate the versatility and performance of the proposed model on both synthetic and real-world data.

LGFeb 8, 2019
Fair Decisions Despite Imperfect Predictions

Niki Kilbertus, Manuel Gomez-Rodriguez, Bernhard Schölkopf et al.

Consequential decisions are increasingly informed by sophisticated data-driven predictive models. However, to consistently learn accurate predictive models, one needs access to ground truth labels. Unfortunately, in practice, labels may only exist conditional on certain decisions---if a loan is denied, there is not even an option for the individual to pay back the loan. Hence, the observed data distribution depends on how decisions are being made. In this paper, we show that in this selective labels setting, learning a predictor directly only from available labeled data is suboptimal in terms of both fairness and utility. To avoid this undesirable behavior, we propose to directly learn decision policies that maximize utility under fairness constraints and thereby take into account how decisions affect which data is observed in the future. Our results suggest the need for a paradigm shift in the context of fair machine learning from the currently prevalent idea of simply building predictive models from a single static dataset via risk minimization, to a more interactive notion of "learning to decide". In particular, such policies should not entirely neglect part of the input space, drawing connections to explore/exploit tradeoffs in reinforcement learning, data missingness, and potential outcomes in causal inference. Experiments on synthetic and real-world data illustrate the favorable properties of learning to decide in terms of utility and fairness.

LGJan 26, 2019
Kernel-Guided Training of Implicit Generative Models with Stability Guarantees

Arash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet et al.

Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this work, we propose a theoretically grounded method to guide the training trajectories of GANs by augmenting the GAN loss function with a kernel-based regularization term that controls local and global discrepancies between the model and true distributions. This control signal allows us to inject prior knowledge into the model. We provide theoretical guarantees on the stability of the resulting dynamical system and demonstrate different aspects of it via a wide range of experiments.

CVDec 5, 2018
Local Temporal Bilinear Pooling for Fine-grained Action Parsing

Yan Zhang, Siyu Tang, Krikamol Muandet et al.

Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics and others requiring subtle and precise operations in a long-term period. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. In contrast to other work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than the conventional counterpart. In addition, we introduce exact lower-dimension representations of our bilinear forms, so that the dimensionality is reduced with neither information loss nor extra computation. We perform intensive experiments to quantitatively analyze our model and show the superior performances to other state-of-the-art work on various datasets.