LGOct 31, 2022
CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation DataMathieu Chevalley, Yusuf Roohani, Arash Mehrjou et al.
Causal inference is a vital aspect of multiple scientific disciplines and is routinely applied to high-impact applications such as medicine. However, evaluating the performance of causal inference methods in real-world environments is challenging due to the need for observations under both interventional and control conditions. Traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. To address this, we introduce CausalBench, a benchmark suite for evaluating network inference methods on real-world interventional data from large-scale single-cell perturbation experiments. CausalBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics. A systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of current methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. Thus, CausalBench opens new avenues in causal network inference research and provides a principled and reliable way to track progress in leveraging real-world interventional data.
CVJun 15, 2022
Diffusion Models for Video Prediction and InfillingTobias Höppe, Arash Mehrjou, Stefan Bauer et al.
Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate RaMViD on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation. High-resolution videos are provided at https://sites.google.com/view/video-diffusion-prediction.
LGAug 29, 2023
The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation dataMathieu Chevalley, Jacob Sackett-Sanders, Yusuf Roohani et al.
In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. Such maps are not only foundational for understanding the molecular mechanisms underlying disease biology but also pivotal for formulating hypotheses about potential targets for new medicines. Recognizing the need to elevate the construction of these gene-gene interaction networks, especially from large-scale, real-world datasets of perturbed single cells, the CausalBench Challenge was initiated. This challenge aimed to inspire the machine learning community to enhance state-of-the-art methods, emphasizing better utilization of expansive genetic perturbation data. Using the framework provided by the CausalBench benchmark, participants were tasked with refining the current methodologies or proposing new ones. This report provides an analysis and summary of the methods submitted during the challenge to give a partial image of the state of the art at the time of the challenge. Notably, the winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.
LGNov 7, 2022
Federated Causal Discovery From InterventionsAmin Abyaneh, Nino Scherrer, Patrick Schwab et al.
Causal discovery serves a pivotal role in mitigating model uncertainty through recovering the underlying causal mechanisms among variables. In many practical domains, such as healthcare, access to the data gathered by individual entities is limited, primarily for privacy and regulatory constraints. However, the majority of existing causal discovery methods require the data to be available in a centralized location. In response, researchers have introduced federated causal discovery. While previous federated methods consider distributed observational data, the integration of interventional data remains largely unexplored. We propose FedCDI, a federated framework for inferring causal structures from distributed data containing interventional samples. In line with the federated learning framework, FedCDI improves privacy by exchanging belief updates rather than raw samples. Additionally, it introduces a novel intervention-aware method for aggregating individual updates. We analyze scenarios with shared or disjoint intervened covariates, and mitigate the adverse effects of interventional data heterogeneity. The performance and scalability of FedCDI is rigorously tested across a variety of synthetic and real-world graphs.
QMJun 15, 2023
Multi-omics Prediction from High-content Cellular Imaging with Deep LearningRahil Mehrizi, Arash Mehrjou, Maryana Alegro et al.
High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which cell imaging could potentially enable the prediction of multi-omics directly from cell imaging data is therefore currently unclear. Here, we address the question of whether it is possible to predict bulk multi-omics measurements directly from cell images using Image2Omics - a deep learning approach that predicts multi-omics in a cell population directly from high-content images of cells stained with multiplexed fluorescent dyes. We perform an experimental evaluation in gene-edited macrophages derived from human induced pluripotent stem cells (hiPSC) under multiple stimulation conditions and demonstrate that Image2Omics achieves significantly better performance in predicting transcriptomics and proteomics measurements directly from cell images than predictions based on the mean observed training set abundance. We observed significant predictability of abundances for 4927 (18.72%; 95% CI: 6.52%, 35.52%) and 3521 (13.38%; 95% CI: 4.10%, 32.21%) transcripts out of 26137 in M1 and M2-stimulated macrophages respectively and for 422 (8.46%; 95% CI: 0.58%, 25.83%) and 697 (13.98%; 95% CI: 2.41%, 32.83%) proteins out of 4986 in M1 and M2-stimulated macrophages respectively. Our results show that some transcript and protein abundances are predictable from cell imaging and that cell imaging may potentially, in some settings and depending on the mechanisms of interest and desired performance threshold, even be a scalable and resource-efficient substitute for multi-omics measurements.
LGApr 20, 2022
Federated Learning in Multi-Center Critical Care Research: A Systematic Case Study using the eICU DatabaseArash Mehrjou, Ashkan Soleymani, Annika Buchholz et al.
Federated learning (FL) has been proposed as a method to train a model on different units without exchanging data. This offers great opportunities in the healthcare sector, where large datasets are available but cannot be shared to ensure patient privacy. We systematically investigate the effectiveness of FL on the publicly available eICU dataset for predicting the survival of each ICU stay. We employ Federated Averaging as the main practical algorithm for FL and show how its performance changes by altering three key hyper-parameters, taking into account that clients can significantly vary in size. We find that in many settings, a large number of local training epochs improves the performance while at the same time reducing communication costs. Furthermore, we outline in which settings it is possible to have only a low number of hospitals participating in each federated update round. When many hospitals with low patient counts are involved, the effect of overfitting can be avoided by decreasing the batchsize. This study thus contributes toward identifying suitable settings for running distributed algorithms such as FL on clinical datasets.
LGOct 25, 2022
From Points to Functions: Infinite-dimensional Representations in Diffusion ModelsSarthak Mittal, Guillaume Lajoie, Stefan Bauer et al.
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs) which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory that delivers samples from the desired distribution. Abstreiter et. al showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce an attention and recurrence based modules that ``learn to mix'' information content of various time-steps such that the resultant representation leads to superior performance in downstream tasks.
LGNov 4, 2025
Theoretical Guarantees for Causal Discovery on Large Random GraphsMathieu Chevalley, Arash Mehrjou, Patrick Schwab
We investigate theoretical guarantees for the false-negative rate (FNR) -- the fraction of true causal edges whose orientation is not recovered, under single-variable random interventions and an $ε$-interventional faithfulness assumption that accommodates latent confounding. For sparse Erdős--Rényi directed acyclic graphs, where the edge probability scales as $p_e = Θ(1/d)$, we show that the FNR concentrates around its mean at rate $O(\frac{\log d}{\sqrt d})$, implying that large deviations above the expected error become exponentially unlikely as dimensionality increases. This concentration ensures that derived upper bounds hold with high probability in large-scale settings. Extending the analysis to generalized Barabási--Albert graphs reveals an even stronger phenomenon: when the degree exponent satisfies $γ> 3$, the deviation width scales as $O(d^{β- \frac{1}{2}})$ with $β= 1/(γ- 1) < \frac{1}{2}$, and hence vanishes in the limit. This demonstrates that realistic scale-free topologies intrinsically regularize causal discovery, reducing variability in orientation error. These finite-dimension results provide the first dimension-adaptive, faithfulness-robust guarantees for causal structure recovery, and challenge the intuition that high dimensionality and network heterogeneity necessarily hinder accurate discovery. Our simulation results corroborate these theoretical predictions, showing that the FNR indeed concentrates and often vanishes in practice as dimensionality grows.
LGOct 22, 2021Code
GeneDisco: A Benchmark for Experimental Design in Drug DiscoveryArash Mehrjou, Ashkan Soleymani, Andrew Jesson et al.
In vitro cellular experimentation with genetic interventions, using for example CRISPR technologies, is an essential step in early-stage drug discovery and target validation that serves to assess initial hypotheses about causal associations between biological mechanisms and disease pathologies. With billions of potential hypotheses to test, the experimental design space for in vitro genetic experiments is extremely vast, and the available experimental capacity - even at the largest research institutions in the world - pales in relation to the size of this biological hypothesis space. Machine learning methods, such as active and reinforcement learning, could aid in optimally exploring the vast biological space by integrating prior knowledge from various information sources as well as extrapolating to yet unexplored areas of the experimental design space based on available data. However, there exist no standardised benchmarks and data sets for this challenging task and little research has been conducted in this area to date. Here, we introduce GeneDisco, a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.
QMDec 7, 2023
DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment DesignClare Lyle, Arash Mehrjou, Pascal Notin et al. · deepmind
The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanisms. Existing approaches search over the billions of potential interventions to maximize the expected influence on the target phenotype. However, to reduce the risk of failure in future stages of trials, practical experiment design aims to find a set of interventions that maximally change a target phenotype via diverse mechanisms. We propose DiscoBAX, a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. We provide theoretical guarantees of approximate optimality under standard assumptions, and conduct a comprehensive experimental evaluation covering both synthetic as well as real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems.
LGMar 30, 2025
In-silico biological discovery with large perturbation modelsDjordje Miladinovic, Tobias Höppe, Mathieu Chevalley et al.
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
GNJan 13, 2025
Multi-megabase scale genome interpretation with genetic language modelsFrederik Träuble, Lachlan Stuart, Andreas Georgiou et al.
Understanding how molecular changes caused by genetic variation drive disease risk is crucial for deciphering disease mechanisms. However, interpreting genome sequences is challenging because of the vast size of the human genome, and because its consequences manifest across a wide range of cells, tissues and scales -- spanning from molecular to whole organism level. Here, we present Phenformer, a multi-scale genetic language model that learns to generate mechanistic hypotheses as to how differences in genome sequence lead to disease-relevant changes in expression across cell types and tissues directly from DNA sequences of up to 88 million base pairs. Using whole genome sequencing data from more than 150 000 individuals, we show that Phenformer generates mechanistic hypotheses about disease-relevant cell and tissue types that match literature better than existing state-of-the-art methods, while using only sequence data. Furthermore, disease risk predictors enriched by Phenformer show improved prediction performance and generalisation to diverse populations. Accurate multi-megabase scale interpretation of whole genomes without additional experimental data enables both a deeper understanding of molecular mechanisms involved in disease and improved disease risk prediction at the level of individuals.
LGOct 11, 2024
Efficient Differentiable Discovery of Causal OrderMathieu Chevalley, Arash Mehrjou, Patrick Schwab
In the algorithm Intersort, Chevalley et al. (2024) proposed a score-based method to discover the causal order of variables in a Directed Acyclic Graph (DAG) model, leveraging interventional data to outperform existing methods. However, as a score-based method over the permutahedron, Intersort is computationally expensive and non-differentiable, limiting its ability to be utilised in problems involving large-scale datasets, such as those in genomics and climate models, or to be integrated into end-to-end gradient-based learning frameworks. We address this limitation by reformulating Intersort using differentiable sorting and ranking techniques. Our approach enables scalable and differentiable optimization of causal orderings, allowing the continuous score function to be incorporated as a regularizer in downstream tasks. Empirical results demonstrate that causal discovery algorithms benefit significantly from regularizing on the causal order, underscoring the effectiveness of our method. Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models and thereby addresses a long-standing limitation of purely associational supervised learning.
ROJan 15, 2022
Physical Derivatives: Computing policy gradients by physical forward-propagationArash Mehrjou, Ashkan Soleymani, Stefan Bauer et al.
Model-free and model-based reinforcement learning are two ends of a spectrum. Learning a good policy without a dynamic model can be prohibitively expensive. Learning the dynamic model of a system can reduce the cost of learning the policy, but it can also introduce bias if it is not accurate. We propose a middle ground where instead of the transition model, the sensitivity of the trajectories with respect to the perturbation of the parameters is learned. This allows us to predict the local behavior of the physical system around a set of nominal policies without knowing the actual model. We assay our method on a custom-built physical robot in extensive experiments and show the feasibility of the approach in practice. We investigate potential challenges when applying our method to physical systems and propose solutions to each of them.
LGOct 29, 2021
GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RLSumedh A Sontakke, Stephen Iota, Zizhao Hu et al.
Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes in supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process - RL agents actively query their environment for data, and the data are a function of the policy followed by the agent. An agent could thus neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task: that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent that actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that GalilAI outperforms the baseline significantly. See visualizations of our method https://galil-ai.github.io/
MLJul 8, 2021
Federated Learning as a Mean-Field GameArash Mehrjou
We establish a connection between federated learning, a concept from machine learning, and mean-field games, a concept from game theory and control theory. In this analogy, the local federated learners are considered as the players and the aggregation of the gradients in a central server is the mean-field effect. We present federated learning as a differential game and discuss the properties of the equilibrium of this game. We hope this novel view to federated learning brings together researchers from these two distinct areas to work on fundamental problems of large-scale distributed and privacy-preserving learning algorithms.
LGMay 29, 2021
Diffusion-Based Representation LearningSarthak Mittal, Korbinian Abstreiter, Stefan Bauer et al.
Diffusion-based methods represented as stochastic differential equations on a continuous-time domain have recently proven successful as a non-adversarial generative model. Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes to data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective and thus encodes the information needed for denoising. We illustrate how this difference allows for manual control of the level of details encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification. We also compare the quality of learned representations of diffusion score matching with other methods like autoencoder and contrastively trained systems through their performances on downstream tasks.
PEMar 24, 2021
Pyfectious: An individual-level simulator to discover optimal containment polices for epidemic diseasesArash Mehrjou, Ashkan Soleymani, Amin Abyaneh et al.
Simulating the spread of infectious diseases in human communities is critical for predicting the trajectory of an epidemic and verifying various policies to control the devastating impacts of the outbreak. Many existing simulators are based on compartment models that divide people into a few subsets and simulate the dynamics among those subsets using hypothesized differential equations. However, these models lack the requisite granularity to study the effect of intelligent policies that influence every individual in a particular way. In this work, we introduce a simulator software capable of modeling a population structure and controlling the disease's propagation at an individualistic level. In order to estimate the confidence of the conclusions drawn from the simulator, we employ a comprehensive probabilistic approach where the entire population is constructed as a hierarchical random variable. This approach makes the inferred conclusions more robust against sampling artifacts and gives confidence bounds for decisions based on the simulation results. To showcase potential applications, the simulator parameters are set based on the formal statistics of the COVID-19 pandemic, and the outcome of a wide range of control measures is investigated. Furthermore, the simulator is used as the environment of a reinforcement learning problem to find the optimal policies to control the pandemic. The obtained experimental results indicate the simulator's adaptability and capacity in making sound predictions and a successful policy derivation example based on real-world data. As an exemplary application, our results show that the proposed policy discovery method can lead to control measures that produce significantly fewer infected individuals in the population and protect the health system against saturation.
LGOct 7, 2020
Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation LearningSumedh A. Sontakke, Arash Mehrjou, Laurent Itti et al.
Animals exhibit an innate ability to learn regularities of the world through interaction. By performing experiments in their environment, they are able to discern the causal factors of variation and infer how they affect the world's dynamics. Inspired by this, we attempt to equip reinforcement learning agents with the ability to perform experiments that facilitate a categorization of the rolled-out trajectories, and to subsequently infer the causal factors of the environment in a hierarchical manner. We introduce {\em causal curiosity}, a novel intrinsic reward, and show that it allows our agents to learn optimal sequences of actions and discover causal factors in the dynamics of the environment. The learned behavior allows the agents to infer a binary quantized representation for the ground-truth causal factors in every environment. Additionally, we find that these experimental behaviors are semantically meaningful (e.g., our agents learn to lift blocks to categorize them by weight), and are learnt in a self-supervised manner with approximately 2.5 times less data than conventional supervised planners. We show that these behaviors can be re-purposed and fine-tuned (e.g., from lifting to pushing or other downstream tasks). Finally, we show that the knowledge of causal factor representations aids zero-shot learning for more complex tasks. Visit https://sites.google.com/usc.edu/causal-curiosity/home for website.
APAug 31, 2020
Real-time Prediction of COVID-19 related Mortality using Electronic Health RecordsPatrick Schwab, Arash Mehrjou, Sonali Parbhoo et al.
Coronavirus Disease 2019 (COVID-19) is an emerging respiratory disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with rapid human-to-human transmission and a high case fatality rate particularly in older patients. Due to the exponential growth of infections, many healthcare systems across the world are under pressure to care for increasing amounts of at-risk patients. Given the high number of infected patients, identifying patients with the highest mortality risk early is critical to enable effective intervention and optimal prioritisation of care. Here, we present the COVID-19 Early Warning System (CovEWS), a clinical risk scoring system for assessing COVID-19 related mortality risk. CovEWS provides continuous real-time risk scores for individual patients with clinically meaningful predictive performance up to 192 hours (8 days) in advance, and is automatically derived from patients' electronic health records (EHRs) using machine learning. We trained and evaluated CovEWS using de-identified data from a cohort of 66430 COVID-19 positive patients seen at over 69 healthcare institutions in the United States (US), Australia, Malaysia and India amounting to an aggregated total of over 2863 years of patient observation time. On an external test cohort of 5005 patients, CovEWS predicts COVID-19 related mortality from $78.8\%$ ($95\%$ confidence interval [CI]: $76.0$, $84.7\%$) to $69.4\%$ ($95\%$ CI: $57.6, 75.2\%$) specificity at a sensitivity greater than $95\%$ between respectively 1 and 192 hours prior to observed mortality events - significantly outperforming existing generic and COVID-19 specific clinical risk scores. CovEWS could enable clinicians to intervene at an earlier stage, and may therefore help in preventing or mitigating COVID-19 related mortality.
MLAug 23, 2020
Learning Dynamical Systems using Local Stability PriorsArash Mehrjou, Andrea Iannelli, Bernhard Schölkopf
A coupled computational approach to simultaneously learn a vector field and the region of attraction of an equilibrium point from generated trajectories of the system is proposed. The nonlinear identification leverages the local stability information as a prior on the system, effectively endowing the estimate with this important structural property. In addition, the knowledge of the region of attraction plays an experiment design role by informing the selection of initial conditions from which trajectories are generated and by enabling the use of a Lyapunov function of the system as a regularization term. Numerical results show that the proposed method allows efficient sampling and provides an accurate estimate of the dynamics in an inner approximation of its region of attraction.
SYJun 6, 2020
Neural Lyapunov RedesignArash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf
Learning controllers merely based on a performance metric has been proven effective in many physical and non-physical tasks in both control theory and reinforcement learning. However, in practice, the controller must guarantee some notion of safety to ensure that it does not harm either the agent or the environment. Stability is a crucial notion of safety, whose violation can certainly cause unsafe behaviors. Lyapunov functions are effective tools to assess stability in nonlinear dynamical systems. In this paper, we combine an improving Lyapunov function with automatic controller synthesis in an iterative fashion to obtain control policies with large safe regions. We propose a two-player collaborative algorithm that alternates between estimating a Lyapunov function and deriving a controller that gradually enlarges the stability region of the closed-loop system. We provide theoretical results on the class of systems that can be treated with the proposed algorithm and empirically evaluate the effectiveness of our method using an exemplary dynamical system.
MLOct 29, 2019
Kernel-Guided Training of Implicit Generative Models with Stability GuaranteesArash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet et al.
Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this work, we propose a theoretically grounded method to guide the training trajectories of GANs by augmenting the GAN loss function with a kernel-based regularization term that controls local and global discrepancies between the model and true distributions. This control signal allows us to inject prior knowledge into the model. We provide theoretical guarantees on the stability of the resulting dynamical system and demonstrate different aspects of it via a wide range of experiments.
MLOct 27, 2019
Dual Instrumental Variable RegressionKrikamol Muandet, Arash Mehrjou, Si Kai Lee et al.
We present a novel algorithm for non-linear instrumental variable (IV) regression, DualIV, which simplifies traditional two-stage methods via a dual formulation. Inspired by problems in stochastic programming, we show that two-stage procedures for non-linear IV regression can be reformulated as a convex-concave saddle-point problem. Our formulation enables us to circumvent the first-stage regression which is a potential bottleneck in real-world applications. We develop a simple kernel-based algorithm with an analytic solution based on this formulation. Empirical results show that we are competitive to existing, more complicated algorithms for non-linear instrumental variable regression.
MLMay 16, 2019
The Incomplete Rosetta Stone Problem: Identifiability Results for Multi-View Nonlinear ICALuigi Gresele, Paul K. Rubenstein, Arash Mehrjou et al.
We consider the problem of recovering a common latent source with independent components from multiple views. This applies to settings in which a variable is measured with multiple experimental modalities, and where the goal is to synthesize the disparate measurements into a single unified representation. We consider the case that the observed views are a nonlinear mixing of component-wise corruptions of the sources. When the views are considered separately, this reduces to nonlinear Independent Component Analysis (ICA) for which it is provably impossible to undo the mixing. We present novel identifiability proofs that this is possible when the multiple views are considered jointly, showing that the mixing can theoretically be undone using function approximators such as deep neural networks. In contrast to known identifiability results for nonlinear ICA, we prove that independent latent sources with arbitrary mixing can be recovered as long as multiple, sufficiently different noisy views are available.
LGJan 26, 2019
Kernel-Guided Training of Implicit Generative Models with Stability GuaranteesArash Mehrjou, Wittawat Jitkrittum, Krikamol Muandet et al.
Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meaningful way during the course of training. In this work, we propose a theoretically grounded method to guide the training trajectories of GANs by augmenting the GAN loss function with a kernel-based regularization term that controls local and global discrepancies between the model and true distributions. This control signal allows us to inject prior knowledge into the model. We provide theoretical guarantees on the stability of the resulting dynamical system and demonstrate different aspects of it via a wide range of experiments.
LGDec 8, 2018
Counterfactuals uncover the modular structure of deep generative modelsMichel Besserve, Arash Mehrjou, Rémy Sun et al.
Deep generative models can emulate the perceptual properties of complex image datasets, providing a latent representation of the data. However, manipulating such representation to perform meaningful and controllable transformations in the data space remains challenging without some form of supervision. While previous work has focused on exploiting statistical independence to disentangle latent factors, we argue that such requirement is too restrictive and propose instead a non-statistical framework that relies on counterfactual manipulations to uncover a modular structure of the network composed of disentangled groups of internal variables. Experiments with a variety of generative models trained on complex image datasets show the obtained modules can be used to design targeted interventions. This opens the way to applications such as computationally efficient style transfer and the automated assessment of robustness to contextual changes in pattern recognition systems.
SPNov 14, 2018
Deep Nonlinear Non-Gaussian Filtering for Dynamical SystemsArash Mehrjou, Bernhard Schölkopf
Filtering is a general name for inferring the states of a dynamical system given observations. The most common filtering approach is Gaussian Filtering (GF) where the distribution of the inferred states is a Gaussian whose mean is an affine function of the observations. There are two restrictions in this model: Gaussianity and Affinity. We propose a model to relax both these assumptions based on recent advances in implicit generative models. Empirical results show that the proposed method gives a significant advantage over GF and nonlinear methods based on fixed nonlinear kernels.
SYSep 27, 2018
Efficient Encoding of Dynamical Systems through Local ApproximationsFriedrich Solowjow, Arash Mehrjou, Bernhard Schölkopf et al.
An efficient representation of observed data has many benefits in various domains of engineering and science. Representing static data sets, such as images, is a living branch in machine learning and eases downstream tasks, such as classification, regression, or decision making. However, the representation of dynamical systems has received less attention. In this work, we develop a method to represent a dynamical system efficiently as a combination of a state and a local model, which fulfills a criterion inspired by the minimum description length (MDL) principle. The MDL principle is used in machine learning and statistics to quantify the trade-off between the ability to explain seen data and the model complexity. Networked control systems are a prominent example, where such a representation is beneficial. When many agents share a network, information exchange is costly and should thus happen only when necessary. We empirically show the efficiency of the proposed encoding for several dynamical systems and demonstrate reduced communication for event-triggered state estimation problems.
MLMay 27, 2018
A Local Information Criterion for Dynamical SystemsArash Mehrjou, Friedrich Solowjow, Sebastian Trimpe et al.
Encoding a sequence of observations is an essential task with many applications. The encoding can become highly efficient when the observations are generated by a dynamical system. A dynamical system imposes regularities on the observations that can be leveraged to achieve a more efficient code. We propose a method to encode a given or learned dynamical system. Apart from its application for encoding a sequence of observations, we propose to use the compression achieved by this encoding as a criterion for model selection. Given a dataset, different learning algorithms result in different models. But not all learned models are equally good. We show that the proposed encoding approach can be used to choose the learned model which is closer to the true underlying dynamics. We provide experiments for both encoding and model selection, and theoretical results that shed light on why the approach works.
MLMay 23, 2018
Distribution Aware Active LearningArash Mehrjou, Mehran Khodabandeh, Greg Mori
Discriminative learning machines often need a large set of labeled samples for training. Active learning (AL) settings assume that the learner has the freedom to ask an oracle to label its desired samples. Traditional AL algorithms heuristically choose query samples about which the current learner is uncertain. This strategy does not make good use of the structure of the dataset at hand and is prone to be misguided by outliers. To alleviate this problem, we propose to distill the structural information into a probabilistic generative model which acts as a \emph{teacher} in our model. The active \emph{learner} uses this information effectively at each cycle of active learning. The proposed method is generic and does not depend on the type of learner and teacher. We then suggest a query criterion for active learning that is aware of distribution of data and is more robust against outliers. Our method can be combined readily with several other query criteria for active learning. We provide the formulation and empirically show our idea via toy and real examples.
MLMay 21, 2018
Deep Energy Estimator NetworksSaeed Saremi, Arash Mehrjou, Bernhard Schölkopf et al.
Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical framework that is built on score matching. We revisit the Bayesian interpretation of the score function and the Parzen score matching, and construct a multilayer perceptron with a scalable objective for learning the energy (i.e. the unnormalized log-density), which is then optimized with stochastic gradient descent. In addition, the resulting deep energy estimator network (DEEN) is designed as products of experts. We present the utility of DEEN in learning the energy, the score function, and in single-step denoising experiments for synthetic and high-dimensional data. We also diagnose stability problems in the direct estimation of the score function that had been observed for denoising autoencoders.
MLMar 13, 2018
Analysis of Nonautonomous Adversarial SystemsArash Mehrjou
Generative adversarial networks are used to generate images but still their convergence properties are not well understood. There have been a few studies who intended to investigate the stability properties of GANs as a dynamical system. This short writing can be seen in that direction. Among the proposed methods for stabilizing training of GANs, ß-GAN was the first who proposed a complete annealing strategy to change high-level conditions of the GAN objective. In this note, we show by a simple example how annealing strategy works in GANs. The theoretical analysis is supported by simple simulations.
MLFeb 12, 2018
Tempered Adversarial NetworksMehdi S. M. Sajjadi, Giambattista Parascandolo, Arash Mehrjou et al.
Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance between the networks: While the discriminator is trained directly on both real and fake samples, the generator only has control over the fake samples it produces since the real data distribution is fixed by the choice of a given dataset. We propose a simple modification that gives the generator control over the real samples which leads to a tempered learning process for both generator and discriminator. The real data distribution passes through a lens before being revealed to the discriminator, balancing the generator and discriminator by gradually revealing more detailed features necessary to produce high-quality results. The proposed module automatically adjusts the learning process to the current strength of the networks, yet is generic and easy to add to any GAN variant. In a number of experiments, we show that this can improve quality, stability and/or convergence speed across a range of different GAN architectures (DCGAN, LSGAN, WGAN-GP).
LGNov 8, 2017
Fidelity-Weighted LearningMostafa Dehghani, Arash Mehrjou, Stephan Gouws et al.
Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.
MLAug 17, 2017
Towards life cycle identification of malaria parasites using machine learning and Riemannian geometryArash Mehrjou
Malaria is a serious infectious disease that is responsible for over half million deaths yearly worldwide. The major cause of these mortalities is late or inaccurate diagnosis. Manual microscopy is currently considered as the dominant diagnostic method for malaria. However, it is time consuming and prone to human errors. The aim of this paper is to automate the diagnosis process and minimize the human intervention. We have developed the hardware and software for a cost-efficient malaria diagnostic system. This paper describes the manufactured hardware and also proposes novel software to handle parasite detection and life-stage identification. A motorized microscope is developed to take images from Giemsa-stained blood smears. A patch-based unsupervised statistical clustering algorithm is proposed which offers a novel method for classification of different regions within blood images. The proposed method provides better robustness against different imaging settings. The core of the proposed algorithm is a model called Mixture of Independent Component Analysis. A manifold based optimization method is proposed that facilitates the application of the model for high dimensional data usually acquired in medical microscopy. The method was tested on 600 blood slides with various imaging conditions. The speed of the method is higher than current supervised systems while its accuracy is comparable to or better than them.
MLMay 21, 2017
Annealed Generative Adversarial NetworksArash Mehrjou, Bernhard Schölkopf, Saeed Saremi
We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed ß-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, ß-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.