LGJul 6, 2022
TREE-G: Decision Trees Contesting Graph Neural NetworksMaya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach
When dealing with tabular data, models based on decision trees are a popular choice due to their high accuracy on these data types, their ease of application, and explainability properties. However, when it comes to graph-structured data, it is not clear how to apply them effectively, in a way that incorporates the topological information with the tabular data available on the vertices of the graph. To address this challenge, we introduce TREE-G. TREE-G modifies standard decision trees, by introducing a novel split function that is specialized for graph data. Not only does this split function incorporate the node features and the topological information, but it also uses a novel pointer mechanism that allows split nodes to use information computed in previous splits. Therefore, the split function adapts to the predictive task and the graph at hand. We analyze the theoretical properties of TREE-G and demonstrate its benefits empirically on multiple graph and vertex prediction benchmarks. In these experiments, TREE-G consistently outperforms other tree-based models and often outperforms other graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels, sometimes by large margins. Moreover, TREE-Gs models and their predictions can be explained and visualized
LGSep 8, 2023
Graph Neural Networks Use Graphs When They Shouldn'tMaya Bechler-Speicher, Ido Amos, Ran Gilad-Bachrach et al.
Predictions over graphs play a crucial role in various domains, including social networks and medicine. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Although a graph-structure is provided as input to the GNN, in some cases the best solution can be obtained by ignoring it. While GNNs have the ability to ignore the graph- structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure. Namely, they use it even when a better solution can be obtained by ignoring it. We analyze the implicit bias of gradient-descent learning of GNNs and prove that when the ground truth function does not use the graphs, GNNs are not guaranteed to learn a solution that ignores the graph, even with infinite data. We examine this phenomenon with respect to different graph distributions and find that regular graphs are more robust to this over-fitting. We also prove that within the family of regular graphs, GNNs are guaranteed to extrapolate when learning with gradient descent. Finally, based on our empirical and theoretical findings, we demonstrate on real-data how regular graphs can be leveraged to reduce graph overfitting and enhance performance.
HCJul 16, 2024
Improving Engagement and Efficacy of mHealth Micro-Interventions for Stress Coping: an In-The-Wild StudyChaya Ben Yehuda, Ran Gilad-Bachrach, Yarin Udi
Sustaining long-term user engagement with mobile health (mHealth) interventions while preserving their high efficacy remains an ongoing challenge in real-world well-being applications. To address this issue, we introduce a new algorithm, the Personalized, Context-Aware Recommender (PCAR), for intervention selection and evaluate its performance in a field experiment. In a four-week, in-the-wild experiment involving 29 parents of young children, we delivered personalized stress-reducing micro-interventions through a mobile chatbot. We assessed their impact on stress reduction using momentary stress level ecological momentary assessments (EMAs) before and after each intervention. Our findings demonstrate the superiority of PCAR intervention selection in enhancing the engagement and efficacy of mHealth micro-interventions to stress coping compared to random intervention selection and a control group that did not receive any intervention. Furthermore, we show that even brief, one-minute interventions can significantly reduce perceived stress levels (p=0.001). We observe that individuals are most receptive to one-minute interventions during transitional periods between activities, such as transitioning from afternoon activities to bedtime routines. Our study contributes to the literature by introducing a personalized context-aware intervention selection algorithm that improves engagement and efficacy of mHealth interventions, identifying key timing for stress interventions, and offering insights into mechanisms to improve stress coping.
LGJun 16, 2022
Inherent Inconsistencies of Feature ImportanceNimrod Harel, Uri Obolski, Ran Gilad-Bachrach
The rapid advancement and widespread adoption of machine learning-driven technologies have underscored the practical and ethical need for creating interpretable artificial intelligence systems. Feature importance, a method that assigns scores to the contribution of individual features on prediction outcomes, seeks to bridge this gap as a tool for enhancing human comprehension of these systems. Feature importance serves as an explanation of predictions in diverse contexts, whether by providing a global interpretation of a phenomenon across the entire dataset or by offering a localized explanation for the outcome of a specific data point. Furthermore, feature importance is being used both for explaining models and for identifying plausible causal relations in the data, independently from the model. However, it is worth noting that these various contexts have traditionally been explored in isolation, with limited theoretical foundations. This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores. Notably, our work unveils a surprising conclusion: when we combine the proposed properties with those previously outlined in the literature, we demonstrate the existence of an inconsistency. This inconsistency highlights that certain essential properties of feature importance scores cannot coexist harmoniously within a single framework.
LGFeb 19, 2025
Towards Invariance to Node Identifiers in Graph Neural NetworksMaya Bechler-Speicher, Moshe Eliasof, Carola-Bibiane Schonlieb et al.
Message-Passing Graph Neural Networks (GNNs) are known to have limited expressive power, due to their message passing structure. One mechanism for circumventing this limitation is to add unique node identifiers (IDs), which break the symmetries that underlie the expressivity limitation. In this work, we highlight a key limitation of the ID framework, and propose an approach for addressing it. We begin by observing that the final output of the GNN should clearly not depend on the specific IDs used. We then show that in practice this does not hold, and thus the learned network does not possess this desired structural property. Such invariance to node IDs may be enforced in several ways, and we discuss their theoretical properties. We then propose a novel regularization method that effectively enforces ID invariance to the network. Extensive evaluations on both real-world and synthetic tasks demonstrate that our approach significantly improves ID invariance and, in turn, often boosts generalization performance.
LGMar 3, 2025
Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with TransformersGilad Yehudai, Clayton Sanford, Maya Bechler-Speicher et al.
Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement a task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We support our theoretical results with empirical evaluations.
LGNov 4, 2024
On the Utilization of Unique Node Identifiers in Graph Neural NetworksMaya Bechler-Speicher, Moshe Eliasof, Carola-Bibiane Schönlieb et al.
Graph Neural Networks have inherent representational limitations due to their message-passing structure. Recent work has suggested that these limitations can be overcome by using unique node identifiers (UIDs). Here we argue that despite the advantages of UIDs, one of their disadvantages is that they lose the desirable property of permutation-equivariance. We thus propose to focus on UID models that are permutation-equivariant, and present theoretical arguments for their advantages. Motivated by this, we propose a method to regularize UID models towards permutation equivariance, via a contrastive loss. We empirically demonstrate that our approach improves generalization and extrapolation abilities while providing faster training convergence. On the recent BREC expressiveness benchmark, our proposed method achieves state-of-the-art performance compared to other random-based approaches.
LGFeb 12, 2024
Tighter Bounds on the Information Bottleneck with Application to Deep LearningNir Weingarten, Zohar Yakhini, Moshe Butman et al.
Deep Neural Nets (DNNs) learn latent representations induced by their downstream task, objective function, and other parameters. The quality of the learned representations impacts the DNN's generalization ability and the coherence of the emerging latent space. The Information Bottleneck (IB) provides a hypothetically optimal framework for data modeling, yet it is often intractable. Recent efforts combined DNNs with the IB by applying VAE-inspired variational methods to approximate bounds on mutual information, resulting in improved robustness to adversarial attacks. This work introduces a new and tighter variational bound for the IB, improving performance of previous IB-inspired DNNs. These advancements strengthen the case for the IB and its variational approximations as a data modeling framework, and provide a simple method to significantly enhance the adversarial robustness of classifier DNNs.
LGSep 28, 2025
Graph Mixing Additive NetworksMaya Bechler-Speicher, Andrea Zerio, Maor Huri et al.
We introduce GMAN, a flexible, interpretable, and expressive framework that extends Graph Neural Additive Networks (GNANs) to learn from sets of sparse time-series data. GMAN represents each time-dependent trajectory as a directed graph and applies an enriched, more expressive GNAN to each graph. It allows users to control the interpretability-expressivity trade-off by grouping features and graphs to encode priors, and it provides feature, node, and graph-level interpretability. On real-world datasets, including mortality prediction from blood tests and fake-news detection, GMAN outperforms strong non-interpretable black-box baselines while delivering actionable, domain-aligned explanations.
LGJul 3, 2025
Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical DataIrena Girshovitz, Atai Ambus, Moni Shahar et al.
Background: The use of Electronic Health Records (EHRs) for epidemiological studies and artificial intelligence (AI) training is increasing rapidly. The reliability of the results depends on the accuracy and completeness of EHR data. However, EHR data often contain significant quality issues, including misrepresentations of subpopulations, biases, and systematic errors, as they are primarily collected for clinical and billing purposes. Existing quality assessment methods remain insufficient, lacking systematic procedures to assess data fitness for research. Methods: We present the Medical Data Pecking approach, which adapts unit testing and coverage concepts from software engineering to identify data quality concerns. We demonstrate our approach using the Medical Data Pecking Tool (MDPT), which consists of two main components: (1) an automated test generator that uses large language models and grounding techniques to create a test suite from data and study descriptions, and (2) a data testing framework that executes these tests, reporting potential errors and coverage. Results: We evaluated MDPT on three datasets: All of Us (AoU), MIMIC-III, and SyntheticMass, generating 55-73 tests per cohort across four conditions. These tests correctly identified 20-43 non-aligned or non-conforming data issues. We present a detailed analysis of the LLM-generated test suites in terms of reference grounding and value accuracy. Conclusion: Our approach incorporates external medical knowledge to enable context-sensitive data quality testing as part of the data analysis workflow to improve the validity of its outcomes. Our approach tackles these challenges from a quality assurance perspective, laying the foundation for further development such as additional data modalities and improved grounding methods.
LGMay 25, 2025
Interpretable Graph Learning Over Sets of Temporally-Sparse DataAndrea Zerio, Maya Bechler-Speicher, Maor Huri et al.
Real-world medical data often includes measurements from multiple signals that are collected at irregular and asynchronous time intervals. For example, different types of blood tests can be measured at different times and frequencies, resulting in fragmented and unevenly scattered temporal data. Similar issues of irregular sampling of different attributes occur in other domains, such as monitoring of large systems using event log files or the spread of fake news on social networks. Effectively learning from such data requires models that can handle sets of temporally sparse and heterogeneous signals. In this paper, we propose Graph Mixing Additive Networks (GMAN), a novel and interpretable-by-design model for learning over irregular sets of temporal signals. Our method achieves state-of-the-art performance in real-world medical tasks, including a 4-point increase in the AUROC score of in-hospital mortality prediction, compared to existing methods. We further showcase GMAN's flexibility by applying it to a fake news detection task. We demonstrate how its interpretability capabilities, including node-level, graph-level, and subset-level importance, allow for transition phases detection and gaining medical insights with real-world high-stakes implications. Finally, we provide theoretical insights on GMAN expressive power.
LGJun 3, 2024
The Interpretable and Effective Graph Neural Additive NetworksMaya Bechler-Speicher, Amir Globerson, Ran Gilad-Bachrach
Graph Neural Networks (GNNs) have emerged as the predominant approach for learning over graph-structured data. However, most GNNs operate as black-box models and require post-hoc explanations, which may not suffice in high-stakes scenarios where transparency is crucial. In this paper, we present a GNN that is interpretable by design. Our model, Graph Neural Additive Network (GNAN), is a novel extension of the interpretable class of Generalized Additive Models, and can be visualized and fully understood by humans. GNAN is designed to be fully interpretable, offering both global and local explanations at the feature and graph levels through direct visualization of the model. These visualizations describe exactly how the model uses the relationships between the target variable, the features, and the graph. We demonstrate the intelligibility of GNANs in a series of examples on different tasks and datasets. In addition, we show that the accuracy of GNAN is on par with black-box GNNs, making it suitable for critical applications where transparency is essential, alongside high accuracy.
AIMay 20, 2023
The Case Against ExplainabilityHofit Wasserman Rozen, Niva Elkin-Koren, Ran Gilad-Bachrach
As artificial intelligence (AI) becomes more prevalent there is a growing demand from regulators to accompany decisions made by such systems with explanations. However, a persistent gap exists between the need to execute a meaningful right to explanation vs. the ability of Machine Learning systems to deliver on such a legal requirement. The regulatory appeal towards "a right to explanation" of AI systems can be attributed to the significant role of explanations, part of the notion called reason-giving, in law. Therefore, in this work we examine reason-giving's purposes in law to analyze whether reasons provided by end-user Explainability can adequately fulfill them. We find that reason-giving's legal purposes include: (a) making a better and more just decision, (b) facilitating due-process, (c) authenticating human agency, and (d) enhancing the decision makers' authority. Using this methodology, we demonstrate end-user Explainabilty's inadequacy to fulfil reason-giving's role in law, given reason-giving's functions rely on its impact over a human decision maker. Thus, end-user Explainability fails, or is unsuitable, to fulfil the first, second and third legal function. In contrast we find that end-user Explainability excels in the fourth function, a quality which raises serious risks considering recent end-user Explainability research trends, Large Language Models' capabilities, and the ability to manipulate end-users by both humans and machines. Hence, we suggest that in some cases the right to explanation of AI systems could bring more harm than good to end users. Accordingly, this study carries some important policy ramifications, as it calls upon regulators and Machine Learning practitioners to reconsider the widespread pursuit of end-user Explainability and a right to explanation of AI systems.
HCDec 3, 2021
PyBryt: auto-assessment and auto-grading for computational thinkingChristopher Pyles, Francois van Schalkwyk, Gerard J. Gorman et al.
We continuously interact with computerized systems to achieve goals and perform tasks in our personal and professional lives. Therefore, the ability to program such systems is a skill needed by everyone. Consequently, computational thinking skills are essential for everyone, which creates a challenge for the educational system to teach these skills at scale and allow students to practice these skills. To address this challenge, we present a novel approach to providing formative feedback to students on programming assignments. Our approach uses dynamic evaluation to trace intermediate results generated by student's code and compares them to the reference implementation provided by their teachers. We have implemented this method as a Python library and demonstrate its use to give students relevant feedback on their work while allowing teachers to challenge their students' computational thinking skills.
LGOct 22, 2021
A Last Switch Dependent Analysis of Satiation and Seasonality in BanditsPierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi et al.
Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver.
LGMar 13, 2021
Robust Model Compression Using Deep HypothesesOmri Armstrong, Ran Gilad-Bachrach
Machine Learning models should ideally be compact and robust. Compactness provides efficiency and comprehensibility whereas robustness provides resilience. Both topics have been studied in recent years but in isolation. Here we present a robust model compression scheme which is independent of model types: it can compress ensembles, neural networks and other types of models into diverse types of small models. The main building block is the notion of depth derived from robust statistics. Originally, depth was introduced as a measure of the centrality of a point in a sample such that the median is the deepest point. This concept was extended to classification functions which makes it possible to define the depth of a hypothesis and the median hypothesis. Algorithms have been suggested to approximate the median but they have been limited to binary classification. In this study, we present a new algorithm, the Multiclass Empirical Median Optimization (MEMO) algorithm that finds a deep hypothesis in multi-class tasks, and prove its correctness. This leads to our Compact Robust Estimated Median Belief Optimization (CREMBO) algorithm for robust model compression. We demonstrate the success of this algorithm empirically by compressing neural networks and random forests into small decision trees, which are interpretable models, and show that they are more accurate and robust than other comparable methods. In addition, our empirical study shows that our method outperforms Knowledge Distillation on DNN to DNN compression.
LGOct 15, 2020
Marginal Contribution Feature Importance -- an Axiomatic Approach for The Natural CaseAmnon Catav, Boyang Fu, Jason Ernst et al.
When training a predictive model over medical data, the goal is sometimes to gain insights about a certain disease. In such cases, it is common to use feature importance as a tool to highlight significant factors contributing to that disease. As there are many existing methods for computing feature importance scores, understanding their relative merits is not trivial. Further, the diversity of scenarios in which they are used lead to different expectations from the feature importance scores. While it is common to make the distinction between local scores that focus on individual predictions and global scores that look at the contribution of a feature to the model, another important division distinguishes model scenarios, in which the goal is to understand predictions of a given model from natural scenarios, in which the goal is to understand a phenomenon such as a disease. We develop a set of axioms that represent the properties expected from a feature importance function in the natural scenario and prove that there exists only one function that satisfies all of them, the Marginal Contribution Feature Importance (MCI). We analyze this function for its theoretical and empirical properties and compare it to other feature importance scores. While our focus is the natural scenario, we suggest that our axiomatic approach could be carried out in other scenarios too.
IROct 27, 2019
Algorithmic Copywriting: Automated Generation of Health-Related Advertisements to Improve their PerformanceBrit Youngmann, Ran Gilad-Bachrach, Danny Karmon et al.
Search advertising, a popular method for online marketing, has been employed to improve health by eliciting positive behavioral change. However, writing effective advertisements requires expertise and experimentation, which may not be available to health authorities wishing to elicit such changes, especially when dealing with public health crises such as epidemic outbreaks. Here we develop a framework, comprised of two neural networks models, that automatically generate ads. First, it employs a generator model, which create ads from web pages. It then employs a translation model, which transcribes ads to improve performance. We trained the networks using 114K health-related ads shown on Microsoft Advertising. We measure ads performance using the click-through rates (CTR). Our experiments show that the generated advertisements received approximately the same CTR as human-authored ads. The marginal contribution of the generator model was, on average, 28\% lower than that of human-authored ads, while the translator model received, on average, 32\% more clicks than human-authored ads. Our analysis shows that the translator model produces ads reflecting higher values of psychological attributes associated with a user action, including higher valance and arousal, and more calls-to-actions. In contrast, levels of these attributes in ads produced by the generator model are similar to those of human-authored ads. Our results demonstrate the ability to automatically generate useful advertisements for the health domain. We believe that our work offers health authorities an improved ability to nudge people towards healthier behaviors while saving the time and cost needed to build effective advertising campaigns.
HCApr 28, 2019
E-Gotsky: Sequencing Content using the Zone of Proximal DevelopmentOded Vainas, Ori Bar-Ilan, Yossi Ben-David et al.
Vygotsky's notions of Zone of Proximal Development and Dynamic Assessment emphasize the importance of personalized learning that adapts to the needs and abilities of the learners and enables more efficient learning. In this work we introduce a novel adaptive learning engine called E-gostky that builds on these concepts to personalize the learning path within an e-learning system. E-gostky uses machine learning techniques to select the next content item that will challenge the student but will not be overwhelming, keeping students in their Zone of Proximal Development. To evaluate the system, we conducted an experiment where hundreds of students from several different elementary schools used our engine to learn fractions for five months. Our results show that using E-gostky can significantly reduce the time required to reach similar mastery. Specifically, in our experiment, it took students who were using the adaptive learning engine $17\%$ less time to reach a similar level of mastery as of those who didn't. Moreover, students made greater efforts to find the correct answer rather than guessing and class teachers reported that even students with learning disabilities showed higher engagement.
LGDec 27, 2018
Low Latency Privacy Preserving InferenceAlon Brutzkus, Oren Elisha, Ran Gilad-Bachrach
When applying machine learning to sensitive data, one has to find a balance between accuracy, information security, and computational-complexity. Recent studies combined Homomorphic Encryption with neural networks to make inferences while protecting against information leakage. However, these methods are limited by the width and depth of neural networks that can be used (and hence the accuracy) and exhibit high latency even for relatively simple networks. In this study we provide two solutions that address these limitations. In the first solution, we present more than $10\times$ improvement in latency and enable inference on wider networks compared to prior attempts with the same level of security. The improved performance is achieved by novel methods to represent the data during the computation. In the second solution, we apply the method of transfer learning to provide private inference services using deep networks with latency of $\sim0.16$ seconds. We demonstrate the efficacy of our methods on several computer vision tasks.
GTOct 4, 2018
Turning Lemons into Peaches using Secure ComputationStav Buchsbaum, Ran Gilad-Bachrach, Yehuda Lindell
In many cases, assessing the quality of goods is hard. For example, when purchasing a car, it is hard to measure how pollutant the car is since there are infinitely many driving conditions to be tested. Typically, these situations are considered under the umbrella of information asymmetry and as Akelrof showed may lead to a market of lemons. However, we argue that in many of these situations, the problem is not the missing information but the computational challenge of obtaining it. In a nut-shell, if verifying the value of goods requires a large amount of computation or even infinite amounts of computation, the buyer is forced to use a finite test that samples, in some sense, the quality of the goods. However, if the seller knows the test, then the seller can over-fit the test and create goods that pass the quality test despite not having the desired quality. We show different solutions to this situation including a novel approach that uses secure computation to hide the test from the seller to prevent over-fitting.
LGApr 18, 2018
Modeling and Simultaneously Removing Bias via Adversarial Neural NetworksJohn Moore, Joel Pfeiffer, Kai Wei et al.
In real world systems, the predictions of deployed Machine Learned models affect the training data available to build subsequent models. This introduces a bias in the training data that needs to be addressed. Existing solutions to this problem attempt to resolve the problem by either casting this in the reinforcement learning framework or by quantifying the bias and re-weighting the loss functions. In this work, we develop a novel Adversarial Neural Network (ANN) model, an alternative approach which creates a representation of the data that is invariant to the bias. We take the Paid Search auction as our working example and ad display position features as the confounding features for this setting. We show the success of this approach empirically on both synthetic data as well as real world paid search auction data from a major search engine.
LGOct 29, 2017
Smooth Sensitivity Based Approach for Differentially Private Principal Component AnalysisRan Gilad-Bachrach, Alon Gonen
Currently known methods for this task either employ the computationally intensive \emph{exponential mechanism} or require an access to the covariance matrix, and therefore fail to utilize potential sparsity of the data. The problem of designing simpler and more efficient methods for this task has been raised as an open problem in \cite{kapralov2013differentially}. In this paper we address this problem by employing the output perturbation mechanism. Despite being arguably the simplest and most straightforward technique, it has been overlooked due to the large \emph{global sensitivity} associated with publishing the leading eigenvector. We tackle this issue by adopting a \emph{smooth sensitivity} based approach, which allows us to establish differential privacy (in a worst-case manner) and near-optimal sample complexity results under eigengap assumption. We consider both the pure and the approximate notions of differential privacy, and demonstrate a tradeoff between privacy level and sample complexity. We conclude by suggesting how our results can be extended to related problems.
LGMay 7, 2015
DART: Dropouts meet Multiple Additive Regression TreesK. V. Rashmi, Ran Gilad-Bachrach
Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances, and make negligible contribution towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added tress. We show that the commonly used tool to address this issue, that of shrinkage, alleviates the problem only to a certain extent and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. We propose a novel way of employing dropouts in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large scale, publicly available datasets, and show that DART outperforms MART in each of the tasks, with a significant margin. We also show that DART overcomes the issue of over-specialization to a considerable extent.
LGDec 18, 2014
Crypto-Nets: Neural Networks over Encrypted DataPengtao Xie, Misha Bilenko, Tom Finley et al.
The problem we address is the following: how can a user employ a predictive model that is held by a third party, without compromising private information. For example, a hospital may wish to use a cloud service to predict the readmission risk of a patient. However, due to regulations, the patient's medical files cannot be revealed. The goal is to make an inference using the model, without jeopardizing the accuracy of the prediction or the privacy of the data. To achieve high accuracy, we use neural networks, which have been shown to outperform other learning models for many tasks. To achieve the privacy requirements, we use homomorphic encryption in the following protocol: the data owner encrypts the data and sends the ciphertexts to the third party to obtain a prediction from a trained model. The model operates on these ciphertexts and sends back the encrypted prediction. In this protocol, not only the data remains private, even the values predicted are available only to the data owner. Using homomorphic encryption and modifications to the activation functions and training algorithms of neural networks, we show that it is protocol is possible and may be feasible. This method paves the way to build a secure cloud-based neural network prediction services without invading users' privacy.
MLNov 28, 2013
Using Multiple Samples to Learn Mixture ModelsJason D Lee, Ran Gilad-Bachrach, Rich Caruana
In the mixture models problem it is assumed that there are $K$ distributions $θ_{1},\ldots,θ_{K}$ and one gets to observe a sample from a mixture of these distributions with unknown coefficients. The goal is to associate instances with their generating distributions, or to identify the parameters of the hidden distributions. In this work we make the assumption that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights. As with topic modeling, having multiple samples is often a reasonable assumption. Instead of pooling the data into one sample, we prove that it is possible to use the differences between the samples to better recover the underlying structure. We present algorithms that recover the underlying structure under milder assumptions than the current state of art when either the dimensionality or the separation is high. The methods, when applied to topic modeling, allow generalization to words not present in the training data.