MLMay 31, 2022
Optimal Activation Functions for the Random Features Regression ModelJianxin Wang, José Bento
The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have been recently studied. We build on this work and identify in closed-form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve, and the dependency of its optimal regularization parameter on the observation noise level.
QMAug 22, 2019Code
Exact inference under the perfect phylogeny modelSurjyendu Ray, Bei Jia, Sam Safavi et al.
Motivation: Many inference tools use the Perfect Phylogeny Model (PPM) to learn trees from noisy variant allele frequency (VAF) data. Learning in this setting is hard, and existing tools use approximate or heuristic algorithms. An algorithmic improvement is important to help disentangle the limitations of the PPM's assumptions from the limitations in our capacity to learn under it. Results: We make such improvement in the scenario, where the mutations that are relevant for evolution can be clustered into a small number of groups, and the trees to be reconstructed have a small number of nodes. We use a careful combination of algorithms, software, and hardware, to develop EXACT: a tool that can explore the space of all possible phylogenetic trees, and performs exact inference under the PPM with noisy data. EXACT allows users to obtain not just the most-likely tree for some input data, but exact statistics about the distribution of trees that might explain the data. We show that EXACT outperforms several existing tools for this same task. Availability: https://github.com/surjray-repos/EXACT
OCSep 5, 2020
Distributed Optimization, Averaging via ADMM, and Network TopologyGuilherme França, José Bento
There has been an increasing necessity for scalable optimization methods, especially due to the explosion in the size of datasets and model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms such as gradient descent the dependency of the convergence time with the topology of this network is well-known. However, for more involved algorithms such as the Alternating Direction Methods of Multipliers (ADMM) much less is known. At the heart of many distributed optimization algorithms there exists a gossip subroutine which averages local information over the network, and whose efficiency is crucial for the overall performance of the method. In this paper we review recent research in this area and, with the goal of isolating such a communication exchange behaviour, we compare different algorithms when applied to a canonical distributed averaging consensus problem. We also show interesting connections between ADMM and lifted Markov chains besides providing an explicitly characterization of its convergence and optimal parameter tuning in terms of spectral properties of the network. Finally, we empirically study the connection between network topology and convergence rates for different algorithms on a real world problem of sensor localization.
LGJan 29, 2020
A Family of Pairwise Multi-Marginal Optimal Transports that Define a Generalized MetricLiang Mi, Azadeh Sheikholeslami, José Bento
The Optimal transport (OT) problem is rapidly finding its way into machine learning. Favoring its use are its metric properties. Many problems admit solutions with guarantees only for objects embedded in metric spaces, and the use of non-metrics can complicate solving them. Multi-marginal OT (MMOT) generalizes OT to simultaneously transporting multiple distributions. It captures important relations that are missed if the transport only involves two distributions. Research on MMOT, however, has been focused on its existence, uniqueness, practical algorithms, and the choice of cost functions. There is a lack of discussion on the metric properties of MMOT, which limits its theoretical and practical use. Here, we prove new generalized metric properties for a family of pairwise MMOTs. We first explain the difficulty of proving this via two negative results. Afterward, we prove the MMOTs' metric properties. Finally, we show that the generalized triangle inequality of this family of MMOTs cannot be improved. We illustrate the superiority of our MMOTs over other generalized metrics, and over non-metrics in both synthetic and real tasks.
QMJul 11, 2018
Estimating Cellular Goals from High-Dimensional Biological DataLaurence Yang, Michael A. Saunders, Jean-Christophe Lachance et al.
Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, measured macro-molecular composition of cells, and by measuring the cell's growth rate and metabolism in different conditions. The cellular goal (the optimization problem that the cell is trying to solve) can be challenging to derive experimentally for many organisms, including human or mammalian cells, which have complex metabolic capabilities and are not well understood. Existing approaches to learning goals from data include (a) estimating a linear objective function, or (b) estimating linear constraints that model complex biochemical reactions and constrain the cell's operation. The latter approach is important because often the known/observed biochemical reactions are not enough to explain observations, and hence there is a need to extend automatically the model complexity by learning new chemical reactions. However, this leads to nonconvex optimization problems, and existing tools cannot scale to realistically large metabolic models. Hence, constraint estimation is still used sparingly despite its benefits for modeling cell metabolism, which is important for developing novel antimicrobials against pathogens, discovering cancer drug targets, and producing value-added chemicals. Here, we develop the first approach to estimating constraint reactions from data that can scale to realistically large metabolic models. Previous tools have been used on problems having less than 75 biochemical reactions and 60 metabolites, which limits real-life-size applications. We perform extensive experiments using 75 large-scale metabolic network models for different organisms (including bacteria, yeasts, and mammals) and show that our algorithm can recover cellular constraint reactions, even when some measurements are missing.
DMJul 9, 2018
Tractable $n$-Metrics for Multiple GraphsSam Safavi, José Bento
Graphs are used in almost every scientific discipline to express relations among a set of objects. Algorithms that compare graphs, and output a closeness score, or a correspondence among their nodes, are thus extremely important. Despite the large amount of work done, many of the scalable algorithms to compare graphs do not produce closeness scores that satisfy the intuitive properties of metrics. This is problematic since non-metrics are known to degrade the performance of algorithms such as distance-based clustering of graphs (Stratis and Bento 2018). On the other hand, the use of metrics increases the performance of several machine learning tasks (Indyk et al. 1999, Clarkson et al. 1999, Angiulli et al. 2002, Ackermann et al. 2010). In this paper, we introduce a new family of multi-distances (a distance between more than two elements) that satisfies a generalization of the properties of metrics to multiple elements. In the context of comparing graphs, we are the first to show the existence of multi-distances that simultaneously incorporate the useful property of alignment consistency (Nguyen et al. 2011), and a generalized metric property. Furthermore, we show that these multi-distances can be relaxed to convex optimization problems, without losing the generalized metric property.
OCJan 13, 2018
An Explicit Convergence Rate for Nesterov's Method from SDPSam Safavi, Bikash Joshi, Guilherme França et al.
The framework of Integral Quadratic Constraints (IQC) introduced by Lessard et al. (2014) reduces the computation of upper bounds on the convergence rate of several optimization algorithms to semi-definite programming (SDP). In particular, this technique was applied to Nesterov's accelerated method (NAM). For quadratic functions, this SDP was explicitly solved leading to a new bound on the convergence rate of NAM, and for arbitrary strongly convex functions it was shown numerically that IQC can improve bounds from Nesterov (2004). Unfortunately, an explicit analytic solution to the SDP was not provided. In this paper, we provide such an analytical solution, obtaining a new general and explicit upper bound on the convergence rate of NAM, which we further optimize over its parameters. To the best of our knowledge, this is the best, and explicit, upper bound on the convergence rate of NAM for strongly convex functions.
MLOct 2, 2017
How is Distributed ADMM Affected by Network Topology?Guilherme França, José Bento
When solving consensus optimization problems over a graph, there is often an explicit characterization of the convergence rate of Gradient Descent (GD) using the spectrum of the graph Laplacian. The same type of problems under the Alternating Direction Method of Multipliers (ADMM) are, however, poorly understood. For instance, simple but important non-strongly-convex consensus problems have not yet being analyzed, especially concerning the dependency of the convergence rate on the graph topology. Recently, for a non-strongly-convex consensus problem, a connection between distributed ADMM and lifted Markov chains was proposed, followed by a conjecture that ADMM is faster than GD by a square root factor in its convergence time, in close analogy to the mixing speedup achieved by lifting several Markov chains. Nevertheless, a proof of such a claim is is still lacking. Here we provide a full characterization of the convergence of distributed over-relaxed ADMM for the same type of consensus problem in terms of the topology of the underlying graph. Our results provide explicit formulas for optimal parameter selection in terms of the second largest eigenvalue of the transition matrix of the graph's random walk. Another consequence of our results is a proof of the aforementioned conjecture, which interestingly, we show it is valid for any graph, even the ones whose random walks cannot be accelerated via Markov chain lifting.
MLMar 10, 2017
Tuning Over-Relaxed ADMMGuilherme França, José Bento
The framework of Integral Quadratic Constraints (IQC) reduces the computation of upper bounds on the convergence rate of several optimization algorithms to a semi-definite program (SDP). In the case of over-relaxed Alternating Direction Method of Multipliers (ADMM), an explicit and closed form solution to this SDP was derived in our recent work [1]. The purpose of this paper is twofold. First, we summarize these results. Second, we explore one of its consequences which allows us to obtain general and simple formulas for optimal parameter selection. These results are valid for arbitrary strongly convex objective functions.
MLMar 10, 2017
Markov Chain Lifting and Distributed ADMMGuilherme França, José Bento
The time to converge to the steady state of a finite Markov chain can be greatly reduced by a lifting operation, which creates a new Markov chain on an expanded state space. For a class of quadratic objectives, we show an analogous behavior where a distributed ADMM algorithm can be seen as a lifting of Gradient Descent algorithm. This provides a deep insight for its faster convergence rate under optimal parameter tuning. We conjecture that this gain is always present, as opposed to the lifting of a Markov chain which sometimes only provides a marginal speedup.
LGFeb 25, 2017
Generative Adversarial Active LearningJia-Jie Zhu, José Bento
We propose a new active learning by query synthesis approach using Generative Adversarial Networks (GAN). Different from regular active learning, the resulting algorithm adaptively synthesizes training instances for querying to increase learning speed. We generate queries according to the uncertainty principle, but our idea can work with other active learning principles. We report results from various numerical experiments to demonstrate the effectiveness the proposed approach. In some settings, the proposed algorithm outperforms traditional pool-based approaches. To the best our knowledge, this is the first active learning work using GAN.
CVJan 12, 2016
A metric for sets of trajectories that is practical and mathematically consistentJosé Bento, Jia Jie Zhu
Metrics on the space of sets of trajectories are important for scientists in the field of computer vision, machine learning, robotics, and general artificial intelligence. However, existing notions of closeness between sets of trajectories are either mathematically inconsistent or of limited practical use. In this paper, we outline the limitations in the current mathematically-consistent metrics, which are based on OSPA (Schuhmacher et al. 2008); and the inconsistencies in the heuristic notions of closeness used in practice, whose main ideas are common to the CLEAR MOT measures (Keni and Rainer 2008) widely used in computer vision. In two steps, we then propose a new intuitive metric between sets of trajectories and address these limitations. First, we explain a solution that leads to a metric that is hard to compute. Then we modify this formulation to obtain a metric that is easy to compute while keeping the useful properties of the previous metric. Our notion of closeness is the first demonstrating the following three features: the metric 1) can be quickly computed, 2) incorporates confusion of trajectories' identity in an optimal way, and 3) is a metric in the mathematical sense.
MLDec 7, 2015
An Explicit Rate Bound for the Over-Relaxed ADMMGuilherme França, José Bento
The framework of Integral Quadratic Constraints of Lessard et al. (2014) reduces the computation of upper bounds on the convergence rate of several optimization algorithms to semi-definite programming (SDP). Followup work by Nishihara et al. (2015) applies this technique to the entire family of over-relaxed Alternating Direction Method of Multipliers (ADMM). Unfortunately, they only provide an explicit error bound for sufficiently large values of some of the parameters of the problem, leaving the computation for the general case as a numerical optimization problem. In this paper we provide an exact analytical solution to this SDP and obtain a general and explicit upper bound on the convergence rate of the entire family of over-relaxed ADMM. Furthermore, we demonstrate that it is not possible to extract from this SDP a general bound better than ours. We end with a few numerical illustrations of our result and a comparison between the convergence rate we obtain for the ADMM with known convergence rates for the Gradient Descent.
LGMay 12, 2015
The Boundary Forest Algorithm for Online Supervised and Unsupervised LearningCharles Mathy, Nate Derbinsky, José Bento et al.
We describe a new instance-based learning algorithm called the Boundary Forest (BF) algorithm, that can be used for supervised and unsupervised learning. The algorithm builds a forest of trees whose nodes store previously seen examples. It can be shown data points one at a time and updates itself incrementally, hence it is naturally online. Few instance-based algorithms have this property while being simultaneously fast, which the BF is. This is crucial for applications where one needs to respond to input data in real time. The number of children of each node is not set beforehand but obtained from the training procedure, which makes the algorithm very flexible with regards to what data manifolds it can learn. We test its generalization performance and speed on a range of benchmark datasets and detail in which settings it outperforms the state of the art. Empirically we find that training time scales as O(DNlog(N)) and testing as O(Dlog(N)), where D is the dimensionality and N the amount of data,
ROApr 7, 2015
Proximal operators for multi-agent path planningJosé Bento, Nate Derbinsky, Charles Mathy et al.
We address the problem of planning collision-free paths for multiple agents using optimization methods known as proximal algorithms. Recently this approach was explored in Bento et al. 2013, which demonstrated its ease of parallelization and decentralization, the speed with which the algorithms generate good quality solutions, and its ability to incorporate different proximal operators, each ensuring that paths satisfy a desired property. Unfortunately, the operators derived only apply to paths in 2D and require that any intermediate waypoints we might want agents to follow be preassigned to specific agents, limiting their range of applicability. In this paper we resolve these limitations. We introduce new operators to deal with agents moving in arbitrary dimensions that are faster to compute than their 2D predecessors and we introduce landmarks, space-time positions that are automatically assigned to the set of agents under different optimality criteria. Finally, we report the performance of the new operators in several numerical experiments.
AINov 16, 2013
Methods for Integrating Knowledge with the Three-Weight Optimization Algorithm for Hybrid Cognitive ProcessingNate Derbinsky, José Bento, Jonathan S. Yedidia
In this paper we consider optimization as an approach for quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and can exploit knowledge to improve solution speed and quality. In this context, we focus on the Three-Weight Algorithm, which aims to solve general optimization problems. We propose novel methods by which to integrate knowledge with this algorithm to improve expressiveness, efficiency, and scaling, and demonstrate these techniques on two example problems (Sudoku and circle packing).
AIMay 8, 2013
An Improved Three-Weight Message-Passing AlgorithmNate Derbinsky, José Bento, Veit Elser et al.
We describe how the powerful "Divide and Concur" algorithm for constraint satisfaction can be derived as a special case of a message-passing version of the Alternating Direction Method of Multipliers (ADMM) algorithm for convex optimization, and introduce an improved message-passing algorithm based on ADMM/DC by introducing three distinct weights for messages, with "certain" and "no opinion" weights, as well as the standard weight used in ADMM/DC. The "certain" messages allow our improved algorithm to implement constraint propagation as a special case, while the "no opinion" messages speed convergence for some problems by making the algorithm focus only on active constraints. We describe how our three-weight version of ADMM/DC can give greatly improved performance for non-convex problems such as circle packing and solving large Sudoku puzzles, while retaining the exact performance of ADMM for convex problems. We also describe the advantages of our algorithm compared to other message-passing algorithms based upon belief propagation.
IRJul 26, 2012
Identifying Users From Their Rating PatternsJosé Bento, Nadia Fawaz, Andrea Montanari et al.
This paper reports on our analysis of the 2011 CAMRa Challenge dataset (Track 2) for context-aware movie recommendation systems. The train dataset comprises 4,536,891 ratings provided by 171,670 users on 23,974$ movies, as well as the household groupings of a subset of the users. The test dataset comprises 5,450 ratings for which the user label is missing, but the household label is provided. The challenge required to identify the user labels for the ratings in the test set. Our main finding is that temporal information (time labels of the ratings) is significantly more useful for achieving this objective than the user preferences (the actual ratings). Using a model that leverages on this fact, we are able to identify users within a known household with an accuracy of approximately 96% (i.e. misclassification rate around 4%).