Bubacarr Bah

15 papers

282 citations

Novelty 51%

AI Score 30

Ranked #148,267 of 205,806 authors (top 72%) · #639 in IT (top 71%)

15 Papers

IT · Mar 19, 2010

Improved Bounds on Restricted Isometry Constants for Gaussian Matrices

Bubacarr Bah, Jared Tanner

The Restricted Isometry Constants (RICs) of a matrix $A$ measure how close to an isometry the action of $A$ is on vectors with few nonzero entries, measured in the $\ell^2$ norm. Specifically, the upper and lower RICs of a matrix $A$ of size $n\times N$ are the maximum and the minimum deviation from unity (one) of the largest and smallest, respectively, squared singular values of all ${N\choose k}$ matrices formed by taking $k$ columns from $A$. Calculation of the RICs is intractable for most matrices due to its combinatorial nature; however, many random matrices typically have bounded RICs in some range of problem sizes $(k,n,N)$. We provide the best known bound on the RICs for Gaussian matrices, which is also the smallest known bound on the RICs for any large rectangular matrix. Improvements over prior bounds are achieved by exploiting similarity of singular values for matrices which share a substantial number of columns.
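To make the definition above concrete, here is a minimal brute-force sketch in Python (NumPy assumed). It enumerates every set of $k$ columns, so it is only usable for tiny problems, which is the combinatorial cost the abstract refers to. The function name `restricted_isometry_constants` and the $1/\sqrt{n}$ scaling of the Gaussian example are illustrative assumptions, not taken from the paper, and the sketch computes the constants themselves rather than the paper's bounds.

```python
from itertools import combinations

import numpy as np


def restricted_isometry_constants(A, k):
    """Brute-force (lower, upper) RIC of order k for a small matrix A.

    upper = max over k-column submatrices of (largest squared singular value - 1)
    lower = max over k-column submatrices of (1 - smallest squared singular value)
    """
    n, N = A.shape
    upper = -np.inf
    lower = -np.inf
    for cols in combinations(range(N), k):
        # Singular values are returned in descending order.
        s = np.linalg.svd(A[:, list(cols)], compute_uv=False)
        largest_sq = s[0] ** 2
        # With fewer rows than columns the submatrix is rank deficient.
        smallest_sq = s[-1] ** 2 if n >= k else 0.0
        upper = max(upper, largest_sq - 1.0)
        lower = max(lower, 1.0 - smallest_sq)
    return lower, upper


# Illustrative usage on a small Gaussian matrix (scaling is an assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30)) / np.sqrt(20)
L, U = restricted_isometry_constants(A, 2)
print(f"lower RIC = {L:.3f}, upper RIC = {U:.3f}")
```

Even at this size the loop visits $\binom{30}{2} = 435$ submatrices; the count grows combinatorially in $k$ and $N$.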

NA · Apr 9, 2016

The sample complexity of weighted sparse approximation

Bubacarr Bah, Rachel Ward

For Gaussian sampling matrices, we provide bounds on the minimal number of measurements $m$ required to achieve robust weighted sparse recovery guarantees in terms of how well a given prior model for the sparsity support aligns with the true underlying support. Our main contribution is that for a sparse vector ${\bf x} \in \mathbb{R}^N$ supported on an unknown set $\mathcal{S} \subset \{1, \dots, N\}$ with $|\mathcal{S}|\leq k$, if $\mathcal{S}$ has \emph{weighted cardinality} $\omega(\mathcal{S}) := \sum_{j \in \mathcal{S}} \omega_j^2$, and if the weights on $\mathcal{S}^c$ exhibit mild growth, $\omega_j^2 \geq \gamma\log(j/\omega(\mathcal{S}))$ for $j\in\mathcal{S}^c$ and $\gamma > 0$, then the sample complexity for sparse recovery via weighted $\ell_1$-minimization using weights $\omega_j$ is linear in the weighted sparsity level, and $m = \mathcal{O}(\omega(\mathcal{S})/\gamma)$. This main result is a generalization of special cases including a) the standard sparse recovery setting where all weights $\omega_j \equiv 1$, and $m = \mathcal{O}\left(k\log\left(N/k\right)\right)$; b) the setting where the support is known a priori, and $m = \mathcal{O}(k)$; and c) the setting of sparse recovery with prior information, where $m$ depends on how well the weights are aligned with the support set $\mathcal{S}$. We further extend the results in case c) to the setting of additive noise. Our results are {\em nonuniform}, that is, they apply for a fixed support, unknown a priori, and the weights on $\mathcal{S}$ do not all have to be smaller than the weights on $\mathcal{S}^c$ for our recovery results to hold.
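The weighted cardinality defined above is just a sum of squared weights over the support. The short Python sketch below evaluates it for two weight choices; the helper name `weighted_cardinality`, the 0-based indexing, and the specific weight values are assumptions made for illustration only.

```python
def weighted_cardinality(weights, support):
    """Return the sum of squared weights over the index set `support`."""
    return sum(weights[j] ** 2 for j in support)


N = 10
support = {1, 4, 7}  # hypothetical support, 0-based indices

# Unit weights: the weighted cardinality reduces to |S|.
uniform = [1.0] * N
print(weighted_cardinality(uniform, support))

# Prior information: smaller weights where the support is believed to lie.
prior = [0.5 if j in support else 2.0 for j in range(N)]
print(weighted_cardinality(prior, support))
```

When the prior weights are well aligned with the true support, the weighted cardinality drops below $k$, which is the sense in which the sampling rate can improve on the standard setting.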

IT · Jul 15, 2013

Vanishingly Sparse Matrices and Expander Graphs, With Application to Compressed Sensing

Bubacarr Bah, Jared Tanner

We revisit the probabilistic construction of sparse random matrices where each column has a fixed number of nonzeros whose row indices are drawn uniformly at random with replacement. These matrices have a one-to-one correspondence with the adjacency matrices of fixed left degree expander graphs. We present formulae for the expected cardinality of the set of neighbors for these graphs, and present tail bounds on the probability that this cardinality will be less than the expected value. Deducible from these bounds are similar bounds for the expansion of the graph, which is of interest in many applications. These bounds are derived through a more detailed analysis of collisions in unions of sets. Key to this analysis is a novel {\em dyadic splitting} technique. The analysis led to the derivation of better order constants that allow for quantitative theorems on the existence of lossless expander graphs, and hence of the sparse random matrices we consider, and also quantitative compressed sensing sampling theorems when using sparse non-mean-zero measurement matrices.
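The construction can be sketched in a few lines of Python. This is a minimal illustration, assuming each column receives exactly `d` distinct row indices and that columns are drawn independently (one reading of the construction; the paper's exact sampling model may differ). The names `sparse_binary_columns` and `neighbors` are hypothetical helpers, and the sketch only counts neighbors empirically rather than reproducing the paper's formulae or tail bounds.

```python
import random


def sparse_binary_columns(n, N, d, seed=0):
    """Return N columns, each stored as the set of its d nonzero row indices."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n), d)) for _ in range(N)]


def neighbors(columns, subset):
    """Union of row indices hit by the columns in `subset` (the neighbor set)."""
    out = set()
    for j in subset:
        out |= columns[j]
    return out


d = 5
cols = sparse_binary_columns(n=50, N=200, d=d)
S = range(8)
nbrs = neighbors(cols, S)
# Collisions make the neighbor set smaller than the collision-free size |S| * d.
print(len(nbrs), "neighbors; collision-free maximum is", len(S) * d)
```

The gap between the observed neighbor count and `len(S) * d` is the collision effect that the expansion analysis quantifies.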

NA · Jul 15, 2013

Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices

Bubacarr Bah, Jared Tanner

Restricted Isometry Constants (RICs) provide a measure of how far from an isometry a matrix can be when acting on sparse vectors. This, and related quantities, provide a mechanism by which standard eigen-analysis can be applied to topics relying on sparsity. RIC bounds have been presented for a variety of random matrices and matrix dimension and sparsity ranges. We provide explicit formulae for RIC bounds of n by N Gaussian matrices with sparsity k, in three settings: a) n/N fixed and k/n approaching zero, b) k/n fixed and n/N approaching zero, and c) n/N approaching zero with k/n decaying inverse logarithmically in N/n; in these three settings the RICs a) decay to zero, b) become unbounded (or approach inherent bounds), and c) approach a non-zero constant. Implications of these results for RIC based analysis of compressed sensing algorithms are presented.

LG · Apr 7, 2023

A physics-informed neural network framework for modeling obstacle-related equations

Hamid El Bahja, Jan Christian Hauffen, Peter Jung et al.

Deep learning has been highly successful in some applications. Nevertheless, its use for solving partial differential equations (PDEs) has only been of recent interest with current state-of-the-art machine learning libraries, e.g., TensorFlow or PyTorch. Physics-informed neural networks (PINNs) are an attractive tool for solving partial differential equations based on sparse and noisy data. Here we extend PINNs to solve obstacle-related PDEs, which present a great computational challenge because they necessitate numerical methods that can yield an accurate approximation of the solution that lies above a given obstacle. The performance of the proposed PINNs is demonstrated in multiple scenarios for linear and nonlinear PDEs subject to regular and irregular obstacles.

ML · Nov 22, 2023

Improved identification accuracy in equation learning via comprehensive $\boldsymbol{R^2}$-elimination and Bayesian model selection

Daniel Nickelsen, Bubacarr Bah

In the field of equation learning, exhaustively considering all possible equations derived from a basis function dictionary is infeasible. Sparse regression and greedy algorithms have emerged as popular approaches to tackle this challenge. However, the presence of multicollinearity poses difficulties for sparse regression techniques, and greedy steps may inadvertently exclude terms of the true equation, leading to reduced identification accuracy. In this article, we present an approach that strikes a balance between comprehensiveness and efficiency in equation learning. Inspired by stepwise regression, our approach combines the coefficient of determination, $R^2$, and the Bayesian model evidence, $p(\boldsymbol y|\mathcal M)$, in a novel way. Our procedure is characterized by a comprehensive search with just a minor reduction of the model space at each iteration step. With two flavors of our approach and the adoption of $p(\boldsymbol y|\mathcal M)$ for bi-directional stepwise regression, we present a total of three new avenues for equation learning. Through three extensive numerical experiments involving random polynomials and dynamical systems, we compare our approach against four state-of-the-art methods and two standard approaches. The results demonstrate that our comprehensive search approach surpasses all other methods in terms of identification accuracy. In particular, the second flavor of our approach establishes an efficient overfitting penalty solely based on $R^2$, which achieves the highest rates of exact equation recovery.
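For readers unfamiliar with the score the method builds on, the following Python sketch computes the standard coefficient of determination, $R^2 = 1 - \mathrm{SS}_\mathrm{res}/\mathrm{SS}_\mathrm{tot}$. It shows only this textbook definition; the elimination procedure and the Bayesian model evidence described in the abstract are not reproduced, and the function name `r_squared` is an illustrative choice.

```python
def r_squared(y, y_hat):
    """Coefficient of determination for observations y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0]
print(r_squared(y, [1.0, 2.0, 3.0]))  # perfect fit
print(r_squared(y, [2.0, 2.0, 2.0]))  # predicting the mean everywhere
```

A candidate equation whose terms explain none of the variance beyond the mean scores zero, which is why $R^2$ can serve as a criterion for discarding terms.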

NA · May 9, 2016

Sparse matrices for weighted sparse recovery

Bubacarr Bah

We derived the first sparse recovery guarantees for weighted $\ell_1$ minimization with sparse random matrices and the class of weighted sparse signals, using a weighted version of the null space property to derive these guarantees. These sparse matrices, derived from expander graphs, can be applied very fast and have better computational complexity than their dense counterparts. In addition, we show that, using such sparse matrices, weighted sparse recovery with weighted $\ell_1$ minimization leads to sample complexities that are linear in the weighted sparsity of the signal, and these sampling rates can be smaller than those of standard sparse recovery. Moreover, these results reduce to known results in standard sparse recovery and sparse recovery with prior information, and the results are supported by numerical experiments.

AI · Jun 21, 2024 · Code

This actually looks like that: Proto-BagNets for local and global interpretability-by-design

Kerol Djoumessi, Bubacarr Bah, Laura Kühlewein et al.

Interpretability is a key requirement for the use of machine learning models in high-stakes applications, including medical diagnosis. Explaining black-box models mostly relies on post-hoc methods that do not faithfully reflect the model's behavior. As a remedy, prototype-based networks have been proposed, but their interpretability is limited as they have been shown to provide coarse, unreliable, and imprecise explanations. In this work, we introduce Proto-BagNets, an interpretable-by-design prototype-based model that combines the advantages of bag-of-local feature models and prototype learning to provide meaningful, coherent, and relevant prototypical parts needed for accurate and interpretable image classification tasks. We evaluated the Proto-BagNet for drusen detection on publicly available retinal OCT data. The Proto-BagNet performed comparably to the state-of-the-art interpretable and non-interpretable models while providing faithful, accurate, and clinically meaningful local and global explanations. The code is available at https://github.com/kdjoumessi/Proto-BagNets.

OC · Oct 21, 2021

Efficient and Robust Mixed-Integer Optimization Methods for Training Binarized Deep Neural Networks

Jannis Kurtz, Bubacarr Bah

Compared to classical deep neural networks, their binarized versions can be useful for applications on resource-limited devices due to their reduction in memory consumption and computational demands. In this work we study deep neural networks with binary activation functions and continuous or integer weights (BDNN). We show that the BDNN can be reformulated as a mixed-integer linear program with bounded weight space which can be solved to global optimality by classical mixed-integer programming solvers. Additionally, a local search heuristic is presented to calculate locally optimal networks. Furthermore, to improve efficiency, we present an iterative data-splitting heuristic which iteratively splits the training set into smaller subsets by using the k-means method. Afterwards, all data points in a given subset are forced to follow the same activation pattern, which leads to a much smaller number of integer variables in the mixed-integer programming formulation and therefore to computational improvements. Finally, for the first time, a robust model is presented which enforces robustness of the BDNN during training. All methods are tested on random and real datasets, and our results indicate that all models can often compete with or even outperform classical DNNs on small network architectures, confirming the viability for applications having restricted memory or computing power.

CV · Dec 21, 2020

Towards the Localisation of Lesions in Diabetic Retinopathy

Samuel Ofosu Mensah, Bubacarr Bah, Willie Brink

Convolutional Neural Networks (CNNs) have successfully been used to classify diabetic retinopathy (DR) fundus images in recent times. However, deeper representations in CNNs may capture higher-level semantics at the expense of spatial resolution. To make predictions usable for ophthalmologists, we use a post-attention technique called Gradient-weighted Class Activation Mapping (Grad-CAM) on the penultimate layer of deep learning models to produce coarse localisation maps on DR fundus images. This is to help identify discriminative regions in the images, consequently providing evidence for ophthalmologists to make a diagnosis and potentially save lives by early diagnosis. Specifically, this study uses pre-trained weights from four state-of-the-art deep learning models to produce and compare localisation maps of DR fundus images. The models used include VGG16, ResNet50, InceptionV3, and InceptionResNetV2. We find that InceptionV3 achieves the best performance with a test classification accuracy of 96.07%, and localises lesions better and faster than the other models.

OC · Jul 7, 2020

An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions

Bubacarr Bah, Jannis Kurtz

We study deep neural networks with binary activation functions (BDNN), i.e. the activation function only has two states. We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers. Additionally, a heuristic solution algorithm is presented and we study the model under data uncertainty, applying a two-stage robust optimization approach. We implemented our methods on random and real datasets and show that the heuristic version of the BDNN outperforms classical deep neural networks on the Breast Cancer Wisconsin dataset while performing worse on random data.
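As a small illustration of what a network with binary activations computes, the Python sketch below runs a forward pass in which every unit outputs one of two states. The threshold at zero, the $\{0, 1\}$ output values, the absence of bias terms, and the binarized output layer are all assumptions made for this example; the sketch does not implement the mixed-integer formulation or the heuristic from the paper.

```python
def binary_activation(z):
    """Two-state activation: 1 if the pre-activation is non-negative, else 0."""
    return 1 if z >= 0 else 0


def forward(x, layers):
    """Forward pass; each layer is a list of weight rows, one row per unit."""
    a = list(x)
    for W in layers:
        a = [binary_activation(sum(w * v for w, v in zip(row, a))) for row in W]
    return a


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[1.0, -1.0]],
]
print(forward([2.0, 1.0], layers))
print(forward([1.0, 2.0], layers))
```

Because each unit's output is a binary variable determined by a linear inequality in the weights, the training problem lends itself to an integer programming formulation.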

LG · Apr 11, 2020

On Error Correction Neural Networks for Economic Forecasting

Mhlasakululeka Mvubu, Emmanuel Kabuga, Christian Plitz et al.

Recurrent neural networks (RNNs) are more suitable for learning non-linear dependencies in dynamical systems from observed time series data. In practice, not all the external variables driving such systems are known a priori, especially in economic forecasting. A class of RNNs called Error Correction Neural Networks (ECNNs) was designed to compensate for missing input variables. It does this by feeding back in the current step the error made in the previous step. The ECNN is implemented in Python by the computation of the appropriate gradients, and it is tested on stock market predictions. As expected, it outperformed the simple RNN and LSTM and other hybrid models which involve a de-noising pre-processing step. The intuition for the latter is that de-noising may lead to loss of information.

OC · Oct 12, 2019

Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers

Bubacarr Bah, Holger Rauhut, Ulrich Terstiege et al.

We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.
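The observation that a deep linear network collapses to a single matrix product is easy to check numerically. The Python sketch below (NumPy assumed) uses arbitrary layer widths chosen for illustration, including a width-2 bottleneck so that the end-to-end matrix has rank at most 2; it demonstrates only the overparameterization, not the gradient flow analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer widths: input 4 -> 2 -> 5 -> output 3.
dims = [4, 2, 5, 3]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

# Layer-by-layer forward pass with identity activation.
x = rng.standard_normal(dims[0])
h = x
for W in Ws:
    h = W @ h

# The same map as one matrix: the product W3 W2 W1.
product = np.linalg.multi_dot(Ws[::-1])

print(np.allclose(h, product @ x))
print(np.linalg.matrix_rank(product))
```

Many different factor tuples give the same product, which is why the abstract describes the problem as overparameterized, while the narrowest layer caps the rank of the end-to-end matrix.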

IT · Mar 21, 2016

Convex block-sparse linear regression with expanders -- provably

Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad et al.

Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used, in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only a few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have been studied only from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such a setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while assuring that the solution lives in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting -- as opposed to using dense sensing matrices, while showing a competitive recovery performance.

LG · Jul 12, 2013

Energy-aware adaptive bi-Lipschitz embeddings

Bubacarr Bah, Ali Sadeghian, Volkan Cevher

We propose a dimensionality reducing matrix design based on training data with constraints on its Frobenius norm and number of rows. Our design criterion is aimed at preserving the distances between the data points in the dimensionality reduced space as much as possible relative to their distances in the original data space. This approach can be considered as a deterministic Bi-Lipschitz embedding of the data points. We introduce a scalable learning algorithm, dubbed AMUSE, and provide a rigorous estimation guarantee by leveraging game theoretic tools. We also provide a generalization characterization of our matrix based on our sample data. We use compressive sensing problems as an example application of our problem, where the Frobenius norm design constraint translates into the sensing energy.
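The distance-preservation criterion can be made concrete by measuring how much a given matrix stretches or shrinks pairwise distances. The Python sketch below (NumPy assumed) reports the smallest and largest ratio over all pairs of data points. It is a diagnostic only: the helper name `distance_ratios`, the random matrix `Phi`, and the data are illustrative assumptions, and the AMUSE learning algorithm itself is not implemented here.

```python
from itertools import combinations

import numpy as np


def distance_ratios(Phi, X):
    """Return (min, max) of ||Phi (x_i - x_j)|| / ||x_i - x_j|| over all pairs.

    X holds one data point per row; points are assumed to be distinct.
    """
    ratios = []
    for i, j in combinations(range(len(X)), 2):
        diff = X[i] - X[j]
        ratios.append(np.linalg.norm(Phi @ diff) / np.linalg.norm(diff))
    return min(ratios), max(ratios)


rng = np.random.default_rng(0)
X = rng.standard_normal((15, 10))   # 15 hypothetical training points in R^10
Phi = rng.standard_normal((3, 10))  # embedding into R^3 (not a learned design)

lo, hi = distance_ratios(Phi, X)
print(f"ratios in [{lo:.3f}, {hi:.3f}]")
print("Frobenius norm of Phi:", np.linalg.norm(Phi, "fro"))
```

A ratio interval that is tight around a constant indicates a nearly isometric embedding of the training set; the Frobenius norm printed last corresponds to the quantity the abstract constrains as sensing energy.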