MLOct 1, 2021
Weight Vector Tuning and Asymptotic Analysis of Binary Linear ClassifiersLama B. Niyazi, Abla Kammoun, Hayssam Dahrouj et al.
Unlike its intercept, a linear classifier's weight vector cannot be tuned by a simple grid search. Hence, this paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar which controls the trade-off between conflicting informative and noisy terms. By varying this parameter, the original weight vector is modified in a meaningful way. Applying this method to a number of linear classifiers under a variety of data dimensionality and sample size settings reveals that the classification performance loss due to non-optimal native hyperparameters can be compensated for by weight vector tuning. This yields computational savings as the proposed tuning method reduces to tuning a scalar compared to tuning the native hyperparameter, which may involve repeated weight vector generation along with its burden of optimization, dimensionality reduction, etc., depending on the classifier. It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise. Proceeding from this second finding, an asymptotic study of the misclassification probability of the parameterized LDA classifier in the growth regime where the data dimensionality and sample size are comparable is conducted. Using random matrix theory, the misclassification probability is shown to converge to a quantity that is a function of the true statistics of the data. Additionally, an estimator of the misclassification probability is derived. Finally, computationally efficient tuning of the parameter using this estimator is demonstrated on real data.
LGMay 21, 2021
A Precise Performance Analysis of Support Vector RegressionHoussem Sifaou, Abla kammoun, Mohamed-Slim Alouini
In this paper, we study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements of the form $y_i=\boldsymbolβ_\star^{T}{\bf x}_i +n_i$ where $\boldsymbolβ_\star$ is an unknown vector, $\left\{{\bf x}_i\right\}_{i=1}^n$ are the feature vectors and $\left\{{n}_i\right\}_{i=1}^n$ model the noise. Particularly, under some plausible assumptions on the statistical distribution of the data, we characterize the feasibility condition for the hard support vector regression in the regime of high dimensions and, when feasible, derive an asymptotic approximation for its risk. Similarly, we study the test risk for the soft support vector regression as a function of its parameters. Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms. Based on our analysis, we illustrate that adding more samples may be harmful to the test performance of support vector regression, while it is always beneficial when the parameters are optimally selected. Such a result reminds a similar phenomenon observed in modern learning architectures according to which optimally tuned architectures present a decreasing test performance curve with respect to the number of samples.
LGJun 25, 2020
High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance ModelHoussem Sifaou, Abla Kammoun, Mohamed-Slim Alouini
Quadratic discriminant analysis (QDA) is a widely used classification technique that generalizes the linear discriminant analysis (LDA) classifier to the case of distinct covariance matrices among classes. For the QDA classifier to yield high classification performance, an accurate estimation of the covariance matrices is required. Such a task becomes all the more challenging in high dimensional settings, wherein the number of observations is comparable with the feature dimension. A popular way to enhance the performance of QDA classifier under these circumstances is to regularize the covariance matrix, giving the name regularized QDA (R-QDA) to the corresponding classifier. In this work, we consider the case in which the population covariance matrix has a spiked covariance structure, a model that is often assumed in several applications. Building on the classical QDA, we propose a novel quadratic classification technique, the parameters of which are chosen such that the fisher-discriminant ratio is maximized. Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity, making it suitable to high dimensional settings.
MLJun 11, 2020
Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced SettingsAmine Bejaoui, Khalil Elkhalil, Abla Kammoun et al.
The use of quadratic discriminant analysis (QDA) or its regularized version (R-QDA) for classification is often not recommended, due to its well-acknowledged high sensitivity to the estimation noise of the covariance matrix. This becomes all the more the case in unbalanced data settings for which it has been found that R-QDA becomes equivalent to the classifier that assigns all observations to the same class. In this paper, we propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias, properly chosen to avoid inappropriate behaviors of R-QDA in unbalanced settings and to ensure the best possible classification performance. The design of the proposed classifier builds on a refined asymptotic analysis of its performance when the number of samples and that of features grow large simultaneously, which allows to cope efficiently with the high-dimensionality frequently met within the big data paradigm. The performance of the proposed classifier is assessed on both real and synthetic data sets and was shown to be much better than what one would expect from a traditional R-QDA.
MLApr 17, 2020
Asymptotic Analysis of an Ensemble of Randomly Projected Linear DiscriminantsLama B. Niyazi, Abla Kammoun, Hayssam Dahrouj et al.
Datasets from the fields of bioinformatics, chemometrics, and face recognition are typically characterized by small samples of high-dimensional data. Among the many variants of linear discriminant analysis that have been proposed in order to rectify the issues associated with classification in such a setting, the classifier in [1], composed of an ensemble of randomly projected linear discriminants, seems especially promising; it is computationally efficient and, with the optimal projection dimension parameter setting, is competitive with the state-of-the-art. In this work, we seek to further understand the behavior of this classifier through asymptotic analysis. Under the assumption of a growth regime in which the dataset and projection dimensions grow at constant rates to each other, we use random matrix theory to derive asymptotic misclassification probabilities showing the effect of the ensemble as a regularization of the data sample covariance matrix. The asymptotic errors further help to identify situations in which the ensemble offers a performance advantage. We also develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator, which is conventionally used for parameter tuning. Finally, we demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
CRFeb 6, 2020
Data-Driven False Data Injection Attacks Against Power Grids: A Random Matrix ApproachSubhash Lakshminarayana, Abla Kammoun, Merouane Debbah et al.
We address the problem of constructing false data injection (FDI) attacks that can bypass the bad data detector (BDD) of a power grid. The attacker is assumed to have access to only power flow measurement data traces (collected over a limited period of time) and no other prior knowledge about the grid. Existing related algorithms are formulated under the assumption that the attacker has access to measurements collected over a long (asymptotically infinite) time period, which may not be realistic. We show that these approaches do not perform well when the attacker has a limited number of data samples only. We design an enhanced algorithm to construct FDI attack vectors in the face of limited measurements that can nevertheless bypass the BDD with high probability. The algorithm design is guided by results from random matrix theory. Furthermore, we characterize an important trade-off between the attack's BDD-bypass probability and its sparsity, which affects the spatial extent of the attack that must be achieved. Extensive simulations using data traces collected from the MATPOWER simulator and benchmark IEEE bus systems validate our findings.
MLNov 13, 2019
A Model of Double Descent for High-dimensional Binary Linear ClassificationZeyu Deng, Abla Kammoun, Christos Thrampoulidis
We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient descent (GD) on logistic loss. For this model, we investigate the dependence of the classification error on the overparameterization ratio $κ=p/n$. First, building on known deterministic results on the implicit bias of GD, we uncover a phase-transition phenomenon for the case of Gaussian features: the classification error of GD is the same as that of the maximum-likelihood (ML) solution when $κ<κ_\star$, and that of the max-margin (SVM) solution when $κ>κ_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and the SVM solutions. Combining these results, we obtain curves that explicitly characterize the classification error for varying values of $κ$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent findings in linear regression settings as well as empirical observations in more complex learning scenarios.
MLApr 19, 2019
Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional DataKhalil Elkhalil, Abla Kammoun, Xiangliang Zhang et al.
This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call \emph{centered kernel ridge regression} (CKRR), also known in the literature as kernel ridge regression with offset. This modified technique is obtained by accounting for the bias in the regression problem resulting in the old kernel ridge regression but with \emph{centered} kernels. The analysis is carried out under the assumption that the data is drawn from a Gaussian distribution and heavily relies on tools from random matrix theory (RMT). Under the regime in which the data dimension and the training size grow infinitely large with fixed ratio and under some mild assumptions controlling the data statistics, we show that both the empirical and the prediction risks converge to a deterministic quantities that describe in closed form fashion the performance of CKRR in terms of the data statistics and dimensions. Inspired by this theoretical result, we subsequently build a consistent estimator of the prediction risk based on the training data which allows to optimally tune the design parameters. A key insight of the proposed analysis is the fact that asymptotically a large class of kernels achieve the same minimum prediction risk. This insight is validated with both synthetic and real data.
MLNov 1, 2017
A Large Dimensional Study of Regularized Discriminant Analysis ClassifiersKhalil Elkhalil, Abla Kammoun, Romain Couillet et al.
This article carries out a large dimensional analysis of standard regularized discriminant analysis classifiers designed on the assumption that data arise from a Gaussian mixture model with different means and covariances. The analysis relies on fundamental results from random matrix theory (RMT) when both the number of features and the cardinality of the training data within each class grow large at the same pace. Under mild assumptions, we show that the asymptotic classification error approaches a deterministic quantity that depends only on the means and covariances associated with each class as well as the problem dimensions. Such a result permits a better understanding of the performance of regularized discriminant analsysis, in practical large but finite dimensions, and can be used to determine and pre-estimate the optimal regularization parameter that minimizes the misclassification error probability. Despite being theoretically valid only for Gaussian data, our findings are shown to yield a high accuracy in predicting the performances achieved with real data sets drawn from the popular USPS data base, thereby making an interesting connection between theory and practice.
ITMay 4, 2015
On the Feedback Reduction of Relay Aided Multiuser Networks using Compressive SensingKhalil M. Elkhalil, Mohammed E. Eltayeb, Abla Kammoun et al.
In this paper, we propose a feedback reduction scheme for full-duplex relay-aided multiuser networks. The proposed scheme permits the base station (BS) to obtain channel state information (CSI) from a subset of strong users under substantially reduced feedback overhead. More specifically, we cast the problem of user identification and CSI estimation as a block sparse signal recovery problem in compressive sensing (CS). Using existing CS block recovery algorithms, we first obtain the identity of the strong users and then estimate their CSI using the best linear unbiased estimator (BLUE). To minimize the effect of noise on the estimated CSI, we introduce a back-off strategy that optimally backs-off on the noisy estimated CSI and derive the error covariance matrix of the post-detection noise. In addition to this, we provide exact closed form expressions for the average maximum equivalent SNR at the destination user. Numerical results show that the proposed algorithm drastically reduces the feedback air-time and achieves a rate close to that obtained by scheduling schemes that require dedicated error-free feedback from all the network users.