LGApr 3
Low-Rank Compression of Pretrained Models via Randomized Subspace IterationFarhad Pourkamali-Anaraki
The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.
LGSep 16, 2024
Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer PerceptronsFarhad Pourkamali-Anaraki
Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.
LGJul 13, 2022
D-CBRS: Accounting For Intra-Class Diversity in Continual LearningYasin Findik, Farhad Pourkamali-Anaraki
Continual learning -- accumulating knowledge from a sequence of learning experiences -- is an important yet challenging problem. In this paradigm, the model's performance for previously encountered instances may substantially drop as additional data are seen. When dealing with class-imbalanced data, forgetting is further exacerbated. Prior work has proposed replay-based approaches which aim at reducing forgetting by intelligently storing instances for future replay. Although Class-Balancing Reservoir Sampling (CBRS) has been successful in dealing with imbalanced data, the intra-class diversity has not been accounted for, implicitly assuming that each instance of a class is equally informative. We present Diverse-CBRS (D-CBRS), an algorithm that allows us to consider within class diversity when storing instances in the memory. Our results show that D-CBRS outperforms state-of-the-art memory management continual learning algorithms on data sets with considerable intra-class diversity.
LGFeb 8, 2024
Adaptive Activation Functions for Predictive Modeling with Sparse Experimental DataFarhad Pourkamali-Anaraki, Tahamina Nasrin, Robert E. Jensen et al.
A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
LGJan 4, 2024
Two-Stage Surrogate Modeling for Data-Driven Design Optimization with Application to Composite Microstructure GenerationFarhad Pourkamali-Anaraki, Jamal F. Husseini, Evan J. Pineda et al.
This paper introduces a novel two-stage machine learning-based surrogate modeling framework to address inverse problems in scientific and engineering fields. In the first stage of the proposed framework, a machine learning model termed the "learner" identifies a limited set of candidates within the input design space whose predicted outputs closely align with desired outcomes. Subsequently, in the second stage, a separate surrogate model, functioning as an "evaluator," is employed to assess the reduced candidate space generated in the first stage. This evaluation process eliminates inaccurate and uncertain solutions, guided by a user-defined coverage level. The framework's distinctive contribution is the integration of conformal inference, providing a versatile and efficient approach that can be widely applicable. To demonstrate the effectiveness of the proposed framework compared to conventional single-stage inverse problems, we conduct several benchmark tests and investigate an engineering application focused on the micromechanical modeling of fiber-reinforced composites. The results affirm the superiority of our proposed framework, as it consistently produces more reliable solutions. Therefore, the introduced framework offers a unique perspective on fostering interactions between machine learning-based surrogate models in real-world applications.
MLFeb 21, 2024
Probabilistic Neural Networks (PNNs) for Modeling Aleatoric Uncertainty in Scientific Machine LearningFarhad Pourkamali-Anaraki, Jamal F. Husseini, Scott E. Stapleton
This paper investigates the use of probabilistic neural networks (PNNs) to model aleatoric uncertainty, which refers to the inherent variability in the input-output relationships of a system, often characterized by unequal variance or heteroscedasticity. Unlike traditional neural networks that produce deterministic outputs, PNNs generate probability distributions for the target variable, allowing the determination of both predicted means and intervals in regression scenarios. Contributions of this paper include the development of a probabilistic distance metric to optimize PNN architecture, and the deployment of PNNs in controlled data sets as well as a practical material science case involving fiber-reinforced composites. The findings confirm that PNNs effectively model aleatoric uncertainty, proving to be more appropriate than the commonly employed Gaussian process regression for this purpose. Specifically, in a real-world scientific machine learning context, PNNs yield remarkably accurate output mean estimates with R-squared scores approaching 0.97, and their predicted intervals exhibit a high correlation coefficient of nearly 0.80, closely matching observed data intervals. Hence, this research contributes to the ongoing exploration of leveraging the sophisticated representational capacity of neural networks to delineate complex input-output relationships in scientific problems.
MLApr 7
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective ClassificationCourtney Franzen, Farhad Pourkamali-Anaraki
Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
LGMar 16, 2025
Probabilistic Neural Networks (PNNs) with t-Distributed Outputs: Adaptive Prediction Intervals Beyond Gaussian AssumptionsFarhad Pourkamali-Anaraki
Traditional neural network regression models provide only point estimates, failing to capture predictive uncertainty. Probabilistic neural networks (PNNs) address this limitation by producing output distributions, enabling the construction of prediction intervals. However, the common assumption of Gaussian output distributions often results in overly wide intervals, particularly in the presence of outliers or deviations from normality. To enhance the adaptability of PNNs, we propose t-Distributed Neural Networks (TDistNNs), which generate t-distributed outputs, parameterized by location, scale, and degrees of freedom. The degrees of freedom parameter allows TDistNNs to model heavy-tailed predictive distributions, improving robustness to non-Gaussian data and enabling more adaptive uncertainty quantification. We develop a novel loss function tailored for the t-distribution and derive efficient gradient computations for seamless integration into deep learning frameworks. Empirical evaluations on synthetic and real-world data demonstrate that TDistNNs improve the balance between coverage and interval width. Notably, for identical architectures, TDistNNs consistently produce narrower prediction intervals than Gaussian-based PNNs while maintaining proper coverage. This work contributes a flexible framework for uncertainty estimation in neural networks tasked with regression, particularly suited to settings involving complex output distributions.
LGApr 24, 2025
Aerial Image Classification in Scarce and Unconstrained Environments via Conformal PredictionFarhad Pourkamali-Anaraki
This paper presents a comprehensive empirical analysis of conformal prediction methods on a challenging aerial image dataset featuring diverse events in unconstrained environments. Conformal prediction is a powerful post-hoc technique that takes the output of any classifier and transforms it into a set of likely labels, providing a statistical guarantee on the coverage of the true label. Unlike evaluations on standard benchmarks, our study addresses the complexities of data-scarce and highly variable real-world settings. We investigate the effectiveness of leveraging pretrained models (MobileNet, DenseNet, and ResNet), fine-tuned with limited labeled data, to generate informative prediction sets. To further evaluate the impact of calibration, we consider two parallel pipelines (with and without temperature scaling) and assess performance using two key metrics: empirical coverage and average prediction set size. This setup allows us to systematically examine how calibration choices influence the trade-off between reliability and efficiency. Our findings demonstrate that even with relatively small labeled samples and simple nonconformity scores, conformal prediction can yield valuable uncertainty estimates for complex tasks. Moreover, our analysis reveals that while temperature scaling is often employed for calibration, it does not consistently lead to smaller prediction sets, underscoring the importance of careful consideration in its application. Furthermore, our results highlight the significant potential of model compression techniques within the conformal prediction pipeline for deployment in resource-constrained environments. Based on our observations, we advocate for future research to delve into the impact of noisy or ambiguous labels on conformal prediction performance and to explore effective model reduction strategies.
LGSep 18, 2021
An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural EngineeringParisa Hajibabaee, Farhad Pourkamali-Anaraki, Mohammad Amin Hariri-Ardebili
A fundamental task in machine learning involves visualizing high-dimensional data sets that arise in high-impact application domains. When considering the context of large imbalanced data, this problem becomes much more challenging. In this paper, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to reduce the dimensions of an earthquake engineering related data set for visualization purposes. Since imbalanced data sets greatly affect the accuracy of classifiers, we employ Synthetic Minority Oversampling Technique (SMOTE) to tackle the imbalanced nature of such data set. We present the result obtained from t-SNE and SMOTE and compare it to the basic approaches with various aspects. Considering four options and six classification algorithms, we show that using t-SNE on the imbalanced data and SMOTE on the training data set, neural network classifiers have promising results without sacrificing accuracy. Hence, we can transform the studied scientific data into a two-dimensional (2D) space, enabling the visualization of the classifier and the resulting decision surface using a 2D plot.
LGSep 19, 2020
Kernel Ridge Regression Using Importance Sampling with Application to Seismic Response PredictionFarhad Pourkamali-Anaraki, Mohammad Amin Hariri-Ardebili, Lydia Morawiec
Scalable kernel methods, including kernel ridge regression, often rely on low-rank matrix approximations using the Nystrom method, which involves selecting landmark points from large data sets. The existing approaches to selecting landmarks are typically computationally demanding as they require manipulating and performing computations with large matrices in the input or feature space. In this paper, our contribution is twofold. The first contribution is to propose a novel landmark selection method that promotes diversity using an efficient two-step approach. Our landmark selection technique follows a coarse to fine strategy, where the first step computes importance scores with a single pass over the whole data. The second step performs K-means clustering on the constructed coreset to use the obtained centroids as landmarks. Hence, the introduced method provides tunable trade-offs between accuracy and efficiency. Our second contribution is to investigate the performance of several landmark selection techniques using a novel application of kernel methods for predicting structural responses due to earthquake load and material uncertainties. Our experiments exhibit the merits of our proposed landmark selection scheme against baselines.
ROJul 31, 2020
A Unified NMPC Scheme for MAVs Navigation with 3D Collision Avoidance under Position UncertaintySina Sharif Mansouri, Christoforos Kanellakis, Bjorn Lindqvist et al.
This article proposes a novel Nonlinear Model Predictive Control (NMPC) framework for Micro Aerial Vehicle (MAV) autonomous navigation in constrained environments. The introduced framework allows us to consider the nonlinear dynamics of MAVs and guarantees real-time performance. Our first contribution is to design a computationally efficient subspace clustering method to reveal from geometrical constraints to underlying constraint planes within a 3D point cloud, obtained from a 3D lidar scanner. The second contribution of our work is to incorporate the extracted information into the nonlinear constraints of NMPC for avoiding collisions. Our third contribution focuses on making the controller robust by considering the uncertainty of localization and NMPC using the Shannon entropy. This step enables us to track either the position or velocity references, or none of them if necessary. As a result, the collision avoidance constraints are defined in the local coordinates of MAVs and it remains active and guarantees collision avoidance, despite localization uncertainties, e.g., position estimation drifts. Additionally, as the platform continues the mission, this will result in less uncertain position estimations, due to the feature extraction and loop closure. The efficacy of the suggested framework has been evaluated using various simulations in the Gazebo environment.
LGJun 25, 2020
Scalable Spectral Clustering with Nystrom Approximation: Practical and Theoretical AspectsFarhad Pourkamali-Anaraki
Spectral clustering techniques are valuable tools in signal processing and machine learning for partitioning complex data sets. The effectiveness of spectral clustering stems from constructing a non-linear embedding based on creating a similarity graph and computing the spectral decomposition of the Laplacian matrix. However, spectral clustering methods fail to scale to large data sets because of high computational cost and memory usage. A popular approach for addressing these problems utilizes the Nystrom method, an efficient sampling-based algorithm for computing low-rank approximations to large positive semi-definite matrices. This paper demonstrates how the previously popular approach of Nystrom-based spectral clustering has severe limitations. Existing time-efficient methods ignore critical information by prematurely reducing the rank of the similarity matrix associated with sampled points. Also, current understanding is limited regarding how utilizing the Nystrom approximation will affect the quality of spectral embedding approximations. To address the limitations, this work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs. We provide theoretical results to reduce the current gap and present numerical experiments with real and synthetic data. Empirical results demonstrate the efficacy and efficiency of the proposed method compared to existing spectral clustering techniques based on the Nystrom method and other efficient methods. The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
ROJun 7, 2020
Unsupervised Learning for Subterranean Junction Recognition Based on 2D Point CloudSina Sharif Mansouri, Farhad Pourkamali-Anaraki, Miguel Castano Arranz et al.
This article proposes a novel unsupervised learning framework for detecting the number of tunnel junctions in subterranean environments based on acquired 2D point clouds. The implementation of the framework provides valuable information for high level mission planners to navigate an aerial platform in unknown areas or robot homing missions. The framework utilizes spectral clustering, which is capable of uncovering hidden structures from connected data points lying on non-linear manifolds. The spectral clustering algorithm computes a spectral embedding of the original 2D point cloud by utilizing the eigen decomposition of a matrix that is derived from the pairwise similarities of these points. We validate the developed framework using multiple data-sets, collected from multiple realistic simulations, as well as from real flights in underground environments, demonstrating the performance and merits of the proposed methodology.
LGNov 18, 2019
The Effectiveness of Variational Autoencoders for Active LearningFarhad Pourkamali-Anaraki, Michael B. Wakin
The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting labeled training examples from a large pool of unlabeled instances. In this paper, we propose a new data-driven approach to active learning by choosing a small set of labeled data points that are both informative and representative. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, signify the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
LGAug 2, 2019
Large-Scale Sparse Subspace Clustering Using LandmarksFarhad Pourkamali-Anaraki
Subspace clustering methods based on expressing each data point as a linear combination of all other points in a dataset are popular unsupervised learning techniques. However, existing methods incur high computational complexity on large-scale datasets as they require solving an expensive optimization problem and performing spectral clustering on large affinity matrices. This paper presents an efficient approach to subspace clustering by selecting a small subset of the input data called landmarks. The resulting subspace clustering method in the reduced domain runs in linear time with respect to the size of the original data. Numerical experiments on synthetic and real data demonstrate the effectiveness of our method.
CVApr 17, 2018
Efficient Solvers for Sparse Subspace ClusteringFarhad Pourkamali-Anaraki, James Folberth, Stephen Becker
Sparse subspace clustering (SSC) clusters $n$ points that lie near a union of low-dimensional subspaces. The SSC model expresses each point as a linear or affine combination of the other points, using either $\ell_1$ or $\ell_0$ regularization. Using $\ell_1$ regularization results in a convex problem but requires $O(n^2)$ storage, and is typically solved by the alternating direction method of multipliers which takes $O(n^3)$ flops. The $\ell_0$ model is non-convex but only needs memory linear in $n$, and is solved via orthogonal matching pursuit and cannot handle the case of affine subspaces. This paper shows that a proximal gradient framework can solve SSC, covering both $\ell_1$ and $\ell_0$ models, and both linear and affine constraints. For both $\ell_1$ and $\ell_0$, algorithms to compute the proximity operator in the presence of affine constraints have not been presented in the SSC literature, so we derive an exact and efficient algorithm that solves the $\ell_1$ case with just $O(n^2)$ flops. In the $\ell_0$ case, our algorithm retains the low-memory overhead, and is the first algorithm to solve the SSC-$\ell_0$ model with affine constraints. Experiments show our algorithms do not rely on sensitive regularization parameters, and they are less sensitive to sparsity misspecification and high noise.
MLAug 8, 2017
Improved Fixed-Rank Nyström Approximation via QR Decomposition: Practical and Theoretical AspectsFarhad Pourkamali-Anaraki, Stephen Becker
The Nystrom method is a popular technique that uses a small number of landmark points to compute a fixed-rank approximation of large kernel matrices that arise in machine learning problems. In practice, to ensure high quality approximations, the number of landmark points is chosen to be greater than the target rank. However, for simplicity the standard Nystrom method uses a sub-optimal procedure for rank reduction. In this paper, we examine the drawbacks of the standard Nystrom method in terms of poor performance and lack of theoretical guarantees. To address these issues, we present an efficient modification for generating improved fixed-rank Nystrom approximations. Theoretical analysis and numerical experiments are provided to demonstrate the advantages of the modified method over the standard Nystrom method. Overall, the aim of this paper is to convince researchers to use the modified method, as it has nearly identical computational complexity, is easy to code, has greatly improved accuracy in many cases, and is optimal in a sense that we make precise.
MLDec 20, 2016
Randomized Clustered Nystrom for Large-Scale Kernel MachinesFarhad Pourkamali-Anaraki, Stephen Becker
The Nystrom method has been popular for generating the low-rank approximation of kernel matrices that arise in many machine learning problems. The approximation quality of the Nystrom method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we present a novel algorithm to compute the optimal Nystrom low-approximation when the number of landmark points exceed the target rank. Moreover, we introduce a randomized algorithm for generating landmark points that is scalable to large-scale data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and, thus, leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between the accuracy and efficiency of our proposed method. Extensive experiments demonstrate the competitive performance as well as the efficiency of our proposed method.
MLAug 26, 2016
A Randomized Approach to Efficient Kernel ClusteringFarhad Pourkamali-Anaraki, Stephen Becker
Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nystrom based approximations.
MLDec 30, 2015
Estimation of the sample covariance matrix from compressive measurementsFarhad Pourkamali-Anaraki
This paper focuses on the estimation of the sample covariance matrix from low-dimensional random projections of data known as compressive measurements. In particular, we present an unbiased estimator to extract the covariance structure from compressive measurements obtained by a general class of random projection matrices consisting of i.i.d. zero-mean entries and finite first four moments. In contrast to previous works, we make no structural assumptions about the underlying covariance matrix such as being low-rank. In fact, our analysis is based on a non-Bayesian data setting which requires no distributional assumptions on the set of data samples. Furthermore, inspired by the generality of the projection matrices, we propose an approach to covariance estimation that utilizes sparse Rademacher matrices. Therefore, our algorithm can be used to estimate the covariance matrix in applications with limited memory and computation power at the acquisition devices. Experimental results demonstrate that our approach allows for accurate estimation of the sample covariance matrix on several real-world data sets, including video data.
MLOct 31, 2015
Preconditioned Data Sparsification for Big Data with Applications to PCA and K-meansFarhad Pourkamali-Anaraki, Stephen Becker
We analyze a compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample. The benefit is that the output is a sparse matrix and therefore subsequent processing, such as PCA or K-means, is significantly faster, especially in a distributed-data setting. Furthermore, the sampling is single-pass and applicable to streaming data. The sampling mechanism is a variant of previous methods proposed in the literature combined with a randomized preconditioning to smooth the data. We provide guarantees for PCA in terms of the covariance matrix, and guarantees for K-means in terms of the error in the center estimators at a given step. We present numerical evidence to show both that our bounds are nearly tight and that our algorithms provide a real benefit when applied to standard test data sets, as well as providing certain benefits over related sampling approaches.
MLApr 5, 2015
Efficient Dictionary Learning via Very Sparse Random ProjectionsFarhad Pourkamali-Anaraki, Stephen Becker, Shannon M. Hughes
Performing signal processing tasks on compressive measurements of data has received great attention in recent years. In this paper, we extend previous work on compressive dictionary learning by showing that more general random projections may be used, including sparse ones. More precisely, we examine compressive K-means clustering as a special case of compressive dictionary learning and give theoretical guarantees for its performance for a very general class of random projections. We then propose a memory and computation efficient dictionary learning algorithm, specifically designed for analyzing large volumes of high-dimensional data, which learns the dictionary from very sparse random projections. Experimental results demonstrate that our approach allows for reduction of computational complexity and memory/data access, with controllable loss in accuracy.