AO-PH Sep 20, 2024
Learning to Simulate Aerosol Dynamics with Graph Neural Networks
Fabiana Ferracina, Payton Beeler, Mahantesh Halappanavar et al.
Aerosol effects on climate, weather, and air quality depend on characteristics of individual particles, which are tremendously diverse and change in time. Particle-resolved models are the only models able to capture this diversity in particle physicochemical properties, but these models are computationally expensive. As a strategy for accelerating particle-resolved microphysics models, we introduce Graph-based Learning of Aerosol Dynamics (GLAD) and use this model to train a surrogate of the particle-resolved model PartMC-MOSAIC. GLAD implements a Graph Network-based Simulator (GNS), a machine learning framework that has been used to simulate particle-based fluid dynamics models. In GLAD, each particle is represented as a node in a graph, and the evolution of the particle population over time is simulated through learned message passing. We demonstrate our GNS approach on a simple aerosol system that includes condensation of sulfuric acid onto particles composed of sulfate, black carbon, organic carbon, and water. A graph with particles as nodes is constructed, and a graph neural network (GNN) is then trained using the model output from PartMC-MOSAIC. The trained GNN can then be used for simulating and predicting aerosol dynamics over time. Results demonstrate the framework's ability to accurately learn chemical dynamics and generalize across different scenarios, achieving efficient training and prediction times. We evaluate the performance across three scenarios, highlighting the framework's robustness and adaptability in modeling aerosol microphysics and chemistry.
AI Dec 16, 2025
Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence
Shreyas Subramanian, Bala Krishnamoorthy, Pranav Murthy
Despite significant advances in optimizers for training, most research still relies on common scheduler choices such as cosine or exponential decay. In this paper, we study GreedyLR, a novel scheduler that adaptively adjusts the learning rate during training based on the current loss. To validate the effectiveness of our proposed scheduler, we conduct experiments on several NLP, CV, and LLM tasks with up to 7B parameters, including both fine-tuning and pre-training experiments. The results show that our approach outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. We also provide a theoretical analysis of the GreedyLR algorithm, including a proof of convergence and derivation of the optimal scaling factor $F$ that maximizes the convergence rate, along with experiments to show robustness of the algorithm to realistic noisy landscapes. Our scheduler is easy to implement, computationally efficient, and could be considered a good default scheduler for training.
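The abstract does not spell out the update rule, so the following is only a minimal sketch of a loss-driven scheduler under stated assumptions: the learning rate is divided by a factor in (0, 1) when the loss improves and multiplied by it otherwise. The class name `GreedyLRSketch`, the parameter names, and the clipping bounds are hypothetical and are not taken from the paper.

```python
class GreedyLRSketch:
    """Loss-driven learning-rate scheduler (illustrative assumption, not the paper's code)."""

    def __init__(self, lr, factor=0.9, min_lr=1e-6, max_lr=1.0):
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must lie strictly between 0 and 1")
        self.lr = lr
        self.factor = factor      # assumed role of the scaling factor F
        self.min_lr = min_lr      # hypothetical lower bound on the rate
        self.max_lr = max_lr      # hypothetical upper bound on the rate
        self.prev_loss = None

    def step(self, loss):
        """Update the learning rate from the latest loss and return it."""
        if self.prev_loss is not None:
            if loss < self.prev_loss:
                # Loss improved: be greedy and raise the rate.
                self.lr = min(self.lr / self.factor, self.max_lr)
            else:
                # Loss stalled or worsened: back off.
                self.lr = max(self.lr * self.factor, self.min_lr)
        self.prev_loss = loss
        return self.lr


if __name__ == "__main__":
    sched = GreedyLRSketch(lr=0.1, factor=0.5)
    for loss in [1.0, 0.5, 0.8]:
        print(loss, sched.step(loss))
```

In a training loop, `step` would be called once per evaluation of the loss, and the returned value would be written into the optimizer's learning rate.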
CG Jun 11, 2023
A Normalized Bottleneck Distance on Persistence Diagrams and Homology Preservation under Dimension Reduction
Nathan H. May, Bala Krishnamoorthy, Patrick Gambill
Persistence diagrams (PDs) are used as signatures of point cloud data. Two clouds of points can be compared using the bottleneck distance d_B between their PDs. A potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B when there is a large scaling between them. This situation is typical in dimension reduction frameworks. We define, and study properties of, a new scale-invariant distance between PDs termed normalized bottleneck distance, d_N. In defining d_N, we develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality with a bijection. We utilize metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X),dgm(f(X))) < ε, where dgm(X) is the Vietoris-Rips PD of X, and pairwise distances are preserved by f up to the tolerance 0 < ε < 1. For mMDS, we present new bounds for d_B and d_N between PDs of X and its projection in terms of the eigenvalues of the covariance matrix. And for k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X). Finally, we use computational experiments to demonstrate the increased effectiveness of using the normalized bottleneck distance for clustering sets of point clouds sampled from different shapes.
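To make the phrase "pairwise distances are preserved by f up to the tolerance ε" concrete, the sketch below measures the smallest such ε for a given map. It assumes the common squared-distance form of the JL condition, (1-ε)‖x-y‖² ≤ ‖f(x)-f(y)‖² ≤ (1+ε)‖x-y‖²; the abstract does not say which form the paper uses, and the function name `pairwise_distortion` is hypothetical. It does not compute d_B or d_N.

```python
from itertools import combinations


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairwise_distortion(points, images):
    """Smallest eps such that every pair satisfies the assumed JL-style bound.

    points[i] is mapped to images[i]; the two lists must have equal length.
    """
    if len(points) != len(images):
        raise ValueError("points and images must be in bijection")
    eps = 0.0
    for i, j in combinations(range(len(points)), 2):
        d2 = squared_distance(points[i], points[j])
        if d2 == 0:
            continue  # coincident points carry no distortion information
        fd2 = squared_distance(images[i], images[j])
        eps = max(eps, abs(fd2 / d2 - 1.0))
    return eps


if __name__ == "__main__":
    X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    fX = [(0.0, 0.0), (1.1, 0.0), (0.0, 1.1)]  # projection followed by a 10% stretch
    print(pairwise_distortion(X, fX))
```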
ML Dec 23, 2025
Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights
Rommel Cortez, Bala Krishnamoorthy
Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations may often have distinct weights, and none of these measures are sensitive to such varying weights. We propose a new weighted Pearson-Matthews Correlation Coefficient (MCC) for binary classification as well as weighted versions of related multiclass measures. The weighted MCC varies between $-1$ and $1$. But crucially, the weighted MCC values are higher for classifiers that perform better on highly weighted observations, and hence the measure is able to distinguish them from classifiers that have a similar overall performance and from ones that perform better on the low-weight observations. Furthermore, we prove that the weighted measures are robust with respect to the choice of weights in a precise manner: if the weights are changed by at most $ε$, the value of the weighted measure changes at most by a factor of $ε$ in the binary case and by a factor of $ε^2$ in the multiclass case. Our computations demonstrate that the weighted measures clearly identify classifiers that perform better on higher weighted observations, while the unweighted measures remain completely indifferent to the choices of weights.
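The abstract does not give the formula, so the sketch below illustrates the idea with one natural construction, stated here as an assumption: each cell of the binary confusion matrix accumulates observation weights instead of counts, and the usual MCC formula is applied to those sums. The function name `weighted_mcc` is hypothetical, and the paper's definition may differ.

```python
import math


def weighted_mcc(y_true, y_pred, weights=None):
    """Binary MCC with weight sums in place of counts (assumed construction)."""
    if weights is None:
        weights = [1.0] * len(y_true)  # reduces to the ordinary MCC
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have equal length")
    tp = tn = fp = fn = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        if t == 1 and p == 1:
            tp += w
        elif t == 0 and p == 0:
            tn += w
        elif t == 0 and p == 1:
            fp += w
        else:
            fn += w
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a margin is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    weights = [10, 1, 10, 1]
    right_on_heavy = [1, 0, 0, 1]  # correct on the two heavy observations
    right_on_light = [0, 1, 1, 0]  # correct on the two light observations
    print(weighted_mcc(y_true, right_on_heavy, weights))
    print(weighted_mcc(y_true, right_on_light, weights))
    print(weighted_mcc(y_true, right_on_heavy), weighted_mcc(y_true, right_on_light))
```

In this toy case both classifiers get two of four observations right, so the unweighted score cannot separate them, while the weighted score favors the one that is correct on the heavy observations.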
LG Apr 4, 2024
Predictive Analytics of Varieties of Potatoes
Fabiana Ferracina, Bala Krishnamoorthy, Mahantesh Halappanavar et al.
We explore the application of machine learning algorithms to enhance the selection process of Russet potato clones in breeding trials by predicting their suitability for advancement. This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties that meet processing industry standards. Leveraging manually collected data from trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. The dataset includes 1086 clones, with data on 38 attributes recorded for each clone, focusing on yield, size, appearance, and frying characteristics, with several control varieties planted consistently across four Oregon regions from 2013-2021. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature engineering, and imputation to address missing values. We focus on several key metrics such as accuracy, F1-score, and Matthews correlation coefficient (MCC) for model evaluation. The top-performing models, namely a neural network classifier (Neural Net), histogram-based gradient boosting classifier (HGBC), and a support vector machine classifier (SVM), demonstrate consistent and significant results. To further validate our findings, we conduct a simulation study. By simulating different data-generating scenarios, we assess model robustness and performance through true positive, true negative, false positive, and false negative distributions, area under the receiver operating characteristic curve (AUC-ROC), and MCC. The simulation results highlight that non-linear models like SVM and HGBC consistently show higher AUC-ROC and MCC than logistic regression (LR), thus outperforming the traditional linear model across various distributions, and emphasizing the importance of model selection and tuning in agricultural trials.
CG Jun 25, 2021
Pheno-Mapper: An Interactive Toolbox for the Visual Exploration of Phenomics Data
Youjia Zhou, Methun Kamruzzaman, Patrick Schnable et al.
High-throughput technologies to collect field data have made observations possible at scale in several branches of life sciences. The data collected can range from the molecular level (genotypes) to physiological (phenotypic traits) and environmental observations (e.g., weather, soil conditions). These vast swathes of data, collectively referred to as phenomics data, represent a treasure trove of key scientific knowledge on the dynamics of the underlying biological system. However, extracting information and insights from these complex datasets remains a significant challenge owing to their multidimensionality and lack of prior knowledge about their complex structure. In this paper, we present Pheno-Mapper, an interactive toolbox for the exploratory analysis and visualization of large-scale phenomics data. Our approach uses the mapper framework to perform a topological analysis of the data, and subsequently render visual representations with built-in data analysis and machine learning capabilities. We demonstrate the utility of this new tool on real-world plant (e.g., maize) phenomics datasets. In comparison to existing approaches, the main advantage of Pheno-Mapper is that it provides rich, interactive capabilities in the exploratory analysis of phenomics data, and it integrates visual analytics with data analysis and machine learning in an easily extensible way. In particular, Pheno-Mapper allows the interactive selection of subpopulations guided by a topological summary of the data and applies data mining and machine learning to these selected subpopulations for in-depth exploration.
CG May 5, 2021
Stitch Fix for Mapper and Topological Gains
Youjia Zhou, Nathaniel Saul, Ilkin Safarli et al.
The mapper construction is a powerful tool from topological data analysis that is designed for the analysis and visualization of multivariate data. In this paper, we investigate a method for stitching a pair of univariate mappers together into a bivariate mapper, and study topological notions of information gains, referred to as topological gains, during such a process. We further provide implementations that visualize such topological gains for mapper graphs.
CG Aug 19, 2019
Continuous Toolpath Planning in Additive Manufacturing
Prashant Gupta, Bala Krishnamoorthy, Gregory Dreifus
We develop a framework that creates a new polygonal mesh representation of the sparse infill domain of a layer-by-layer 3D printing job. We guarantee the existence of a single, continuous tool path covering each connected piece of the domain in every layer. We present a tool path algorithm that traverses each such continuous tool path with no crossovers. The key construction at the heart of our framework is an Euler transformation which converts a 2-dimensional cell complex K into a new 2-complex K^ such that every vertex in the 1-skeleton G^ of K^ has even degree. Hence G^ is Eulerian, and an Eulerian tour can be followed to print all edges in a continuous fashion. We start with a mesh K of the union of polygons obtained by projecting all layers to the plane. We compute its Euler transformation K^. In the slicing step, we clip K^ at each layer using its polygon to obtain a complex that may not necessarily be Euler. We then patch this complex by adding edges such that any odd-degree nodes created by slicing are transformed to have even degrees again. We print extra support edges in place of any segments left out to ensure there are no edges without support in the next layer. These support edges maintain the Euler nature of the complex. Finally, we describe a tree-based search algorithm that builds the continuous tool path by traversing "concentric" cycles in the Euler complex. Our algorithm produces a tool path that avoids material collisions and crossovers, and can be printed in a continuous fashion irrespective of complex geometry or topology of the domain (e.g., holes). We implement and test our framework on several 3D objects. Apart from standard geometric shapes, we demonstrate the framework on the Stanford bunny.
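The step "every vertex has even degree, hence the graph is Eulerian" can be illustrated with the classical Hierholzer algorithm, sketched below for a connected undirected graph given as an edge list. This is the textbook method, not the paper's tree-based search over "concentric" cycles, and it ignores the no-crossover requirement; the function names are hypothetical.

```python
from collections import defaultdict


def all_even_degree(edges):
    """Return True if every vertex touched by an edge has even degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())


def eulerian_tour(edges):
    """Closed walk using every edge exactly once (Hierholzer's algorithm)."""
    if not edges:
        return []
    if not all_even_degree(edges):
        raise ValueError("some vertex has odd degree")
    adjacency = defaultdict(list)
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))
    used = [False] * len(edges)
    stack = [edges[0][0]]
    tour = []
    while stack:
        u = stack[-1]
        # Discard edges already traversed from the other endpoint.
        while adjacency[u] and used[adjacency[u][-1][1]]:
            adjacency[u].pop()
        if adjacency[u]:
            v, index = adjacency[u].pop()
            used[index] = True
            stack.append(v)
        else:
            tour.append(stack.pop())  # dead end: emit vertex while backtracking
    if len(tour) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return tour


if __name__ == "__main__":
    bowtie = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
    print(eulerian_tour(bowtie))
```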
LG Jun 19, 2019
Steinhaus Filtration and Stable Paths in the Mapper
Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy et al.
We define a new filtration called the Steinhaus filtration built from a single cover based on a generalized Steinhaus distance, a generalization of Jaccard distance. The homology persistence module of a Steinhaus filtration with infinitely many cover elements may not be $q$-tame, even when the covers are in a totally bounded space. While this may pose a challenge to deriving stability results, we show that the Steinhaus filtration is stable when the cover is finite. We show that while the Čech and Steinhaus filtrations are not isomorphic in general, they are isomorphic for a finite point set in dimension one. Furthermore, the VR filtration completely determines the $1$-skeleton of the Steinhaus filtration in arbitrary dimension. We then develop a language and theory for stable paths within the Steinhaus filtration. We demonstrate how the framework can be used in several applications where a standard metric may not be defined but a cover is readily available. We introduce a new perspective for modeling recommendation system datasets. As an example, we look at a movies dataset and find that the stable paths identified in our framework represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper algorithm for model induction by building a filtration from a single Mapper complex, and provide explanations in the form of stable paths between subpopulations. For illustration, we build a Mapper complex from a supervised machine learning model trained on the FashionMNIST dataset. Stable paths in the Steinhaus filtration provide improved explanations of relationships between subpopulations of images.
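As a rough illustration of the distance the filtration is built on, the sketch below computes one minus the measure of the intersection over the measure of the union for a collection of finite sets. With two sets and the counting measure this is the Jaccard distance. The extension to more than two sets and to per-element weights is an assumption about what "generalized" means here, and the function name `steinhaus_distance` is hypothetical.

```python
def steinhaus_distance(sets, weight=None):
    """1 - measure(intersection) / measure(union) for a collection of finite sets.

    weight maps elements to non-negative numbers; None means counting measure.
    """
    sets = [set(s) for s in sets]
    if not sets:
        raise ValueError("need at least one set")
    union = set().union(*sets)
    intersection = set.intersection(*sets)

    def measure(s):
        if weight is None:
            return float(len(s))
        return float(sum(weight[x] for x in s))

    total = measure(union)
    if total == 0:
        return 0.0  # convention for empty or zero-measure unions
    return 1.0 - measure(intersection) / total


if __name__ == "__main__":
    a, b = {1, 2, 3}, {2, 3, 4}
    print(steinhaus_distance([a, b]))                        # Jaccard distance
    print(steinhaus_distance([{1, 2}, {2, 3}, {2, 4}]))      # three cover elements
    print(steinhaus_distance([a, b], {1: 1, 2: 2, 3: 1, 4: 0}))
```

In a cover-based setting, the sets would be cover elements, and a simplex on several cover elements would enter the filtration at the value of this distance for those elements.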
NT Mar 13, 2015
A Knapsack-Like Code Using Recurrence Sequence Representations
Nathan Hamlin, Bala Krishnamoorthy, William Webb
We recently showed that every positive integer can be represented uniquely using a recurrence sequence, when certain restrictions on the digit strings are satisfied. We present the details of how such representations can be used to build a knapsack-like public key cryptosystem. We also present new disguising methods, and provide arguments for the security of the code against known methods of attack.
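A classical special case of a unique representation over a recurrence sequence is the Zeckendorf representation, where the sequence is the Fibonacci numbers 1, 2, 3, 5, 8, ... and the digit restriction is that no two consecutive terms are used. The sketch below computes it greedily. It is offered only as a familiar example of the general idea; the recurrences, digit restrictions, and cryptosystem in the paper are not reproduced here, and the function name `zeckendorf` is hypothetical.

```python
def zeckendorf(n):
    """Return the Fibonacci numbers (largest first) that sum to n.

    Greedy choice of the largest Fibonacci number not exceeding the remainder
    never picks two consecutive Fibonacci numbers.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    terms = []
    remainder = n
    for f in reversed(fibs):
        if f <= remainder:
            terms.append(f)
            remainder -= f
    return terms


if __name__ == "__main__":
    print(zeckendorf(100))
```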