Matt Raymond

LG
h-index: 46
3 papers
2 citations
Novelty: 43%
AI Score: 19

3 Papers

LG · Mar 21, 2024
Universal Feature Selection for Simultaneous Interpretability of Multitask Datasets

Matt Raymond, Jacob Charles Saldinger, Paolo Elvati et al.

Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, limiting their applicability to large datasets, or make restrictive assumptions about feature-property relationships, hindering their ability to capture complex interactions. BoUTS's general and scalable feature selection algorithm surpasses these limitations to identify both universal features relevant to all datasets and task-specific features predictive for specific subsets. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets, and suggest deep connections in seemingly disparate chemical datasets. We expect these results to have important repercussions in manually guided inverse problems. Beyond its current application, BoUTS holds immense potential for elucidating data-poor systems by leveraging information from similar data-rich systems. BoUTS represents a significant leap in cross-domain feature selection, potentially leading to advancements in various scientific fields.

COMP-PH · Oct 31, 2024
Machine learning models for Si nanoparticle growth in nonthermal plasma

Matt Raymond, Paolo Elvati, Jacob C. Saldinger et al.

Nanoparticles (NPs) formed in nonthermal plasmas (NTPs) can have unique properties and applications. However, modeling their growth in these environments presents significant challenges due to the non-equilibrium nature of NTPs, making them computationally expensive to describe. In this work, we address the challenges associated with accelerating the estimation of parameters needed for these models. Specifically, we explore how different machine learning models can be tailored to improve prediction outcomes. We apply these methods to reactive classical molecular dynamics data, which capture the processes associated with colliding silane fragments in NTPs. These reactions exemplify processes where qualitative trends are clear, but their quantification is challenging, hard to generalize, and requires time-consuming simulations. Our results demonstrate that good prediction performance can be achieved when appropriate loss functions are implemented and correct invariances are imposed. While the diversity of molecules used in the training set is critical for accurate prediction, our findings indicate that only a fraction (15-25%) of the energy and temperature sampling is required to achieve high levels of accuracy. This suggests a substantial reduction in computational effort is possible for similar systems.

LG · May 1, 2024
Joint Optimization of Piecewise Linear Ensembles

Matt Raymond, Angela Violi, Clayton Scott

Tree ensembles achieve state-of-the-art performance on numerous prediction tasks. We propose Joint Optimization of Piecewise Linear Ensembles (JOPLEn), which jointly fits piecewise linear models at all leaf nodes of an existing tree ensemble. In addition to enhancing the ensemble expressiveness, JOPLEn allows several common penalties, including sparsity-promoting and subspace-norms, to be applied to nonlinear prediction. For example, JOPLEn with a nuclear norm penalty learns subspace-aligned functions. Additionally, JOPLEn (combined with a Dirty LASSO penalty) is an effective feature selection method for nonlinear prediction in multitask learning. Finally, we demonstrate the performance of JOPLEn on 153 regression and classification datasets and with a variety of penalties. JOPLEn leads to improved prediction performance relative to not only standard random forest and boosted tree ensembles, but also other methods for enhancing tree ensembles.