Jonathan Gryak

LG
h-index: 1
9 papers
87 citations
Novelty: 33%
AI Score: 39

9 Papers

LG Aug 17, 2022
Prediction of Oral Food Challenge Outcomes via Ensemble Learning

Justin Zhang, Deborah Lee, Kylie Jungles et al.

Oral Food Challenges (OFCs) are essential to accurately diagnosing food allergy due to the limitations of existing clinical testing. However, some patients are hesitant to undergo OFCs, while those willing suffer from limited access to allergists in rural/community healthcare settings. Despite the success of machine learning in predicting patient outcomes in other clinical settings, few applications to food allergy have been developed. Thus, in this study, we seek to leverage machine learning methodologies for OFC outcome prediction. Retrospective data were gathered from 1,112 patients who collectively underwent a total of 1,284 OFCs, and consisted of clinical factors including serum-specific Immunoglobulin E (IgE), total IgE, skin prick tests (SPTs), comorbidities, sex, and age. Using these features, multiple machine learning models were constructed to predict OFC outcomes for three common allergens: peanut, egg, and milk. The best performing model for each allergen was an ensemble of random forest (egg) or Learning Using Concave and Convex Kernels (LUCCK) (peanut, milk) models, which achieved an Area under the Curve (AUC) of 0.91, 0.96, and 0.94 in predicting OFC outcomes for peanut, egg, and milk, respectively. Moreover, all such models had sensitivity and specificity values ≥ 89%. Model interpretation via SHapley Additive exPlanations (SHAP) indicates that specific IgE, along with wheal and flare values from SPTs, are highly predictive of OFC outcomes. The results of this analysis suggest that ensemble learning has the potential to predict OFC outcomes and reveal relevant clinical factors for further study.
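
To make the reported metrics concrete, the following is a minimal sketch of how an ensemble's predictions can be scored for sensitivity and specificity. The abstract does not say how ensemble members are combined, so the soft-voting rule (averaging predicted probabilities), the 0.5 threshold, the function names, and all numbers below are assumptions for illustration only, not the study's method or data.

```python
def ensemble_predict(model_probs, threshold=0.5):
    """Soft voting (assumed): average each sample's probability across models,
    then label it positive (1) when the mean reaches the threshold."""
    n_models = len(model_probs)
    means = [sum(col) / n_models for col in zip(*model_probs)]
    return [1 if m >= threshold else 0 for m in means]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical probabilities from three models for four challenges.
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.7],
    [0.7, 0.3, 0.7, 0.3],
]
y_true = [1, 0, 1, 1]  # hypothetical outcomes (1 = reaction)

y_pred = ensemble_predict(model_probs)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(y_pred, round(sens, 3), round(spec, 3))
```

Sensitivity is the fraction of true reactions the ensemble flags, and specificity is the fraction of non-reactions it correctly clears; both depend on the chosen threshold.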

LG Jan 10, 2023
Tensor Denoising via Amplification and Stable Rank Methods

Jonathan Gryak, Kayvan Najarian, Harm Derksen

Tensors in the form of multilinear arrays are ubiquitous in data science applications. Captured real-world data, including video, hyperspectral images, and discretized physical systems, naturally occur as tensors and often come with attendant noise. Under the additive noise model and with the assumption that the underlying clean tensor has low rank, many denoising methods have been created that utilize tensor decomposition to effect denoising through low rank tensor approximation. However, all such decomposition methods require estimating the tensor rank, or related measures such as the tensor spectral and nuclear norms, all of which are NP-hard problems. In this work we leverage our previously developed framework of $\textit{tensor amplification}$, which provides good approximations of the spectral and nuclear tensor norms, to denoise synthetic tensors of various sizes, ranks, and noise levels, along with real-world tensors derived from physiological signals. We also introduce two new notions of tensor rank -- $\textit{stable slice rank}$ and $\textit{stable } X\textit{-rank}$ -- and new denoising methods based on their estimation. The experimental results show that in the low rank context, tensor-based amplification provides comparable denoising performance in high signal-to-noise ratio (SNR) settings and superior performance in noisy (i.e., low SNR) settings, while the stable $X$-rank method achieves superior denoising performance on the physiological signal data.

39.3 LG Mar 21
LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models

Amirmohammad Ziaei Bideh, Jonathan Gryak

Discovering the governing equations of dynamical systems is a central problem across many scientific disciplines. As experimental data become increasingly available, automated equation discovery methods offer a promising data-driven approach to accelerate scientific discovery. Among these methods, genetic programming (GP) has been widely adopted due to its flexibility and interpretability. However, GP-based approaches often suffer from inefficient exploration of the symbolic search space, leading to slow convergence and suboptimal solutions. To address these limitations, we propose LLM-ODE, a large language model-aided model discovery framework that guides symbolic evolution using patterns extracted from elite candidate equations. By leveraging the generative prior of large language models, LLM-ODE produces more informed search trajectories while preserving the exploratory strengths of evolutionary algorithms. Empirical results on 91 dynamical systems show that LLM-ODE variants consistently outperform classical GP methods in terms of search efficiency and Pareto-front quality. Overall, our results demonstrate that LLM-ODE improves both efficiency and accuracy over traditional GP-based discovery and offers greater scalability to higher-dimensional systems compared to linear and Transformer-only model discovery methods.

LG Sep 24, 2025 Code
MDBench: Benchmarking Data-Driven Methods for Model Discovery

Amirmohammad Ziaei Bideh, Aleksandra Georgievska, Jonathan Gryak

Model discovery aims to uncover governing differential equations of dynamical systems directly from experimental data. Benchmarking such methods is essential for tracking progress and understanding trade-offs in the field. While prior efforts have focused mostly on identifying single equations, typically framed as symbolic regression, there remains a lack of comprehensive benchmarks for discovering dynamical models. To address this, we introduce MDBench, an open-source benchmarking framework for evaluating model discovery methods on dynamical systems. MDBench assesses 12 algorithms on 14 partial differential equations (PDEs) and 63 ordinary differential equations (ODEs) under varying levels of noise. Evaluation metrics include derivative prediction accuracy, model complexity, and equation fidelity. We also introduce seven challenging PDE systems from fluid dynamics and thermodynamics, revealing key limitations in current methods. Our findings illustrate that linear methods and genetic programming methods achieve the lowest prediction error for PDEs and ODEs, respectively. Moreover, linear models are in general more robust against noise. MDBench accelerates the advancement of model discovery methods by offering a rigorous, extensible benchmarking framework and a rich, diverse collection of dynamical system datasets, enabling systematic evaluation, comparison, and improvement of equation accuracy and robustness.

LG Dec 9, 2021
A Novel Tropical Geometry-based Interpretable Machine Learning Method: Application in Prognosis of Advanced Heart Failure

Heming Yao, Harm Derksen, Jessica R. Golbus et al.

A model's interpretability is essential to many practical applications such as clinical decision support systems. In this paper, a novel interpretable machine learning method is presented, which can model the relationship between input variables and responses in humanly understandable rules. The method is built by applying tropical geometry to fuzzy inference systems, wherein variable encoding functions and salient rules can be discovered by supervised learning. Experiments using synthetic datasets were conducted to investigate the performance and capacity of the proposed algorithm in classification and rule discovery. Furthermore, the proposed method was applied to a clinical application that identified heart failure patients who would benefit from advanced therapies such as heart transplant or durable mechanical circulatory support. Experimental results show that the proposed network performed well on the classification tasks. In addition to learning humanly understandable rules from the dataset, existing fuzzy domain knowledge can be easily transferred into the network and used to facilitate model training. Our results indicate that the proposed model, together with its ability to learn existing domain knowledge, can significantly improve model generalizability. The characteristics of the proposed network make it promising in applications requiring model reliability and justification.

CV Dec 3, 2020
Motion-based Camera Localization System in Colonoscopy Videos

Heming Yao, Ryan W. Stidham, Zijun Gao et al.

Optical colonoscopy is an essential diagnostic and prognostic tool for many gastrointestinal diseases, including cancer screening and staging, intestinal bleeding, diarrhea, abdominal symptom evaluation, and inflammatory bowel disease assessment. Automated assessment of colonoscopy is of interest given the subjectivity present in qualitative human interpretations of colonoscopy findings. Localization of the camera is essential to interpreting the meaning and context of findings for diseases evaluated by colonoscopy. In this study, we propose a camera localization system to estimate the relative location of the camera and classify the colon into anatomical segments. The camera localization system begins with non-informative frame detection and removal. Then a self-training end-to-end convolutional neural network is built to estimate the camera motion, where several strategies are proposed to improve its robustness and generalization on endoscopic videos. Using the estimated camera motion, a camera trajectory can be derived and a relative location index calculated. Based on the estimated location index, anatomical colon segment classification is performed by constructing a colon template. The proposed motion estimation algorithm was evaluated on an external dataset containing the ground truth for camera pose. The experimental results show that the performance of the proposed method is superior to other published methods. The relative location index estimation and anatomical region classification were further validated using colonoscopy videos collected from routine clinical practice. This validation yielded an average classification accuracy of 0.754, which is substantially higher than the performance obtained using location indices built from other methods.
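
The step from estimated motion to a relative location index can be pictured with a deliberately simplified sketch. It assumes the motion estimate has already been reduced to one signed forward displacement per frame; that reduction, the min-max normalization, the function name, and the sample values are all hypothetical and are not taken from the paper, which estimates full camera motion with a neural network.

```python
from itertools import accumulate


def relative_location_index(forward_steps):
    """Integrate per-frame forward displacements into a trajectory, then
    rescale positions to [0, 1] (assumed normalization, for illustration)."""
    positions = list(accumulate(forward_steps))  # cumulative position per frame
    lo, hi = min(positions), max(positions)
    if hi == lo:
        # The camera never moved: every frame sits at the same location.
        return [0.0] * len(positions)
    return [(p - lo) / (hi - lo) for p in positions]


# Hypothetical displacements: mostly forward, with one short backward move.
steps = [1.0, 1.0, -0.5, 1.5, 1.0]
index = relative_location_index(steps)
print([round(v, 3) for v in index])
```

An index of this kind places each frame along the traversed path, which is what a segment classifier built on a colon template would take as input.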

LG Apr 17, 2019
An Unsupervised Feature Learning Approach to Reduce False Alarm Rate in ICUs

Behzad Ghazanfari, Fatemeh Afghah, Kayvan Najarian et al.

The high rate of false alarms in intensive care units (ICUs) is one of the top challenges of using medical technology in hospitals. These false alarms are often caused by patients' movements, detachment of monitoring sensors, or different sources of noise and interference that impact the signals collected from different monitoring devices. In this paper, we propose a novel set of high-level features based on an unsupervised feature learning technique in order to effectively capture the characteristics of different arrhythmias in electrocardiogram (ECG) signals and differentiate them from irregularities in signals due to different sources of signal disturbance. This unsupervised feature learning technique first extracts a set of low-level features from all existing heart cycles of a patient, and then clusters these segments for each individual patient to provide a set of prominent high-level features. The objective of the clustering phase is to enable the classification method to differentiate between the high-level features extracted from normal and abnormal cycles (i.e., either due to arrhythmia or different sources of distortion in the signal) so that more attention is paid to the features extracted from the abnormal portion of the signal that contributes to the alarm. The performance of this method is evaluated using the 2015 PhysioNet/Computing in Cardiology Challenge dataset for reducing false arrhythmia alarms in ICUs. As confirmed by the experimental results, the proposed method performs well in terms of accuracy, sensitivity, and specificity of alarm detection using only a few high-level features that are extracted from a single-lead ECG signal.

GR May 30, 2017
Solving the Conjugacy Decision Problem via Machine Learning

Jonathan Gryak, Robert M. Haralick, Delaram Kahrobaei

Machine learning and pattern recognition techniques have been successfully applied to algorithmic problems in free groups. In this paper, we seek to extend these techniques to finitely presented non-free groups, with a particular emphasis on polycyclic and metabelian groups that are of interest to non-commutative cryptography. As a prototypical example, we utilize supervised learning methods to construct classifiers that can solve the conjugacy decision problem, i.e., determine whether or not a pair of elements from a specified group are conjugate. The accuracies of classifiers created using decision trees, random forests, and N-tuple neural network models are evaluated for several non-free groups. The very high accuracy of these classifiers suggests an underlying mathematical relationship with respect to conjugacy in the tested groups.
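
For readers unfamiliar with the conjugacy decision problem, here is a toy instance in a group where it has an exact, classical answer: two permutations in the symmetric group are conjugate exactly when they have the same cycle type. This is only an illustration of the problem statement. The paper concerns polycyclic and metabelian groups and learned classifiers, not this invariant, and the function names and example permutations are made up for the sketch.

```python
def cycle_type(perm):
    """Sorted cycle lengths of a permutation given as a tuple with i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start`, counting its elements.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def are_conjugate(a, b):
    """Decide conjugacy in S_n: same degree and same cycle type."""
    return len(a) == len(b) and cycle_type(a) == cycle_type(b)


swap_01 = (1, 0, 2, 3)   # transposition (0 1)
swap_23 = (0, 1, 3, 2)   # transposition (2 3)
three_cycle = (1, 2, 0, 3)  # 3-cycle (0 1 2)

print(are_conjugate(swap_01, swap_23))      # True
print(are_conjugate(swap_01, three_cycle))  # False
```

In the non-free groups studied in the paper no such one-line invariant is assumed, which is why a classifier trained on labeled element pairs is of interest.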

CR Jul 20, 2016
The Status of Polycyclic Group-Based Cryptography: A Survey and Open Problems

Jonathan Gryak, Delaram Kahrobaei

Polycyclic groups are natural generalizations of cyclic groups but with more complicated algorithmic properties. They are finitely presented and the word, conjugacy, and isomorphism decision problems are all solvable in these groups. Moreover, the non-virtually nilpotent ones exhibit an exponential growth rate. These properties make them suitable for use in group-based cryptography, which was proposed in 2004 by Eick and Kahrobaei. Since then, many cryptosystems have been created that employ polycyclic groups. These include key exchanges such as non-commutative ElGamal, authentication schemes based on the twisted conjugacy problem, and secret sharing via the word problem. In response, heuristic and deterministic methods of cryptanalysis have been developed, including the length-based and linear decomposition attacks. Despite these efforts, there are classes of infinite polycyclic groups that remain suitable for cryptography. The analysis of algorithms for search and decision problems in polycyclic groups has also been developed. In addition to results for the aforementioned problems, we present those concerning polycyclic representations, group morphisms, and orbit decidability. Though much progress has been made, many algorithmic and complexity problems remain unsolved, and we conclude with a number of them. Of particular interest is to show that cryptosystems using infinite polycyclic groups are resistant to cryptanalysis on a quantum computer.