IVNov 27, 2023Code
Automated Measurement of Vascular Calcification in Femoral Endarterectomy Patients Using Deep LearningAlireza Bagheri Rajeoni, Breanna Pederson, Daniel G. Clair et al.
Atherosclerosis, a chronic inflammatory disease affecting the large arteries, presents a global health risk. Accurate analysis of diagnostic images, like computed tomographic angiograms (CTAs), is essential for staging and monitoring the progression of atherosclerosis-related conditions, including peripheral arterial disease (PAD). However, manual analysis of CTA images is time-consuming and tedious. To address this limitation, we employed a deep learning model to segment the vascular system in CTA images of PAD patients undergoing femoral endarterectomy surgery and to measure vascular calcification from the left renal artery to the patella. Utilizing proprietary CTA images of 27 patients undergoing femoral endarterectomy surgery provided by Prisma Health Midlands, we developed a Deep Neural Network (DNN) model to first segment the arterial system, starting from the descending aorta to the patella, and second, to provide a metric of arterial calcification. Our designed DNN achieved 83.4% average Dice accuracy in segmenting arteries from aorta to patella, advancing the state-of-the-art by 0.8%. Furthermore, our work is the first to present a robust statistical analysis of automated calcification measurement in the lower extremities using deep learning, attaining a Mean Absolute Percentage Error (MAPE) of 9.5% and a correlation coefficient of 0.978 between automated and manual calcification scores. These findings underscore the potential of deep learning techniques as a rapid and accurate tool for medical professionals to assess calcification in the abdominal aorta and its branches above the patella. The developed DNN model and related documentation in this project are available at GitHub page at https://github.com/pip-alireza/DeepCalcScoring.
IVNov 17, 2023Code
TransONet: Automatic Segmentation of Vasculature in Computed Tomographic Angiograms Using Deep LearningAlireza Bagheri Rajeoni, Breanna Pederson, Ali Firooz et al.
Pathological alterations in the human vascular system underlie many chronic diseases, such as atherosclerosis and aneurysms. However, manually analyzing diagnostic images of the vascular system, such as computed tomographic angiograms (CTAs) is a time-consuming and tedious process. To address this issue, we propose a deep learning model to segment the vascular system in CTA images of patients undergoing surgery for peripheral arterial disease (PAD). Our study focused on accurately segmenting the vascular system (1) from the descending thoracic aorta to the iliac bifurcation and (2) from the descending thoracic aorta to the knees in CTA images using deep learning techniques. Our approach achieved average Dice accuracies of 93.5% and 80.64% in test dataset for (1) and (2), respectively, highlighting its high accuracy and potential clinical utility. These findings demonstrate the use of deep learning techniques as a valuable tool for medical professionals to analyze the health of the vascular system efficiently and accurately. Please visit the GitHub page for this paper at https://github.com/pip-alireza/TransOnet.
SPJun 3, 2022
Human Activity Recognition on Time Series Accelerometer Sensor Data using LSTM Recurrent Neural NetworksChrisogonas O. Odhiambo, Sanjoy Saha, Corby K. Martin et al.
The use of sensors available through smart devices has pervaded everyday life in several applications including human activity monitoring, healthcare, and social networks. In this study, we focus on the use of smartwatch accelerometer sensors to recognize eating activity. More specifically, we collected sensor data from 10 participants while consuming pizza. Using this information, and other comparable data available for similar events such as smoking and medication-taking, and dissimilar activities of jogging, we developed a LSTM-ANN architecture that has demonstrated 90% success in identifying individual bites compared to a puff, medication-taking or jogging activities.
OHAug 29, 2023
Wordification: A New Way of Teaching English Spelling PatternsLexington Whalen, Nathan Bickel, Shash Comandur et al.
Literacy, or the ability to read and write, is a crucial indicator of success in life and greater society. It is estimated that 85% of people in juvenile delinquent systems cannot adequately read or write, that more than half of those with substance abuse issues have complications in reading or writing and that two-thirds of those who do not complete high school lack proper literacy skills. Furthermore, young children who do not possess reading skills matching grade level by the fourth grade are approximately 80% likely to not catch up at all. Many may believe that in a developed country such as the United States, literacy fails to be an issue; however, this is a dangerous misunderstanding. Globally an estimated 1.19 trillion dollars are lost every year due to issues in literacy; in the USA, the loss is an estimated 300 billion. To put it in more shocking terms, one in five American adults still fail to comprehend basic sentences. Making matters worse, the only tools available now to correct a lack of reading and writing ability are found in expensive tutoring or other programs that oftentimes fail to be able to reach the required audience. In this paper, our team puts forward a new way of teaching English spelling and word recognitions to grade school students in the United States: Wordification. Wordification is a web application designed to teach English literacy using principles of linguistics applied to the orthographic and phonological properties of words in a manner not fully utilized previously in any computer-based teaching application.
CVApr 5, 2023
nD-PDPA: nDimensional Probability Density Profile AnalysisArjang Fahim, Stephanie Irausquin, Homayoun Valafar
Despite the recent advances in various Structural Genomics Projects, a large gap remains between the number of sequenced and structurally characterized proteins. Some reasons for this discrepancy include technical difficulties, labor, and the cost related to determining a structure by experimental methods such as NMR spectroscopy. Several computational methods have been developed to expand the applicability of NMR spectroscopy by addressing temporal and economical problems more efficiently. While these methods demonstrate successful outcomes to solve more challenging and structurally novel proteins, the cost has not been reduced significantly. Probability Density Profile Analysis (PDPA) has been previously introduced by our lab to directly address the economics of structure determination of routine proteins and the identification of novel structures from a minimal set of unassigned NMR data. 2D-PDPA (in which 2D denotes incorporation of data from two alignment media) has been successful in identifying the structural homolog of an unknown protein within a library of ~1000 decoy structures. In order to further expand the selectivity and sensitivity of PDPA, the incorporation of additional data was necessary. However, the expansion of the original PDPA approach was limited by its computational requirements where the inclusion of additional data would render it computationally intractable. Here we present the most recent developments of PDPA method (nD-PDPA: n Dimensional Probability Density Profile Analysis) that eliminate 2D-PDPA's computational limitations, and allows inclusion of RDC data from multiple vector types in multiple alignment media.
NEApr 1, 2022
Application of Dimensional Reduction in Artificial Neural Networks to Improve Emergency Department Triage During Chemical Mass Casualty IncidentsNicholas D. Boltin, Joan M. Culley, Homayoun Valafar
Chemical Mass Casualty Incidents (MCI) place a heavy burden on hospital staff and resources. Machine Learning (ML) tools can provide efficient decision support to caregivers. However, ML models require large volumes of data for the most accurate results, which is typically not feasible in the chaotic nature of a chemical MCI. This study examines the application of four statistical dimension reduction techniques: Random Selection, Covariance/Variance, Pearson's Linear Correlation, and Principle Component Analysis to reduce a dataset of 311 hazardous chemicals and 79 related signs and symptoms (SSx). An Artificial Neural Network pipeline was developed to create comparative models. Results show that the number of signs and symptoms needed to determine a chemical culprit can be reduced to nearly 40 SSx without losing significant model accuracy. Evidence also suggests that the application of dimension reduction methods can improve ANN model performance accuracy.
LGDec 5, 2025
Rep Smarter, Not Harder: AI Hypertrophy Coaching with Wearable Sensors and Edge Neural NetworksGrant King, Musa Azeem, Savannah Noblitt et al.
Optimizing resistance training for hypertrophy requires balancing proximity to muscular failure, often quantified by Repetitions in Reserve (RiR), with fatigue management. However, subjective RiR assessment is unreliable, leading to suboptimal training stimuli or excessive fatigue. This paper introduces a novel system for real-time feedback on near-failure states (RiR $\le$ 2) during resistance exercise using only a single wrist-mounted Inertial Measurement Unit (IMU). We propose a two-stage pipeline suitable for edge deployment: first, a ResNet-based model segments repetitions from the 6-axis IMU data in real-time. Second, features derived from this segmentation, alongside direct convolutional features and historical context captured by an LSTM, are used by a classification model to identify exercise windows corresponding to near-failure states. Using a newly collected dataset from 13 diverse participants performing preacher curls to failure (631 total reps), our segmentation model achieved an F1 score of 0.83, and the near-failure classifier achieved an F1 score of 0.82 under simulated real-time evaluation conditions (1.6 Hz inference rate). Deployment on a Raspberry Pi 5 yielded an average inference latency of 112 ms, and on an iPhone 16 yielded 23.5 ms, confirming the feasibility for edge computation. This work demonstrates a practical approach for objective, real-time training intensity feedback using minimal hardware, paving the way for accessible AI-driven hypertrophy coaching tools that help users manage intensity and fatigue effectively.
SPJul 8, 2025
Automated Vigilance State Classification in Rodents Using Machine Learning and Feature EngineeringSankalp Jajee, Gaurav Kumar, Homayoun Valafar
Preclinical sleep research remains constrained by labor intensive, manual vigilance state classification and inter rater variability, limiting throughput and reproducibility. This study presents an automated framework developed by Team Neural Prognosticators to classify electroencephalogram (EEG) recordings of small rodents into three critical vigilance states paradoxical sleep (REM), slow wave sleep (SWS), and wakefulness. The system integrates advanced signal processing with machine learning, leveraging engineered features from both time and frequency domains, including spectral power across canonical EEG bands (delta to gamma), temporal dynamics via Maximum-Minimum Distance, and cross-frequency coupling metrics. These features capture distinct neurophysiological signatures such as high frequency desynchronization during wakefulness, delta oscillations in SWS, and REM specific bursts. Validated during the 2024 Big Data Health Science Case Competition (University of South Carolina Big Data Health Science Center, 2024), our XGBoost model achieved 91.5% overall accuracy, 86.8% precision, 81.2% recall, and an F1 score of 83.5%, outperforming all baseline methods. Our approach represents a critical advancement in automated sleep state classification and a valuable tool for accelerating discoveries in sleep science and the development of targeted interventions for chronic sleep disorders. As a publicly available code (BDHSC) resource is set to contribute significantly to advancements.
LGNov 4, 2021
Application of Machine Learning to Sleep Stage ClassificationAndrew Smith, Hardik Anand, Snezana Milosavljevic et al.
Sleep studies are imperative to recapitulate phenotypes associated with sleep loss and uncover mechanisms contributing to psychopathology. Most often, investigators manually classify the polysomnography into vigilance states, which is time-consuming, requires extensive training, and is prone to inter-scorer variability. While many works have successfully developed automated vigilance state classifiers based on multiple EEG channels, we aim to produce an automated and open-access classifier that can reliably predict vigilance state based on a single cortical electroencephalogram (EEG) from rodents to minimize the disadvantages that accompany tethering small animals via wires to computer programs. Approximately 427 hours of continuously monitored EEG, electromyogram (EMG), and activity were labeled by a domain expert out of 571 hours of total data. Here we evaluate the performance of various machine learning techniques on classifying 10-second epochs into one of three discrete classes: paradoxical, slow-wave, or wake. Our investigations include Decision Trees, Random Forests, Naive Bayes Classifiers, Logistic Regression Classifiers, and Artificial Neural Networks. These methodologies have achieved accuracies ranging from approximately 74% to approximately 96%. Most notably, the Random Forest and the ANN achieved remarkable accuracies of 95.78% and 93.31%, respectively. Here we have shown the potential of various machine learning classifiers to automatically, accurately, and reliably classify vigilance states based on a single EEG reading and a single EMG reading.
LGSep 13, 2021
Application of Machine Learning in Early Recommendation of Cardiac Resynchronization TherapyBrendan E. Odigwe, Francis G. Spinale, Homayoun Valafar
Heart failure (HF) is a leading cause of morbidity, mortality, and health care costs. Prolonged conduction through the myocardium can occur with HF, and a device-driven approach, termed cardiac resynchronization therapy (CRT), can improve left ventricular (LV) myocardial conduction patterns. While a functional benefit of CRT has been demonstrated, a large proportion of HF patients (30-50%) receiving CRT do not show sufficient improvement. Moreover, identifying HF patients that would benefit from CRT prospectively remains a clinical challenge. Accordingly, strategies to effectively predict those HF patients that would derive a functional benefit from CRT holds great medical and socio-economic importance. Thus, we used machine learning methods of classifying HF patients, namely Cluster Analysis, Decision Trees, and Artificial neural networks, to develop predictive models of individual outcomes following CRT. Clinical, functional, and biomarker data were collected in HF patients before and following CRT. A prospective 6-month endpoint of a reduction in LV volume was defined as a CRT response. Using this approach (418 responders, 412 non-responders), each with 56 parameters, we could classify HF patients based on their response to CRT with more than 95% success. We have demonstrated that using machine learning approaches can identify HF patients with a high probability of a positive CRT response (95% accuracy), and of equal importance, identify those HF patients that would not derive a functional benefit from CRT. Developing this approach into a clinical algorithm to assist in clinical decision-making regarding the use of CRT in HF patients would potentially improve outcomes and reduce health care costs.
AIMay 19, 2021
MedSensor: Medication Adherence Monitoring Using Neural Networks on Smartwatch Accelerometer Sensor DataChrisogonas Odhiambo, Pamela Wright, Cindy Corbett et al.
Poor medication adherence presents serious economic and health problems including compromised treatment effectiveness, medical complications, and loss of billions of dollars in wasted medicine or procedures. Though various interventions have been proposed to address this problem, there is an urgent need to leverage light, smart, and minimally obtrusive technology such as smartwatches to develop user tools to improve medication use and adherence. In this study, we conducted several experiments on medication-taking activities, developed a smartwatch android application to collect the accelerometer hand gesture data from the smartwatch, and conveyed the data collected to a central cloud database. We developed neural networks, then trained the networks on the sensor data to recognize medication and non-medication gestures. With the proposed machine learning algorithm approach, this study was able to achieve average accuracy scores of 97% on the protocol-guided gesture data, and 95% on natural gesture data.
SPDec 16, 2020
Reduction in the complexity of 1D 1H-NMR spectra by the use of Frequency to Information TransformationHomayoun Valafar, Faramarz Valafar
Analysis of 1H-NMR spectra is often hindered by large variations that occur during the collection of these spectra. Large solvent and standard peaks, base line drift and negative peaks (due to improper phasing) are among some of these variations. Furthermore, some instrument dependent alterations, such as incorrect shimming, are also embedded in the recorded spectrum. The unpredictable nature of these alterations of the signal has rendered the automated and instrument independent computer analysis of these spectra unreliable. In this paper, a novel method of extracting the information content of a signal (in this paper, frequency domain 1H-NMR spectrum), called the frequency-information transformation (FIT), is presented and compared to a previously used method (SPUTNIK). FIT can successfully extract the relevant information to a pattern matching task present in a signal, while discarding the remainder of a signal by transforming a Fourier transformed signal into an information spectrum (IS). This technique exhibits the ability of decreasing the inter-class correlation coefficients while increasing the intra-class correlation coefficients. Different spectra of the same molecule, in other words, will resemble more to each other while the spectra of different molecules will look more different from each other. This feature allows easier automated identification and analysis of molecules based on their spectral signatures using computer algorithms.
BMDec 12, 2020
TALI: Protein Structure Alignment Using Backbone Torsion AnglesXijiang Miao, Michael G. Bryson, Homayoun Valafar
This article introduces a novel protein structure alignment method (named TALI) based on the protein backbone torsion angle instead of the more traditional distance matrix. Because the structural alignment of the two proteins is based on the comparison of two sequences of numbers (backbone torsion angles), we can take advantage of a large number of well-developed methods such as Smith-Waterman or Needleman-Wunsch. Here we report the result of TALI in comparison to other structure alignment methods such as DALI, CE, and SSM ass well as sequence alignment based on PSI-BLAST. TALI demonstrated great success over all other methods in application to challenging proteins. TALI was more successful in recognizing remote structural homology. TALI also demonstrated an ability to identify structural homology between two proteins where the structural difference was due to a rotation of internal domains by nearly 180$^\circ$.
LGAug 2, 2020
An Investigation in Optimal Encoding of Protein Primary Sequence for Structure Prediction by Artificial Neural NetworksAaron Hein, Casey Cole, Homayoun Valafar
Machine learning and the use of neural networks has increased precipitously over the past few years primarily due to the ever-increasing accessibility to data and the growth of computation power. It has become increasingly easy to harness the power of machine learning for predictive tasks. Protein structure prediction is one area where neural networks are becoming increasingly popular and successful. Although very powerful, the use of ANN require selection of most appropriate input/output encoding, architecture, and class to produce the optimal results. In this investigation we have explored and evaluated the effect of several conventional and newly proposed input encodings and selected an optimal architecture. We considered 11 variations of input encoding, 11 alternative window sizes, and 7 different architectures. In total, we evaluated 2,541 permutations in application to the training and testing of more than 10,000 protein structures over the course of 3 months. Our investigations concluded that one-hot encoding, the use of LSTMs, and window sizes of 9, 11, and 15 produce the optimal outcome. Through this optimization, we were able to improve the quality of protein structure prediction by predicting the φ dihedrals to within 14° - 16° and ψ dihedrals to within 23°- 25°. This is a notable improvement compared to previously similar investigations.
SPJul 31, 2020
A Comparative study of Artificial Neural Networks Using Reinforcement learning and Multidimensional Bayesian Classification Using Parzen Density Estimation for Identification of GC-EIMS Spectra of Partially Methylated Alditol AcetatesFaramarz Valafar, Homayoun Valafar
This study reports the development of a pattern recognition search engine for a World Wide Web-based database of gas chromatography-electron impact mass spectra (GC-EIMS) of partially methylated Alditol Acetates (PMAAs). Here, we also report comparative results for two pattern recognition techniques that were employed for this study. The first technique is a statistical technique using Bayesian classifiers and Parzen density estimators. The second technique involves an artificial neural network module trained with reinforcement learning. We demonstrate here that both systems perform well in identifying spectra with small amounts of noise. Both system's performance degrades with degrading signal-to-noise ratio (SNR). When dealing with partial spectra (missing data), the artificial neural network system performs better. The developed system is implemented on the world wide web, and is intended to identify PMAAs using submitted spectra of these molecules recorded on any GC-EIMS instrument. The system, therefore, is insensitive to instrument and column dependent variations in GC-EIMS spectra.
NEJul 30, 2020
Parallel, Self Organizing, Consensus Neural NetworksHomayoun Valafar, Faramarz Valafar, Okan Ersoy
A new neural network architecture (PSCNN) is developed to improve performance and speed of such networks. The architecture has all the advantages of the previous models such as self-organization and possesses some other superior characteristics such as input parallelism and decision making based on consensus. Due to the properties of this network, it was studied with respect to implementation on a Parallel Processor (Ncube Machine) as well as a regular sequential machine. The architecture self organizes its own modules in a way to maximize performance. Since it is completely parallel, both recall and learning procedures are very fast. The performance of the network was compared to the Backpropagation networks in problems of language perception, remote sensing and binary logic (Exclusive-Or). PSCNN showed superior performance in all cases studied.
BMJul 30, 2020
Identification of 1H-NMR Spectra of Xyloglucan Oligosaccharides: A Comparative Study of Artificial Neural Networks and Bayesian Classification Using Nonparametric Density EstimationFaramarz Valafar, Homayoun Valafar, William S. York
Proton nuclear magnetic resonance (1H-NMR) is a widely used tool for chemical structural analysis. However, 1H-NMR spectra suffer from natural aberrations that render computer-assisted automated identification of these spectra difficult, and at times impossible. Previous efforts have successfully implemented instrument dependent or conditional identification of these spectra. In this paper, we report the first instrument independent computer-assisted automated identification system for a group of complex carbohydrates known as the xyloglucan oligosaccharides. The developed system is also implemented on the world wide web (http://www.ccrc.uga.edu) as part of an identification package called the CCRC-Net and is intended to recognize any submitted 1H-NMR spectrum of these structures with reasonable signal-to-noise ratio, recorded on any 500 MHz NMR instrument. The system uses Artificial Neural Networks (ANNs) technology and is insensitive to the instrument and environment-dependent variations in 1H-NMR spectroscopy. In this paper, comparative results of the ANN engine versus a multidimensional Bayes' classifier is also presented.
LGMar 5, 2020
Recognition of Smoking Gesture Using Smart Watch TechnologyCasey A. Cole, Bethany Janos, Dien Anshari et al.
Diseases resulting from prolonged smoking are the most common preventable causes of death in the world today. In this report we investigate the success of utilizing accelerometer sensors in smart watches to identify smoking gestures. Early identification of smoking gestures can help to initiate the appropriate intervention method and prevent relapses in smoking. Our experiments indicate 85%-95% success rates in identification of smoking gesture among other similar gestures using Artificial Neural Networks (ANNs). Our investigations concluded that information obtained from the x-dimension of accelerometers is the best means of identifying the smoking gesture, while y and z dimensions are helpful in eliminating other gestures such as: eating, drinking, and scratch of nose. We utilized sensor data from the Apple Watch during the training of the ANN. Using sensor data from another participant collected on Pebble Steel, we obtained a smoking identification accuracy of greater than 90% when using an ANN trained on data previously collected from the Apple Watch. Finally, we have demonstrated the possibility of using smart watches to perform continuous monitoring of daily activities.
CVJan 7, 2020
State Transition Modeling of the Smoking Behavior using LSTM Recurrent Neural NetworksChrisogonas O. Odhiambo, Casey A. Cole, Alaleh Torkjazi et al.
The use of sensors has pervaded everyday life in several applications including human activity monitoring, healthcare, and social networks. In this study, we focus on the use of smartwatch sensors to recognize smoking activity. More specifically, we have reformulated the previous work in detection of smoking to include in-context recognition of smoking. Our presented reformulation of the smoking gesture as a state-transition model that consists of the mini-gestures hand-to-lip, hand-on-lip, and hand-off-lip, has demonstrated improvement in detection rates nearing 100% using conventional neural networks. In addition, we have begun the utilization of Long-Short-Term Memory (LSTM) neural networks to allow for in-context detection of gestures with accuracy nearing 97%.
MLDec 12, 2019
An AI model for Rapid and Accurate Identification of Chemical Agents in Mass Casualty IncidentsNicholas Boltin, Daniel Vu, Bethany Janos et al.
In this report we examine the effectiveness of WISER in identification of a chemical culprit during a chemical based Mass Casualty Incident (MCI). We also evaluate and compare Binary Decision Tree (BDT) and Artificial Neural Networks (ANN) using the same experimental conditions as WISER. The reverse engineered set of Signs/Symptoms from the WISER application was used as the training set and 31,100 simulated patient records were used as the testing set. Three sets of simulated patient records were generated by 5%, 10% and 15% perturbation of the Signs/Symptoms of each chemical record. While all three methods achieved a 100% training accuracy, WISER, BDT and ANN produced performances in the range of: 1.8%-0%, 65%-26%, 67%-21% respectively. A preliminary investigation of dimensional reduction using ANN illustrated a dimensional collapse from 79 variables to 40 with little loss of classification performance.
IVDec 12, 2019
Automated Analysis of Femoral Artery Calcification Using Machine Learning TechniquesLiang Zhao, Brendan Odigwe, Susan Lessner et al.
We report an object tracking algorithm that combines geometrical constraints, thresholding, and motion detection for tracking of the descending aorta and the network of major arteries that branch from the aorta including the iliac and femoral arteries. Using our automated identification and analysis, arterial system was identified with more than 85% success when compared to human annotation. Furthermore, the reported automated system is capable of producing a stenosis profile, and a calcification score similar to the Agatston score. The use of stenosis and calcification profiles will lead to the development of better-informed diagnostic and prognostic tools.
QMNov 25, 2019
Modelling of Sickle Cell Anemia Patients Response to Hydroxyurea using Artificial Neural NetworksBrendan E. Odigwe, Jesuloluwa S. Eyitayo, Celestine I. Odigwe et al.
Hydroxyurea (HU) has been shown to be effective in alleviating the symptoms of Sickle Cell Anemia disease. While Hydroxyurea reduces the complications associated with Sickle Cell Anemia in some patients, others do not benefit from this drug and experience deleterious effects since it is also a chemotherapeutic agent. Therefore, to whom, should the administration of HU be considered as a viable option, is the main question asked by the responsible physician. We address this question by developing modeling techniques that can predict a patient's response to HU and therefore spare the non-responsive patients from the unnecessary effects of HU on the values of 22 parameters that can be obtained from blood samples in 122 patients. Using this data, we developed Deep Artificial Neural Network models that can predict with 92.6% accuracy, the final HbF value of a subject after undergoing HU therapy. Our current studies are focussing on forecasting a patient's HbF response, 30 days ahead of time.
BMNov 6, 2019
Using Residual Dipolar Couplings from Two Alignment Media to Detect Structural HomologyRyan Yandle, Rishi Mukhopadhyay, Homayoun Valafar
The method of Probability Density Profile Analysis has been introduced previously as a tool to find the best match between a set of experimentally generated Residual Dipolar Couplings and a set of known protein structures. While it proved effective on small databases in identifying protein fold families, and for picking the best result from computational protein folding tool ROBETTA, for larger data sets, more data is required. Here, the method of 2-D Probability Density Profile Analysis is presented which incorporates paired RDC data from 2 alignment media for N-H vectors. The method was tested using synthetic RDC data generated with +/-1 Hz error. The results show that the addition of information from a second alignment medium makes 2-D PDPA a much more effective tool that is able to identify a structure from a database of 600 protein fold family representatives.
BMNov 1, 2019
Protein Fold Family Recognition From Unassigned Residual Dipolar Coupling DataRishi Mukhopadhyay, Paul Shealy, Homayoun Valafar
Despite many advances in computational modeling of protein structures, these methods have not been widely utilized by experimental structural biologists. Two major obstacles are preventing the transition from a purely-experimental to a purely-computational mode of protein structure determination. The first problem is that most computational methods need a large library of computed structures that span a large variety of protein fold families, while structural genomics initiatives have slowed in their ability to provide novel protein folds in recent years. The second problem is an unwillingness to trust computational models that have no experimental backing. In this paper we test a potential solution to these problems that we have called Probability Density Profile Analysis (PDPA) that utilizes unassigned residual dipolar coupling data that are relatively cheap to acquire from NMR experiments.