Luis M. Rocha

SI
h-index: 10
12 papers
361 citations
Novelty: 36%
AI Score: 40

12 Papers

CL · May 28
Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

Ahmed Abdeen Hamed, Luis M. Rocha

We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to verify content generated by other LLMs and to expose hallucinations.
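
The cross-model majority voting and self-consistency steps can be sketched as follows. This is a minimal illustration and not the protocol's code. The verdict labels ("supported", "unsupported"), the model names and the strict-majority rule are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def majority_vote(verdicts: list) -> Optional[str]:
    """Return the verdict backed by a strict majority of models, or None if there is none."""
    if not verdicts:
        return None
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > len(verdicts) / 2 else None


def consensus_associations(outputs: dict) -> set:
    """Keep (disease, entity) pairs generated by a strict majority of models.

    `outputs` maps a (hypothetical) model name to the set of associations it generated.
    """
    counts = Counter(pair for pairs in outputs.values() for pair in pairs)
    return {pair for pair, c in counts.items() if c > len(outputs) / 2}


# Hypothetical verdicts from three verifier models for one association.
votes = {"model_a": "supported", "model_b": "supported", "model_c": "unsupported"}
print(majority_vote(list(votes.values())))  # supported
```

A tie, such as one "supported" vote against one "unsupported" vote, returns None. This leaves the association unresolved rather than forcing a verdict.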

MN · Apr 18, 2016
Control of complex networks requires both structure and dynamics

Alexander J. Gates, Luis M. Rocha

The study of network structure has uncovered signatures of the organization of complex systems. However, there is also a need to understand how to control them; for example, identifying strategies to revert a diseased cell to a healthy state, or a mature cell to a pluripotent state. Two recent methodologies suggest that the controllability of complex systems can be predicted solely from the graph of interactions between variables, without considering their dynamics: structural controllability and minimum dominating sets. We demonstrate that such structure-only methods fail to characterize controllability when dynamics are introduced. We study Boolean network ensembles of network motifs as well as three models of biochemical regulation: the segment polarity network in Drosophila melanogaster, the cell cycle of budding yeast Saccharomyces cerevisiae, and the floral organ arrangement in Arabidopsis thaliana. We demonstrate that structure-only methods both undershoot and overshoot the number of critical variables, and misidentify which sets of variables best control the dynamics of these models. This highlights the importance of the actual system dynamics in determining control. Our analysis further shows that the logic of automata transition functions, namely how canalizing they are, plays an important role in the extent to which structure predicts dynamics.
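
Minimum dominating sets are one of the two structure-only methods named above. For motif-sized graphs they can be computed by brute force. The sketch below assumes the directed convention in which a node dominates its out-neighbors. It inspects only the graph and never the Boolean transition functions, which is exactly the information the paper argues is needed to characterize control.

```python
from itertools import combinations


def minimum_dominating_set(nodes, edges):
    """Brute-force minimum dominating set of a directed graph.

    A node dominates itself and every node it has an outgoing edge to; the
    returned set dominates all nodes. Exponential in len(nodes): motif-sized graphs only.
    """
    reach = {n: {n} for n in nodes}
    for u, v in edges:
        reach[u].add(v)
    all_nodes = set(nodes)
    for k in range(1, len(all_nodes) + 1):
        for candidate in combinations(sorted(all_nodes), k):
            covered = set().union(*(reach[n] for n in candidate))
            if covered == all_nodes:
                return set(candidate)
    return set()


# Feed-forward loop A -> B, A -> C, B -> C: the structure alone suggests {A} suffices,
# regardless of the Boolean update rules attached to B and C.
print(minimum_dominating_set(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")]))  # {'A'}
```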

AO · Jan 16, 2015
Prediction and Modularity in Dynamical Systems

Artemy Kolchinsky, Luis M. Rocha

Identifying and understanding modular organizations is centrally important in the study of complex systems. Several approaches to this problem have been advanced, many framed in information-theoretic terms. Our treatment starts from the complementary point of view of statistical modeling and prediction of dynamical systems. It is known that for finite amounts of training data, simpler models can have greater predictive power than more complex ones. We use the trade-off between model simplicity and predictive accuracy to generate optimal multiscale decompositions of dynamical networks into weakly-coupled, simple modules. State-dependent and causal versions of our method are also proposed.
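
The trade-off between model simplicity and predictive accuracy can be illustrated with a deliberately simple stand-in, which is not the paper's method. Each module is treated as a first-order Markov chain over its own binary variables and fitted on a short training series. A candidate decomposition is then scored by its held-out log-likelihood. The smoothing parameter `alpha` and the toy data are assumptions.

```python
import math
from collections import Counter


def module_loglik(train, test, module, alpha=1.0):
    """Held-out log-likelihood of one module's next state given only its own current state.

    Transition probabilities are estimated from `train` with add-alpha smoothing over
    the 2**len(module) joint states of the module's binary variables.
    """
    def project(state):
        return tuple(state[i] for i in module)

    n_states = 2 ** len(module)
    transitions = Counter((project(a), project(b)) for a, b in zip(train, train[1:]))
    contexts = Counter(project(a) for a in train[:-1])
    ll = 0.0
    for a, b in zip(test, test[1:]):
        s, s_next = project(a), project(b)
        ll += math.log((transitions[(s, s_next)] + alpha) / (contexts[s] + alpha * n_states))
    return ll


def partition_loglik(train, test, partition, alpha=1.0):
    """Score a decomposition into modules, treating modules as uncoupled predictors."""
    return sum(module_loglik(train, test, module, alpha) for module in partition)


# Tiny hypothetical series of two binary variables: x0 alternates, x1 stays at 0.
train = [(0, 0), (1, 0), (0, 0), (1, 0)]
test = [(0, 0), (1, 0), (0, 0)]
print(partition_loglik(train, test, [[0], [1]]))  # two simple modules
print(partition_loglik(train, test, [[0, 1]]))    # one joint module
```

With this little training data, the two-module decomposition scores log 0.32 and the joint model scores log 0.2. The simpler, modular model therefore predicts the held-out steps better.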

OH · May 9, 2018
CANA: A python package for quantifying control and canalization in Boolean Networks

Rion Brattig Correia, Alexander J. Gates, Xuan Wang et al.

Logical models offer a simple but powerful means to understand the complex dynamics of biochemical regulation, without the need to estimate kinetic parameters. However, even simple automata components can lead to collective dynamics that are computationally intractable when aggregated into networks. In previous work we demonstrated that automata network models of biochemical regulation are highly canalizing, whereby many variable states and their groupings are redundant (Marques-Pita and Rocha, 2013). The precise charting and measurement of such canalization simplifies these models, making even very large networks amenable to analysis. Moreover, canalization plays an important role in the control, robustness, modularity and criticality of Boolean network dynamics, especially those used to model biochemical regulation (Gates and Rocha, 2016; Gates et al., 2016; Manicka, 2017). Here we describe a new publicly-available Python package that provides the necessary tools to extract, measure, and visualize canalizing redundancy present in Boolean network models. It extracts the pathways most effective in controlling dynamics in these models, including their effective graph and dynamics canalizing map, as well as other tools to uncover minimum sets of control variables.
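
The canalization that CANA quantifies starts from a simple test: an input is canalizing if one of its values alone fixes the output. The plain-Python sketch below performs only that classical check on a truth table. It does not use or reproduce CANA's API, and CANA's redundancy-based measures and effective graph go well beyond it.

```python
from itertools import product


def canalizing_inputs(outputs, k):
    """Return (input index, canalizing value, forced output) triples of a Boolean function.

    `outputs` is the truth table: outputs[j] is f(x) for the j-th state of
    itertools.product((0, 1), repeat=k), i.e. 00..0, 00..1, ..., 11..1.
    """
    if len(outputs) != 2 ** k:
        raise ValueError("truth table must have 2**k entries")
    table = dict(zip(product((0, 1), repeat=k), outputs))
    triples = []
    for i in range(k):
        for value in (0, 1):
            forced = {y for x, y in table.items() if x[i] == value}
            if len(forced) == 1:  # fixing x_i = value alone determines f
                triples.append((i, value, forced.pop()))
    return triples


print(canalizing_inputs([0, 0, 0, 1], 2))  # AND: [(0, 0, 0), (1, 0, 0)]
print(canalizing_inputs([0, 1, 1, 0], 2))  # XOR: []
```

AND is canalizing on both inputs, because either input at 0 forces the output to 0. XOR has no canalizing input, so every input state matters.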

CL · May 14, 2024
Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

Aehong Min, Xuan Wang, Rion Brattig Correia et al.

We used a dictionary built from biomedical terminology extracted from various sources, such as DrugBank, MedDRA, MedlinePlus, and TCMGeneDIT, to tag more than 8 million Instagram posts by users who had mentioned an epilepsy-relevant drug at least once between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different ranking of important terms, as measured by their eigenvector centrality in the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
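
The human-annotation step amounts to estimating a false-positive rate per term and then applying a filter. The sketch below assumes a fixed rate threshold, set to 0.5 here purely for illustration. It ignores the term-frequency criterion mentioned in the abstract, and the term names are placeholders.

```python
from collections import defaultdict


def false_positive_rates(annotations):
    """Estimate each term's false-positive rate from (term, is_false_positive) annotations."""
    totals, false_pos = defaultdict(int), defaultdict(int)
    for term, is_fp in annotations:
        totals[term] += 1
        false_pos[term] += int(is_fp)
    return {term: false_pos[term] / totals[term] for term in totals}


def refine_dictionary(dictionary, rates, max_fp_rate=0.5, synonyms=None):
    """Remove terms whose estimated false-positive rate exceeds max_fp_rate, plus their synonyms."""
    synonyms = synonyms or {}
    removed = {term for term, rate in rates.items() if rate > max_fp_rate}
    for term in list(removed):
        removed.update(synonyms.get(term, ()))
    return {term for term in dictionary if term not in removed}


# Hypothetical annotations: term_b is a false positive in 2 of its 3 sampled matches.
annotations = [("term_a", False), ("term_a", False),
               ("term_b", True), ("term_b", True), ("term_b", False)]
rates = false_positive_rates(annotations)
print(refine_dictionary({"term_a", "term_b", "term_b_syn"}, rates,
                        synonyms={"term_b": ["term_b_syn"]}))  # {'term_a'}
```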

SI · Mar 8, 2021
The distance backbone of complex networks

Tiago Simas, Rion Brattig Correia, Luis M. Rocha

Redundancy needs more precise characterization as it is a major factor in the evolution and robustness of networks of multivariate interactions. We investigate the complexity of such interactions by inferring a connection transitivity that includes all possible measures of path length for weighted graphs. The result, without breaking the graph into smaller components, is a distance backbone subgraph sufficient to compute all shortest paths. This is important for understanding the dynamics of spread and communication phenomena in real-world networks. The general methodology we formally derive yields a principled graph reduction technique and provides a finer characterization of the triangular geometry of all edges: those that contribute to shortest paths and those that do not but are involved in other network phenomena. We demonstrate that the distance backbone is very small in large networks across domains ranging from air traffic to the human brain connectome, revealing that network robustness to attacks and failures seems to stem from surprisingly vast amounts of redundancy.
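
Take the familiar metric case, where path length is the sum of edge distances. Here the backbone consists of the edges that are themselves shortest paths between their endpoints, and the remaining edges can be dropped without changing any shortest-path distance. Below is a minimal sketch for undirected graphs using Floyd-Warshall. The paper's framework also covers other path-length measures, which this sketch does not.

```python
import math


def metric_backbone(nodes, edges, eps=1e-12):
    """Return the undirected edges (u, v, length) that are themselves shortest paths.

    Path length is the sum of edge lengths (the metric case); an edge is kept when no
    indirect path between its endpoints is strictly shorter than the edge itself.
    """
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        i, j = index[u], index[v]
        dist[i][j] = dist[j][i] = min(dist[i][j], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return [(u, v, w) for u, v, w in edges if w <= dist[index[u]][index[v]] + eps]


edges = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 3.0)]
print(metric_backbone(["A", "B", "C"], edges))  # [('A', 'B', 1.0), ('B', 'C', 1.0)]
```

The A–C edge is dropped because the indirect path through B has length 2 < 3. If the edge had length exactly 2, it would tie with that path and be kept.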

SI · Mar 9, 2018
City-wide Analysis of Electronic Health Records Reveals Gender and Age Biases in the Administration of Known Drug-Drug Interactions

Rion Brattig Correia, Luciana P. de Araújo, Mauro M. Mattos et al.

The occurrence of drug-drug interactions (DDI) from multiple drug dispensations is a serious problem, both for individuals and health-care systems, since patients with complications due to DDI are likely to reenter the system at a costlier level. We present a large-scale longitudinal study (18 months) of the DDI phenomenon at the primary- and secondary-care level using electronic health records (EHR) from the city of Blumenau in Southern Brazil (pop. ≈ 340,000). We found that 181 distinct drug pairs known to interact were dispensed concomitantly to 12% of the patients in the city's public health-care system. Further, 4% of the patients were dispensed drug pairs that are likely to result in major adverse drug reactions (ADR), with costs estimated to be much larger than previously reported in smaller studies. The large-scale analysis reveals that women have a 60% increased risk of DDI as compared to men; the increase becomes 90% when considering only DDI known to lead to major ADR. Furthermore, DDI risk increases substantially with age; patients aged 70-79 years have a 34% risk of DDI when they are dispensed two or more drugs concomitantly. Interestingly, a statistical null model demonstrates that age- and female-specific risks from increased polypharmacy fall far short of explaining the observed DDI risks in those populations, suggesting unknown social or biological causes. We also provide a network visualization of drugs and demographic factors that characterize the DDI phenomenon and demonstrate that accurate DDI prediction can be included in healthcare and public-health management, to reduce DDI-related ADR and costs.
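
A statement such as "a 60% increased risk" corresponds to a relative risk of 1.6. The arithmetic is sketched below, assuming that risk is the fraction of patients in a group who were dispensed at least one known interacting drug pair. The study's exact denominators may differ, for example by conditioning on co-administration of two or more drugs, and the counts are invented for illustration.

```python
def risk(n_with_ddi, n_patients):
    """Fraction of a group's patients who were dispensed at least one known interacting pair."""
    return n_with_ddi / n_patients


def relative_risk_increase(cases_a, total_a, cases_b, total_b):
    """Percent increase of group A's DDI risk over group B's: (RR - 1) * 100."""
    rr = risk(cases_a, total_a) / risk(cases_b, total_b)
    return (rr - 1.0) * 100.0


# Hypothetical counts, not from the study: 160 of 1,000 women vs 100 of 1,000 men.
print(round(relative_risk_increase(160, 1000, 100, 1000), 1))  # 60.0
```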

SI · Oct 5, 2015
Monitoring Potential Drug Interactions and Reactions via Network Analysis of Instagram User Timelines

Rion Brattig Correia, Lang Li, Luis M. Rocha

Much recent research aims to identify evidence for Drug-Drug Interactions (DDI) and Adverse Drug Reactions (ADR) from the biomedical scientific literature. In addition to this "Bibliome", the universe of social media provides a very promising source of large-scale data that can help identify DDI and ADR in ways that have not been hitherto possible. Given the large number of users, analysis of social media data may be useful to identify under-reported, population-level pathology associated with DDI, thus further contributing to improvements in population health. Moreover, tapping into this data allows us to infer drug interactions with natural products, including cannabis, which constitute an array of DDI very poorly explored by biomedical research thus far. Our goal is to determine the potential of Instagram for public health monitoring and surveillance for DDI, ADR, and behavioral pathology at large. Using drug, symptom, and natural product dictionaries for identification of the various types of DDI and ADR evidence, we have collected ~7,000 timelines. We report on 1) the development of a monitoring tool to easily observe user-level timelines associated with drug and symptom terms of interest, and 2) population-level behavior via the analysis of co-occurrence networks computed from user timelines at three different scales: monthly, weekly, and daily occurrences. Analysis of these networks further reveals 3) drug and symptom direct and indirect associations with greater support in user timelines, as well as 4) clusters of symptoms and drugs revealed by the collective behavior of the observed population. This demonstrates that Instagram contains much drug- and pathology-specific data for public health monitoring of DDI and ADR, and that complex network analysis provides an important toolbox to extract health-related associations and their support from large-scale social media data.
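
Co-occurrence networks at daily, weekly and monthly scales can be sketched by binning each user's posts by time and linking terms that share a bin. This is an assumption-laden illustration. Edge weights here are raw counts of (user, bin) pairs, whereas the paper may use normalized weights. The user and term names are placeholders.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def time_bin(day: date, scale: str):
    """Map a post date to its daily, weekly (ISO week) or monthly bin."""
    if scale == "daily":
        return day.isoformat()
    if scale == "weekly":
        year, week, _ = day.isocalendar()
        return (year, week)
    if scale == "monthly":
        return (day.year, day.month)
    raise ValueError(f"unknown scale: {scale}")


def cooccurrence_network(timelines, scale):
    """Edge weight = number of (user, time bin) pairs in which both terms were mentioned.

    `timelines` maps a user id to a list of (date, set_of_terms) posts.
    """
    weights = Counter()
    for posts in timelines.values():
        bins = {}
        for day, terms in posts:
            bins.setdefault(time_bin(day, scale), set()).update(terms)
        for terms in bins.values():
            for a, b in combinations(sorted(terms), 2):
                weights[(a, b)] += 1
    return weights


timelines = {
    "user_1": [
        (date(2015, 1, 5), {"drug_x"}),
        (date(2015, 1, 6), {"symptom_y"}),
        (date(2015, 2, 1), {"drug_x"}),
    ]
}
for scale in ("daily", "weekly", "monthly"):
    print(scale, dict(cooccurrence_network(timelines, scale)))
```

The first two posts fall on different days but in the same ISO week and month. The drug_x/symptom_y edge therefore appears only at the weekly and monthly scales.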

ML · Dec 2, 2014
Extraction of Pharmacokinetic Evidence of Drug-drug Interactions from the Literature

Artemy Kolchinsky, Anália Lourenço, Heng-Yi Wu et al.

Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmaco-epidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1 ≈ 0.93, MCC ≈ 0.74, iAUC ≈ 0.99) and sentences (F1 ≈ 0.76, MCC ≈ 0.65, iAUC ≈ 0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. ...
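
F1 and MCC, two of the scores reported above, follow directly from the confusion matrix. MCC also uses true negatives, which matters when relevant abstracts are a minority. Below is a short sketch with invented counts. iAUC, which needs ranked classifier scores, is not covered.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion counts for an abstract classifier.
print(f1_score(40, 10, 10), matthews_corrcoef(40, 10, 10, 40))  # 0.8 0.6
```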

RO · Jun 26, 2014
Designing a minimalist socially aware robotic agent for the home

Matthew R. Francisco, Ian Wood, Selma Šabanović et al.

We present a minimalist social robot that relies on long time series of low-resolution data such as mechanical vibration, temperature, lighting, sounds and collisions. Our goal is to develop an experimental system for growing socially situated robotic agents whose behavioral repertoire is subsumed by the social order of the space. To get there we are designing robots that use their simple sensors and motion feedback routines to recognize different classes of human activity and then associate to each class a range of appropriate behaviors. We use the Katie Family of robots, built on the iRobot Create platform, an Arduino Uno, and a Raspberry Pi. We describe their sensor abilities and exploratory tests that allow us to develop hypotheses about what objects (sensor data) correspond to something known and observable by a human subject. We use machine learning methods to classify three social scenarios from over a hundred experiments, demonstrating that it is possible to detect social situations with high accuracy, using the low-resolution sensors from our minimalist robot.
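
The abstract does not name the machine learning methods used, so the following is only a generic stand-in for classifying social scenarios from low-resolution sensors. Each window of readings is summarized by its mean and standard deviation per sensor, then assigned to the nearest class centroid. The sensor names, scenario labels, features and classifier are all assumptions.

```python
import math
import statistics


def window_features(readings):
    """Summarize a window of low-resolution sensor readings as (mean, std) per sensor."""
    features = []
    for sensor in sorted(readings):  # fixed sensor order keeps vectors comparable
        values = readings[sensor]
        features += [statistics.fmean(values), statistics.pstdev(values)]
    return features


def fit_centroids(examples):
    """Average the feature vectors of each labeled social scenario."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {label: [statistics.fmean(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()}


def predict(centroids, features):
    """Assign the scenario whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical training windows for two made-up scenarios.
train = [
    (window_features({"sound": [1, 1, 1], "vibration": [0, 0, 0]}), "quiet_room"),
    (window_features({"sound": [8, 10, 12], "vibration": [4, 6, 5]}), "gathering"),
]
centroids = fit_centroids(train)
print(predict(centroids, window_features({"sound": [9, 10, 11], "vibration": [5, 5, 5]})))  # gathering
```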

ML · Oct 2, 2012
Evaluation of linear classifiers on articles containing pharmacokinetic evidence of drug-drug interactions

Artemy Kolchinsky, Anália Lourenço, Lang Li et al.

Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. [...] Biomedical literature mining can aid DDI research by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology. Experiments. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluate the performance of several linear classifiers on PubMed abstracts, under different feature transformation and dimensionality reduction methods. In addition, we investigate the performance benefits of including various publicly-available named entity recognition (NER) features, as well as a set of internally-developed pharmacokinetic dictionaries. Results. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts. We found that the combination of unigram and bigram textual features gave better performance than unigram features alone, and also that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification.
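
Unigram and bigram features with normalization for feature frequency and document length can be sketched as TF-IDF weighting followed by L2 normalization. The smoothed IDF formula and the simple tokenizer are common defaults assumed for illustration. They are not necessarily the transforms evaluated in the paper.

```python
import math
import re
from collections import Counter


def unigrams_and_bigrams(text):
    """Lower-cased word unigrams plus adjacent-word bigrams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf_l2(docs):
    """Sparse TF-IDF vectors with smoothed IDF, each scaled to unit Euclidean length.

    Dividing by the vector norm removes the dependence on document length.
    """
    counts = [Counter(unigrams_and_bigrams(doc)) for doc in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        vec = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0) for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


# Hypothetical abstracts.
docs = ["Drug A increased the AUC of drug B.", "No interaction was observed."]
vectors = tfidf_l2(docs)
print("increased the" in vectors[0])  # True
```

Bigrams such as "increased the" capture short phrases that single words miss. This is consistent with the finding that unigram plus bigram features outperformed unigrams alone.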

IR · Sep 8, 2012
Semi-metric networks for recommender systems

Tiago Simas, Luis M. Rocha

Weighted graphs obtained from co-occurrence in user-item relations lead to non-metric topologies. We use this semi-metric behavior to issue recommendations, and discuss its relationship to transitive closure on fuzzy graphs. Finally, we test the performance of this method against other item- and user-based recommender systems on the Movielens benchmark. We show that including highly semi-metric edges in our recommendation algorithms leads to better recommendations.
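
Semi-metric behavior can be made concrete as follows. The sketch assumes that co-occurrence proximities p in (0, 1] are converted to distances with d = 1/p - 1, a common proximity-to-distance map adopted here as an assumption. An edge's semi-metric ratio is taken to be its direct distance divided by the shortest-path distance between its endpoints. A ratio above 1 means an indirect path is shorter, so the edge is semi-metric. How such ratios feed the recommender is not specified in the abstract and is not modeled.

```python
import heapq
import math


def proximity_to_distance(p):
    """Assumed map from a proximity in (0, 1] to a distance in [0, inf)."""
    return 1.0 / p - 1.0


def shortest_distances(graph, source):
    """Dijkstra over a {node: {neighbor: distance}} adjacency map."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


def semi_metric_ratios(proximity_edges):
    """Direct distance / shortest-path distance for each undirected edge (u, v, p)."""
    graph = {}
    for u, v, p in proximity_edges:
        d = proximity_to_distance(p)
        graph.setdefault(u, {})[v] = d
        graph.setdefault(v, {})[u] = d
    ratios = {}
    for u, v, _ in proximity_edges:
        direct = graph[u][v]
        indirect = shortest_distances(graph, u)[v]
        if indirect > 0:
            ratios[(u, v)] = direct / indirect
        else:  # zero-length indirect path: ratio is 1 only if the direct edge is also 0
            ratios[(u, v)] = 1.0 if direct == 0 else math.inf
    return ratios


# Hypothetical item-item proximities.
edges = [("item_a", "item_b", 0.5), ("item_b", "item_c", 0.5), ("item_a", "item_c", 0.2)]
print(semi_metric_ratios(edges))
```

Here the (item_a, item_c) edge has direct distance 4 but an indirect distance of 2 through item_b. Its ratio is therefore 2, marking it as semi-metric, while the other two edges have ratio 1.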