CL · Oct 17, 2022
Fine-tuned Sentiment Analysis of COVID-19 Vaccine-Related Social Media Data: Comparative Study
Chad A Melton, Brianna M White, Robert L Davis et al.
This study investigated and compared public sentiment related to COVID-19 vaccines expressed on two popular social media platforms, Reddit and Twitter, using posts harvested from January 1, 2020, to March 1, 2022. To accomplish this task, we created a fine-tuned DistilRoBERTa model to predict the sentiment of approximately 9.5 million tweets and 70 thousand Reddit comments. To fine-tune the model, our team manually labeled the sentiment of 3600 tweets and then augmented the dataset by back-translation. Text sentiment for each platform was then classified with the fine-tuned model using Python and the Hugging Face sentiment analysis pipeline. Our results determined that the average sentiment expressed on Twitter was more negative than positive (52% positive), while the sentiment expressed on Reddit was more positive than negative (53% positive). Although average sentiment varied between the two platforms, both displayed similar sentiment behavior around key vaccine-related developments during the pandemic. Given this shared trend across platforms, Twitter and Reddit remain valuable data sources that public health officials can use to strengthen vaccine confidence and combat misinformation. Because the spread of misinformation poses a range of psychological and psychosocial risks (anxiety, fear, etc.), there is an urgent need to understand the public's perspective on, and attitude toward, shared falsehoods. Comprehensive educational delivery systems, tailored to the population's expressed sentiments, that facilitate digital literacy, health information-seeking behavior, and precision health promotion could help clarify such misinformation.
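Once every post has a predicted label, the platform comparison above reduces to aggregation. The minimal sketch below assumes each post is a record with hypothetical keys `platform`, `month`, and `label`, and that labels are the strings "positive"/"negative" (the actual label strings depend on the fine-tuned model's configuration). It reports the overall share of positive posts per platform and a per-month share, which is what lets sentiment be lined up against vaccine-related events. The records are toy values, not study data.

```python
from collections import defaultdict


def share_positive(labels):
    """Fraction of labels equal to 'positive' (0.0 for an empty list)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(1 for lab in labels if lab == "positive") / len(labels)


def summarize(posts):
    """Positive share per platform and per (platform, month).

    Each post is a dict with hypothetical keys 'platform', 'month' ("YYYY-MM"),
    and 'label' (assumed to be "positive" or "negative").
    """
    by_platform = defaultdict(list)
    by_month = defaultdict(list)
    for post in posts:
        by_platform[post["platform"]].append(post["label"])
        by_month[(post["platform"], post["month"])].append(post["label"])
    overall = {k: share_positive(v) for k, v in by_platform.items()}
    monthly = {k: share_positive(v) for k, v in sorted(by_month.items())}
    return overall, monthly


# Toy records for illustration only (not data from the study).
posts = [
    {"platform": "twitter", "month": "2021-01", "label": "positive"},
    {"platform": "twitter", "month": "2021-01", "label": "negative"},
    {"platform": "reddit", "month": "2021-01", "label": "positive"},
    {"platform": "reddit", "month": "2021-02", "label": "negative"},
    {"platform": "reddit", "month": "2021-02", "label": "positive"},
]
overall, monthly = summarize(posts)
print(overall)
print(monthly)
```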
CL · Feb 23, 2023
Exploring celebrity influence on public attitude towards the COVID-19 pandemic: social media shared sentiment analysis
Brianna M White, Chad A Melton, Parya Zareie et al.
The COVID-19 pandemic has introduced new opportunities for health communication, including increased public use of online outlets to express health-related emotions. People have turned to social media networks to share sentiments related to the impacts of the pandemic. In this paper we examine the role of messaging shared by Persons in the Public Eye (e.g., athletes, politicians, news personnel) in determining the overall direction of public discourse. We harvested approximately 13 million tweets posted between 1 January 2020 and 1 March 2022. Sentiment was calculated for each tweet using a fine-tuned DistilRoBERTa model, which was used to compare COVID-19 vaccine-related tweets that co-occurred with mentions of Persons in the Public Eye. Our findings suggest that consistent patterns of emotional content co-occurring with messaging shared by Persons in the Public Eye during the first two years of the pandemic influenced public opinion and largely stimulated online public discourse. We demonstrate that as the pandemic progressed, public sentiment shared on social networks was shaped by the risk perceptions, political ideologies, and health-protective behaviours shared by Persons in the Public Eye, often in a negative light.
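The co-occurrence comparison described above can be sketched as a partition of scored tweets by whether they mention a public figure. This is a minimal illustration, assuming each tweet already carries a numeric sentiment score (here +1 positive, -1 negative) and that mentions are found by case-insensitive substring matching against a hypothetical name list; the study's actual matching rules are not described in the abstract.

```python
from statistics import mean


def split_by_mention(tweets, names):
    """Partition (text, score) pairs by whether any name in `names` appears in the text."""
    lowered = [n.lower() for n in names]
    with_mention, without_mention = [], []
    for text, score in tweets:
        t = text.lower()
        if any(n in t for n in lowered):
            with_mention.append(score)
        else:
            without_mention.append(score)
    return with_mention, without_mention


def mean_or_none(scores):
    """Mean sentiment, or None when a group is empty."""
    return mean(scores) if scores else None


# Hypothetical names and toy tweets, for illustration only.
names = ["Public Figure A", "@figure_b"]
tweets = [
    ("Public Figure A says the vaccine is a scam", -1),
    ("Got my second dose today!", 1),
    ("agree with @figure_b, waiting for more data", -1),
    ("Vaccine appointment was quick and easy", 1),
]
co, rest = split_by_mention(tweets, names)
print(mean_or_none(co), mean_or_none(rest))
```

Substring matching is the simplest possible choice; a real pipeline would likely need handle normalization and disambiguation of common names.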
LG · Aug 23, 2024
Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative Review
Shaina Raza, Arash Shaban-Nejad, Elham Dolatabadi et al.
Background: The rapid advancement of machine learning (ML) presents novel opportunities to enhance public health research, surveillance, and decision-making. However, there is a lack of comprehensive understanding of algorithmic bias (systematic errors in predicted population health outcomes) resulting from the public health application of ML. The objective of this narrative review is to explore the types of bias generated by ML and the quantitative metrics used to assess these biases. Methods: We searched PubMed, MEDLINE, IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery) Digital Library, ScienceDirect, and Springer Nature. We used keywords to identify studies, published in English between 2008 and 2023 inclusive, that describe types of bias and metrics to measure them in the domain of ML and public and population health. Results: A total of 72 articles met the inclusion criteria. Our review identified the commonly described types of bias and the quantitative metrics used to assess them from an equity perspective. Conclusion: The review will help formalize an evaluation framework for ML in public health from an equity perspective.
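To make "quantitative metrics to assess these biases" concrete, the sketch below computes two group-fairness metrics that are widely used in the ML fairness literature, demographic parity difference and equal opportunity difference, for a binary classifier and a binary protected attribute. Whether these particular metrics are among those catalogued by the review is an assumption, and the function names and toy data are illustrative.

```python
def rate(values):
    """Mean of a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    preds_g1 = [p for p, g in zip(y_pred, group) if g == 1]
    preds_g0 = [p for p, g in zip(y_pred, group) if g == 0]
    return rate(preds_g1) - rate(preds_g0)


def equal_opportunity_difference(y_true, y_pred, group):
    """True-positive-rate gap: TPR(group = 1) - TPR(group = 0)."""
    pos_preds_g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    pos_preds_g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    return rate(pos_preds_g1) - rate(pos_preds_g0)


# Toy labels, predictions, and group membership (illustrative only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
dp = demographic_parity_difference(y_pred, group)
eo = equal_opportunity_difference(y_true, y_pred, group)
print(dp, eo)
```

A value of 0 means parity on that metric; the sign shows which group receives more positive predictions (or a higher true-positive rate).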
HC · Apr 11
Empathic and agentic artificial intelligence in nursing: perspectives on a human-centered framework for cancer care navigation in the United States
Tyra Girdwood, Saba Kheirinejad, Parnian Kheirkhah Rahimabad et al.
For patients with cancer, nurse navigation can ease the burden of complex care by improving the coordination of health services and patient outcomes. However, in under-resourced areas, trained nurse navigators may be scarce or nonexistent. In the United States, artificial intelligence (AI)-enabled digital health tools are increasingly available and may help address gaps in care coordination; however, most are not designed specifically to support nursing. This perspective piece discusses a human-centered AI framework, grounded in the American Nurses Association's code of ethics, that integrates empathic and agentic approaches to support nurses in the United States in cancer care navigation. The framework could augment, rather than replace, human empathy and agency while improving nurse workflow, patient-clinician relationships, and care coordination services in under-resourced areas.
LG · Aug 9, 2022
Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee: Geospatial Machine Learning Approach
Whitney S Brakefield, Olufunto A Olusanya, Arash Shaban-Nejad
Obesity is a global epidemic causing at least 2.8 million deaths per year. This complex disease is associated with significant socioeconomic burden, reduced work productivity, unemployment, and other social determinants of health (SDoH) disparities. Objective: The objective of this study was to investigate the effects of SDoH on obesity prevalence among adults in Shelby County, Tennessee, USA, using a geospatial machine learning approach. Obesity prevalence was obtained from the publicly available CDC 500 Cities database, while SDoH indicators were extracted from the U.S. Census and USDA. We examined the geographic distribution of obesity prevalence patterns using Getis-Ord Gi* statistics and calibrated multiple models to study the association between SDoH and adult obesity. In addition, unsupervised machine learning was used to conduct a grouping analysis to investigate the distribution of obesity prevalence and associated SDoH indicators. The results showed a high percentage of neighborhoods with high adult obesity prevalence within Shelby County. At the census-tract level, median household income and the percentages of individuals who were Black, home renters, living below the poverty level, fifty-five years or older, unmarried, or uninsured had a significant association with adult obesity prevalence. The grouping analysis revealed disparities in obesity prevalence among disadvantaged neighborhoods. More research is needed that examines linkages between geographic location, SDoH, and chronic diseases. These findings, which show a significantly higher prevalence of obesity within disadvantaged neighborhoods, together with other geospatial information, can be leveraged to offer valuable insights that inform health decision-making and interventions that mitigate risk factors for increasing obesity prevalence.
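The Getis-Ord Gi* hot-spot statistic mentioned above can be computed directly from its standard definition,

G*_i = (Σ_j w_ij x_j − x̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1))),

where x̄ is the mean of all n values and S = sqrt(Σ_j x_j² / n − x̄²). The sketch below assumes a dense, binary spatial weights matrix in which each tract counts as its own neighbor (the "star" variant); how the study actually defined its weights (contiguity, distance bands, etc.) is not stated, and the six-tract example is invented.

```python
import math


def getis_ord_gi_star(x, w):
    """Return the Gi* z-score for every location.

    x: list of n observed values (e.g., obesity prevalence per tract); must not be constant.
    w: n x n spatial weights with w[i][i] included (Gi* counts i as its own neighbor).
    """
    n = len(x)
    x_bar = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - x_bar ** 2)
    scores = []
    for i in range(n):
        sum_w = sum(w[i])
        sum_w2 = sum(wij * wij for wij in w[i])
        numerator = sum(wij * xj for wij, xj in zip(w[i], x)) - x_bar * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy example: six tracts on a line; each tract neighbors itself and adjacent tracts.
x = [10, 10, 10, 1, 1, 1]
n = len(x)
w = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
z = getis_ord_gi_star(x, w)
print([round(v, 3) for v in z])
```

Positive z-scores mark hot spots and negative ones cold spots; significance is usually judged against standard normal critical values (for example, |z| > 1.96 at the 5% level).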
CY · Jul 18, 2023
Medication abortion via digital health in the United States: a systematic scoping review
Fekede Asefa Kumsa, Rameshwari Prasad, Arash Shaban-Nejad
Digital health, including telemedicine, has increased access to abortion care. The convenience, flexible appointment times, and privacy that telemedicine offers abortion users may make it a preferable way to receive abortion services. This scoping review systematically mapped studies conducted on abortion services via telemedicine, including their effectiveness and acceptability for abortion users and providers. All published papers that included abortion services via telemedicine in the United States were considered. PubMed, CINAHL, and Google Scholar were searched in September 2022. The findings were synthesized narratively, and the PRISMA-ScR guidelines were used to report this study. Of 757 retrieved articles, 33 were selected based on the inclusion criteria. These studies were published between 2011 and 2022, with 24 published in the last 3 years. The review found that telemedicine increased access to abortion care in the United States, especially for people in remote areas or those worried about stigma from in-person visits. The effectiveness of abortion services via telemedicine was comparable to that of in-clinic visits, with 6% or fewer abortions requiring surgical intervention. Both care providers and abortion seekers expressed positive perceptions of telemedicine-based abortion services. However, abortion users reported mixed emotions, with some preferring in-person visits. The most common reasons for choosing telemedicine included distance to the abortion clinic, convenience, privacy, cost, flexible appointment times, and state laws imposing waiting periods or other restrictive policies. Telemedicine offered a preferable option for abortion seekers and providers. The feasibility of accessing abortion services via telemedicine in low-resource settings needs further investigation.
AI · Jul 26, 2022
An Urban Population Health Observatory for Disease Causal Pathway Analysis and Decision Support: Underlying Explainable Artificial Intelligence Model
Whitney S Brakefield, Nariman Ammar, Arash Shaban-Nejad
This study sought to (1) expand our existing Urban Population Health Observatory (UPHO) system by incorporating a semantics layer; (2) cohesively employ machine learning and semantic/logical inference to provide measurable evidence and detect pathways leading to undesirable health outcomes; (3) provide clinical use case scenarios and design case studies to identify socioenvironmental determinants of health associated with the prevalence of obesity; and (4) design a dashboard that demonstrates the use of UPHO in the context of obesity surveillance using the provided scenarios. The system design includes a knowledge graph generation component that provides contextual knowledge from relevant domains of interest. The system leverages semantics using concepts, properties, and axioms from existing ontologies. In addition, we used the publicly available US Centers for Disease Control and Prevention 500 Cities data set to perform multivariate analysis. A cohesive approach that employs machine learning and semantic/logical inference reveals pathways leading to diseases. In this study, we present 2 clinical case scenarios and a proof-of-concept prototype design of a dashboard that provides warnings, recommendations, and explanations and demonstrates the use of UPHO in the context of obesity surveillance, treatment, and prevention. While exploring the case scenarios using a support vector regression machine learning model, we found that poverty, lack of physical activity, education, and unemployment were the most important predictive variables for obesity in Memphis, TN. The application of UPHO could help reduce health disparities and improve urban population health. The expanded UPHO feature incorporates an additional level of interpretable knowledge to enhance informed decision-making by physicians, researchers, and health officials at both the patient and community levels.
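To illustrate what detecting a pathway to an undesirable outcome can look like over a knowledge graph, the minimal sketch below stores (subject, predicate, object) triples and uses breadth-first search to return the shortest chain of edges from a determinant to an outcome. The triples and the `increasesRiskOf` predicate are hypothetical and are not taken from UPHO or its ontologies; UPHO also combines this kind of inference with machine learning, which is not shown.

```python
from collections import deque

# Hypothetical triples for illustration only; they are not evidence.
TRIPLES = [
    ("low_education", "increasesRiskOf", "unemployment"),
    ("unemployment", "increasesRiskOf", "poverty"),
    ("poverty", "increasesRiskOf", "food_insecurity"),
    ("food_insecurity", "increasesRiskOf", "obesity"),
    ("physical_inactivity", "increasesRiskOf", "obesity"),
]


def find_pathway(triples, start, goal):
    """Breadth-first search; returns the shortest list of triples linking start to goal, or None."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None


path = find_pathway(TRIPLES, "low_education", "obesity")
for step in path:
    print(step)
```

Returning the chain of triples, rather than only a yes/no answer, is what makes the result usable as an explanation on a dashboard.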
LG · Jan 30, 2025
Analyzing Geospatial and Socioeconomic Disparities in Breast Cancer Screening Among Populations in the United States: Machine Learning Approach
Soheil Hashtarkhani, Yiwang Zhou, Fekede Asefa Kumsa et al.
Breast cancer screening plays a pivotal role in early detection and subsequent effective management of the disease, affecting patient outcomes and survival rates. This study aims to assess breast cancer screening rates nationwide in the United States and to investigate the impact of social determinants of health on these rates. Data on mammography screening at the census tract level for 2018 and 2020 were collected from the Behavioral Risk Factor Surveillance System. We developed a large dataset of social determinants of health comprising 13 variables for 72,337 census tracts. Spatial analysis employing Getis-Ord Gi statistics was used to identify clusters of high and low breast cancer screening rates. To evaluate the influence of these social determinants, we implemented a random forest model and compared its performance with linear regression and support vector machine models. The models were evaluated using R² and root mean squared error (RMSE). Shapley Additive Explanations (SHAP) values were subsequently used to assess the importance of variables and the direction of their influence. Geospatial analysis revealed elevated screening rates in the eastern and northern United States, while central and midwestern regions exhibited lower rates. The random forest model outperformed the linear regression and support vector machine models, with an R² of 64.53% and an RMSE of 2.06. SHAP values indicated that the percentage of the Black population, the number of mammography facilities within a 10-mile radius, and the percentage of the population with at least a bachelor's degree were the most influential variables, all positively associated with mammography screening rates.
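The two evaluation metrics named above follow standard definitions, R² = 1 − SS_res / SS_tot and RMSE = sqrt(mean squared residual), and the same functions can score any of the compared models on held-out tracts. The minimal sketch below uses invented screening rates; R² is at most 1 on its native scale, so it is multiplied by 100 when expressed as a percentage.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (here, percentage points of screening)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy tract-level screening rates (%) and model predictions, invented for illustration.
observed = [70.0, 72.0, 75.0, 78.0, 80.0]
predicted = [71.0, 71.0, 76.0, 77.0, 81.0]
print(f"R^2 = {100 * r_squared(observed, predicted):.2f}%  RMSE = {rmse(observed, predicted):.2f}")
```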
CL · Oct 8, 2025
Cancer Diagnosis Categorization in Electronic Health Records Using Large Language Models and BioBERT: Model Performance Evaluation Study
Soheil Hashtarkhani, Rezaur Rashid, Christopher L Brett et al.
Electronic health records contain inconsistently structured or free-text data, requiring efficient preprocessing to enable predictive health care models. Although artificial intelligence-driven natural language processing tools show promise for automating diagnosis classification, their comparative performance and clinical reliability require systematic evaluation. The aim of this study is to evaluate the performance of 4 large language models (GPT-3.5, GPT-4o, Llama 3.2, and Gemini 1.5) and BioBERT in classifying cancer diagnoses from structured and unstructured electronic health records data. We analyzed 762 unique diagnoses (326 International Classification of Diseases (ICD) code descriptions, 436 free-text entries) from 3456 records of patients with cancer. Models were tested on their ability to categorize diagnoses into 14 predefined categories. Two oncology experts validated the classifications. BioBERT achieved the highest weighted macro F1-score for ICD codes (84.2) and matched GPT-4o in ICD code accuracy (90.8). For free-text diagnoses, GPT-4o outperformed BioBERT in weighted macro F1-score (71.8 vs 61.5) and achieved slightly higher accuracy (81.9 vs 81.6). GPT-3.5, Gemini, and Llama showed lower overall performance on both formats. Common misclassification patterns included confusion between metastasis and central nervous system tumors, as well as errors involving ambiguous or overlapping clinical terminology. Although current performance levels appear sufficient for administrative and research use, reliable clinical applications will require standardized documentation practices alongside robust human oversight for high-stakes decision-making.
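The abstract reports accuracy and a "weighted macro F1-score". Because "macro" (unweighted mean of per-class F1) and "weighted" (per-class F1 weighted by class support) usually denote distinct averaging schemes, the minimal sketch below computes both alongside accuracy; which one the study used is not specified in the abstract. The category names in the example are hypothetical stand-ins for the 14 predefined categories.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 for every label that appears in either the true or predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def f1_summary(y_true, y_pred):
    """Return (macro F1, support-weighted F1, accuracy)."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(f1)
    weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return macro, weighted, accuracy


# Hypothetical diagnosis categories, for illustration only.
y_true = ["breast", "breast", "breast", "lung"]
y_pred = ["breast", "breast", "lung", "lung"]
macro, weighted, accuracy = f1_summary(y_true, y_pred)
print(round(macro, 3), round(weighted, 3), accuracy)
```

With imbalanced categories, the two averages diverge: the weighted score leans toward performance on common diagnoses, while the macro score gives rare categories equal say.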
IR · Aug 22, 2021
Public sentiment analysis and topic modeling regarding COVID-19 vaccines on the Reddit social media platform: A call to action for strengthening vaccine confidence
Chad A Melton, Olufunto A Olusanya, Nariman Ammar et al.
The COVID-19 pandemic fueled one of the most rapid vaccine developments in history. However, misinformation spread through online social media often leads to negative vaccine sentiment and hesitancy. To investigate COVID-19 vaccine-related discussion on social media, we conducted sentiment analysis and Latent Dirichlet Allocation (LDA) topic modeling on textual data collected from 13 Reddit communities focusing on the COVID-19 vaccine from Dec 1, 2020, to May 15, 2021. Data were aggregated and analyzed by month to detect changes in sentiment and latent topics. Sentiment analysis suggested that these communities expressed more positive than negative sentiment in vaccine-related discussions and that this sentiment has remained static over time. Topic modeling revealed that community members mainly focused on side effects rather than outlandish conspiracy theories. COVID-19 vaccine-related content from the 13 subreddits shows that the sentiments expressed in these communities are overall more positive than negative and have not meaningfully changed since December 2020. Keywords indicating vaccine hesitancy were detected throughout the LDA topic modeling. Public sentiment and topic modeling analysis regarding vaccines could facilitate the implementation of appropriate messaging, digital interventions, and new policies to promote vaccine confidence.
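For readers unfamiliar with Latent Dirichlet Allocation, the following is a minimal collapsed Gibbs sampler for LDA, assuming documents are already tokenized and stop-word filtered. The hyperparameters (`alpha`, `beta`, `iters`) and the toy corpus are illustrative assumptions; the study's own LDA implementation and settings are not described in the abstract, and production work would typically rely on an established library.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (vocab, phi) with phi[k][v] = P(word v | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):            # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional: (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(n_topics)]
    return vocab, phi


def top_words(vocab, phi, n=3):
    """Highest-probability words for each topic."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]] for row in phi]


# Toy tokenized comments, invented for illustration.
docs = [
    ["arm", "sore", "fever", "dose"],
    ["fever", "headache", "sore", "arm"],
    ["appointment", "pharmacy", "booked", "dose"],
    ["pharmacy", "appointment", "walk-in", "booked"],
]
vocab, phi = lda_gibbs(docs, n_topics=2)
print(top_words(vocab, phi))
```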
LG · May 5, 2021
Predicting Intensive Care Unit Length of Stay and Mortality Using Patient Vital Signs: Machine Learning Model Development and Validation
Khalid Alghatani, Nariman Ammar, Abdelmounaam Rezgui et al.
Patient monitoring is vital in all stages of care. Here we report the development and validation of ICU length of stay and mortality prediction models. The models will be used in an intelligent ICU patient monitoring module of an Intelligent Remote Patient Monitoring (IRPM) framework that monitors the health status of patients and generates timely alerts, maneuver guidance, or reports when adverse medical conditions are predicted. We used the publicly available Medical Information Mart for Intensive Care (MIMIC) database to extract ICU stay data for adult patients and built two prediction models: one for mortality and another for ICU length of stay. For the mortality model, we applied six commonly used machine learning (ML) binary classification algorithms to predict discharge status (survived or not). For the length of stay model, we applied the same six ML algorithms for binary classification, using the population median ICU stay of 2.64 days as the threshold. For regression, we used two ML algorithms to predict the number of days. We built two variations of each prediction model: one using 12 baseline demographic and vital sign features, and the other based on our proposed quantiles approach, which adds 21 extra features engineered from the baseline vital sign features, including their modified means, standard deviations, and quantile percentages. Using the quantiles approach, we could perform predictive modeling with minimal features while maintaining reasonable performance. The best accuracy achieved by the mortality model was approximately 89%, using the random forest algorithm. The highest accuracy achieved by the length of stay model, based on the population median ICU stay (2.64 days), was approximately 65%, also using the random forest algorithm.
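The quantiles approach can be pictured as summarizing each vital-sign series with a few distributional statistics. The minimal sketch below uses plain means, sample standard deviations, and quartiles as an approximation, because the abstract does not define the "modified means" or "quantile percentages" exactly, so it does not reproduce the study's 21 features. The binary length-of-stay target uses the 2.64-day population median reported above; the vital-sign names and readings are hypothetical.

```python
import statistics

MEDIAN_ICU_STAY_DAYS = 2.64  # population median reported in the abstract


def vital_sign_features(name, values):
    """Summary features for one vital sign's readings during an ICU stay.

    Requires at least two readings. Plain mean/SD and quartiles approximate the
    study's engineered features, which the abstract does not fully define.
    """
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        f"{name}_mean": statistics.mean(values),
        f"{name}_sd": statistics.stdev(values),
        f"{name}_q25": q25,
        f"{name}_q50": q50,
        f"{name}_q75": q75,
    }


def los_label(icu_days, threshold=MEDIAN_ICU_STAY_DAYS):
    """Binary length-of-stay target: 1 if the stay exceeds the population median."""
    return int(icu_days > threshold)


# Toy readings for a single hypothetical stay.
stay = {"heart_rate": [80, 85, 90, 95, 100], "resp_rate": [14, 16, 18, 20, 22]}
features = {}
for vital, readings in stay.items():
    features.update(vital_sign_features(vital, readings))
print(features, los_label(3.1))
```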
AI · Mar 16, 2021
Using a Personal Health Library-Enabled mHealth Recommender System for Self-Management of Diabetes Among Underserved Populations: Use Case for Knowledge Graphs and Linked Data
Nariman Ammar, James E Bailey, Robert L Davis et al.
Personal health libraries (PHLs) provide a single point of secure access to patients' digital health data and enable the integration of knowledge stored in their digital health profiles with other sources of global knowledge. PHLs can help empower caregivers and health care providers to make informed decisions about patients' health by understanding medical events in the context of their lives. This paper reports the implementation of a mobile health digital intervention that incorporates both digital health data stored in patients' PHLs and other sources of contextual knowledge to deliver tailored recommendations for improving self-care behaviors in adults with diabetes. Based on evidence from the literature, we conducted a thematic assessment of patients' functional and nonfunctional requirements that are missing from current EHRs. We used the results to identify the technologies needed to address those requirements. We describe the technological infrastructure used to construct, manage, and integrate the types of knowledge stored in the PHL. We leverage the Social Linked Data (Solid) platform to design a fully decentralized, privacy-aware platform that supports interoperability and care integration. We provided an initial prototype design of a PHL and drafted a use case scenario involving four actors to demonstrate how the proposed prototype can address user requirements, including the construction and management of the PHL and its use in developing a mobile app that queries the knowledge stored and integrated in the PHL in a private, fully decentralized manner to provide better recommendations. The proposed PHL helps patients and their caregivers take a central role in making decisions about their health and equips their health care providers with informatics tools that support the collection and interpretation of the collected knowledge.
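Solid stores PHL data as linked data, which applications typically query by matching patterns against triples (in practice with SPARQL or an RDF library). The standard-library sketch below imitates that idea with a tiny basic-graph-pattern matcher over in-memory triples, so a mobile app could, for example, retrieve a patient's glucose readings together with nearby places for exercise. The vocabulary (`hasReading`, `valueMgDl`, `livesNear`, etc.) is hypothetical and is not the paper's schema; Solid's access control and decentralization are not modeled.

```python
def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with a triple; terms starting with '?' are variables."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in new and new[term] != value:
                return None
            new[term] = value
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns (a basic graph pattern)."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Hypothetical PHL contents, for illustration only.
PHL = [
    ("patient:alice", "hasReading", "reading:1"),
    ("reading:1", "type", "glucose"),
    ("reading:1", "valueMgDl", "182"),
    ("patient:alice", "livesNear", "place:park"),
    ("place:park", "type", "walking_trail"),
]
results = query(PHL, [
    ("patient:alice", "hasReading", "?r"),
    ("?r", "type", "glucose"),
    ("?r", "valueMgDl", "?v"),
    ("patient:alice", "livesNear", "?place"),
    ("?place", "type", "walking_trail"),
])
print(results)
```

The bindings returned by such a query are the raw material a recommender would then turn into tailored suggestions.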
AI · Nov 6, 2020
Explainable Artificial Intelligence Recommendation System by Leveraging the Semantics of Adverse Childhood Experiences: Proof-of-Concept Prototype Development
Nariman Ammar, Arash Shaban-Nejad
The study of adverse childhood experiences and their consequences has emerged over the past 20 years. In this study, we aimed to leverage explainable artificial intelligence and propose a proof-of-concept prototype for a knowledge-driven, evidence-based recommendation system to improve surveillance of adverse childhood experiences. We used concepts from an ontology that we developed to build and train a question-answering agent using the Google Dialogflow engine. In addition to the question-answering agent, the initial prototype includes knowledge graph generation and recommendation components that leverage third-party graph technology. To showcase the framework's functionalities, we present a prototype design and demonstrate its main features through four use case scenarios motivated by an initiative currently implemented at a children's hospital in Memphis, Tennessee. Ongoing development of the prototype requires implementing an optimization algorithm for the recommendations, incorporating a privacy layer through a personal health library, and conducting a clinical trial to assess both the usability and usefulness of the implementation. This semantic-driven explainable artificial intelligence prototype can enhance health care practitioners' ability to provide explanations for the decisions they make.
CY · Nov 21, 2019
An Innovative Approach to Addressing Childhood Obesity: A Knowledge-Based Infrastructure for Supporting Multi-Stakeholder Partnership Decision-Making in Quebec, Canada
Nii Antiaye Addy, Arash Shaban-Nejad, David L. Buckeridge et al.
The purpose of this paper is to describe and analyze the development of a knowledge-based infrastructure to support multi-stakeholder partnership (MSP) decision-making processes. The paper emerged from a study to define specifications for a knowledge-based infrastructure providing decision support for community-level MSPs in the Canadian province of Quebec. As part of the study, a process assessment was conducted to understand the needs of communities as they collect, organize, and analyze data to make decisions about their priorities. The result of this process is a portrait, an epidemiological profile of health and nutrition in the community. Portraits inform strategic planning and the development of interventions and are used to assess the impact of interventions. Our key findings indicate ambiguity and disagreement among MSP decision-makers regarding causal relationships between actions and outcomes and regarding the relevant data needed for making decisions. MSP decision-makers expressed a desire for easy-to-use tools that facilitate the collection, organization, synthesis, and analysis of data, to enable timely decision-making. The findings inform conceptual modeling and ontological analysis to capture the domain knowledge and specify relationships between actions and outcomes. This modeling and analysis provide the foundation for an ontology encoded in the OWL 2 Web Ontology Language. The ontology is developed to provide semantic support for the MSP process, defining objectives, strategies, actions, indicators, and data sources. In the future, software interacting with the ontology can let MSP decision-makers interactively browse its concepts, instances, relationships, and axioms. Our ontology also facilitates the integration and interpretation of community data and can help manage semantic interoperability between different knowledge sources.
CY · Nov 19, 2019
Adverse Childhood Experiences Ontology for Mental Health Surveillance, Research, and Evaluation: Advanced Knowledge Representation and Semantic Web Techniques
Jon Hael Brenas, Eun Kyong Shin, Arash Shaban-Nejad
Background: Adverse Childhood Experiences (ACEs), a set of negative events and processes that a person might encounter during childhood and adolescence, have been shown to be linked to increased risk of a multitude of negative health outcomes and conditions in adulthood and beyond. Objective: To better understand the relationships between ACEs, their relevant risk factors, and associated health outcomes, and eventually to design and implement preventive interventions, access to an integrated, coherent dataset is needed. Therefore, we implemented a formal ontology as a resource to help the mental health community facilitate data integration and knowledge modeling and improve ACEs surveillance and research. Methods: We used advanced knowledge representation and Semantic Web tools and techniques to implement the ontology. The current implementation of the ontology is expressed in the description logic ALCRIQ(D), a sublogic of the Web Ontology Language (OWL 2). Results: The ACEs Ontology has been implemented and made available to the mental health community and the public via the BioPortal repository. Moreover, multiple use case scenarios have been introduced to showcase and evaluate the usability of the ontology in action. The ontology was created to be used by major actors in the ACEs community for different applications, from diagnosing individuals and predicting potential negative outcomes they might encounter to preventing ACEs in a population and designing interventions and policies. Conclusions: The ACEs Ontology provides a uniform and reusable semantic network and an integrated knowledge structure for mental health practitioners and researchers to improve ACEs surveillance and evaluation.
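Full ALCRIQ(D) reasoning is done by dedicated OWL reasoners. As a deliberately small illustration of one inference such an ontology supports, the sketch below computes the transitive closure of direct subclass (is-a) assertions, so that a query for a general class such as an ACE category also returns records typed with its more specific subclasses. The class names and records are hypothetical and are not taken from the ACEs Ontology on BioPortal.

```python
def superclasses(subclass_of, cls):
    """All direct and inferred superclasses of cls, given direct is-a assertions."""
    seen = set()
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def instances_of(subclass_of, typed_individuals, cls):
    """Individuals asserted to belong to cls or to any of its (inferred) subclasses."""
    return sorted(ind for ind, c in typed_individuals.items()
                  if c == cls or cls in superclasses(subclass_of, c))


# Hypothetical class hierarchy (direct is-a links only) and typed records.
SUBCLASS_OF = {
    "PhysicalAbuse": ["Abuse"],
    "EmotionalAbuse": ["Abuse"],
    "Abuse": ["AdverseChildhoodExperience"],
    "ParentalIncarceration": ["HouseholdDysfunction"],
    "HouseholdDysfunction": ["AdverseChildhoodExperience"],
}
RECORDS = {"case1": "PhysicalAbuse", "case2": "ParentalIncarceration", "case3": "EmotionalAbuse"}
print(instances_of(SUBCLASS_OF, RECORDS, "Abuse"))
```

Subsumption is only one small part of what a description logic reasoner provides; qualified number restrictions, role hierarchies, inverse roles, and datatypes are not modeled here.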