CLJun 22, 2023
Natural Language Processing in Electronic Health Records in Relation to Healthcare Decision-making: A Systematic ReviewElias Hossain, Rajib Rana, Niall Higgins et al.
Background: Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. Methodology: After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: 1) medical note classification, 2) clinical entity recognition, 3) text summarisation, 4) deep learning (DL) and transfer learning architecture, 5) information extraction, 6) Medical language translation and 7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Result and Discussion: EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. Conclusion: We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem is quite important, yet we must find techniques to address this underlining problem. Future studies should address key limitations in studies, primarily identifying Lupus Nephritis, Suicide Attempts, perinatal self-harmed and ICD-9 classification.
CVSep 17, 2022
Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle VideoTianya T. Zhang Ph. D., Peter J. Jin Ph. D., Han Zhou et al.
Spatial-temporal Map (STMap)-based methods have shown great potential to process high-angle videos for vehicle trajectory reconstruction, which can meet the needs of various data-driven modeling and imitation learning applications. In this paper, we developed Spatial-Temporal Deep Embedding (STDE) model that imposes parity constraints at both pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation on STMap. At pixel level, each pixel was encoded with its 8-neighbor pixels at different ranges, and this encoding is subsequently used to guide a neural network to learn the embedding mechanism. At the instance level, a discriminative loss function is designed to pull pixels belonging to the same instance closer and separate the mean value of different instances far apart in the embedding space. The output of the spatial-temporal affinity is then optimized by the mutex-watershed algorithm to obtain final clustering results. Based on segmentation metrics, our model outperformed five other baselines that have been used for STMap processing and shows robustness under the influence of shadows, static noises, and overlapping. The designed model is applied to process all public NGSIM US-101 videos to generate complete vehicle trajectories, indicating a good scalability and adaptability. Last but not least, the strengths of the scanline method with STDE and future directions were discussed. Code, STMap dataset and video trajectory are made publicly available in the online repository. GitHub Link: shorturl.at/jklT0.
HCAug 28, 2023
Trust in Construction AI-Powered Collaborative Robots: A Qualitative Empirical AnalysisNewsha Emaminejad, Reza Akhavian, Ph. D
Construction technology researchers and forward-thinking companies are experimenting with collaborative robots (aka cobots), powered by artificial intelligence (AI), to explore various automation scenarios as part of the digital transformation of the industry. Intelligent cobots are expected to be the dominant type of robots in the future of work in construction. However, the black-box nature of AI-powered cobots and unknown technical and psychological aspects of introducing them to job sites are precursors to trust challenges. By analyzing the results of semi-structured interviews with construction practitioners using grounded theory, this paper investigates the characteristics of trustworthy AI-powered cobots in construction. The study found that while the key trust factors identified in a systematic literature review -- conducted previously by the authors -- resonated with the field experts and end users, other factors such as financial considerations and the uncertainty associated with change were also significant barriers against trusting AI-powered cobots in construction.
CRApr 11, 2025Code
X-Guard: Multilingual Guard Agent for Content ModerationBibek Upadhayay, Vahid Behzadan, Ph. D
Large Language Models (LLMs) have rapidly become integral to numerous applications in critical domains where reliability is paramount. Despite significant advances in safety frameworks and guardrails, current protective measures exhibit crucial vulnerabilities, particularly in multilingual contexts. Existing safety systems remain susceptible to adversarial attacks in low-resource languages and through code-switching techniques, primarily due to their English-centric design. Furthermore, the development of effective multilingual guardrails is constrained by the scarcity of diverse cross-lingual training data. Even recent solutions like Llama Guard-3, while offering multilingual support, lack transparency in their decision-making processes. We address these challenges by introducing X-Guard agent, a transparent multilingual safety agent designed to provide content moderation across diverse linguistic contexts. X-Guard effectively defends against both conventional low-resource language attacks and sophisticated code-switching attacks. Our approach includes: curating and enhancing multiple open-source safety datasets with explicit evaluation rationales; employing a jury of judges methodology to mitigate individual judge LLM provider biases; creating a comprehensive multilingual safety dataset spanning 132 languages with 5 million data points; and developing a two-stage architecture combining a custom-finetuned mBART-50 translation module with an evaluation X-Guard 3B model trained through supervised finetuning and GRPO training. Our empirical evaluations demonstrate X-Guard's effectiveness in detecting unsafe content across multiple languages while maintaining transparency throughout the safety evaluation process. Our work represents a significant advancement in creating robust, transparent, and linguistically inclusive safety systems for LLMs and its integrated systems.
CLMar 24
PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and DisambiguationManjushree B. Aithal, Ph. D., Alexander Kotz et al.
Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms, misinterpretation these abbreviations can precipitate severe outcomes like life-threatening medication errors. While cloud-dependent LLMs excel at Acronym Disambiguation, transmitting Protected Health Information to external servers violates privacy frameworks. To bridge this gap, this study pioneers the evaluation of small-parameter models deployed entirely on-device to ensure privacy preservation. We introduce a privacy-preserving cascaded pipeline leveraging general-purpose local models to detect clinical acronyms, routing them to domain-specific biomedical models for context-relevant expansions. Results reveal that while general instruction-following models achieve high detection accuracy (~0.988), their expansion capabilities plummet (~0.655). Our cascaded approach utilizes domain-specific medical models to increase expansion accuracy to (~0.81). This novel work demonstrates that privacy-preserving, on-device (2B-10B) models deliver high-fidelity clinical acronym disambiguation support.
CVJun 20, 2025
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction TechniquesMichael Gyimadu, Gregory Bell, Ph. D
High-dimensional image data often require dimensionality reduction before further analysis. This paper provides a purely analytical comparison of two linear techniques-Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). After the derivation of each algorithm from first principles, we assess their interpretability, numerical stability, and suitability for differing matrix shapes. We synthesize rule-of-thumb guidelines for choosing one out of the two algorithms without empirical benchmarking, building on classical and recent numerical literature. Limitations and directions for future experimental work are outlined at the end.
CVOct 6, 2025
Read the Room: Inferring Social Context Through Dyadic Interaction Recognition in Cyber-physical-social Infrastructure SystemsCheyu Lin, John Martins, Katherine A. Flanigan et al.
Cyber-physical systems (CPS) integrate sensing, computing, and control to improve infrastructure performance, focusing on economic goals like performance and safety. However, they often neglect potential human-centered (or ''social'') benefits. Cyber-physical-social infrastructure systems (CPSIS) aim to address this by aligning CPS with social objectives. This involves defining social benefits, understanding human interactions with each other and infrastructure, developing privacy-preserving measurement methods, modeling these interactions for prediction, linking them to social benefits, and actuating the physical environment to foster positive social outcomes. This paper delves into recognizing dyadic human interactions using real-world data, which is the backbone to measuring social behavior. This lays a foundation to address the need to enhance understanding of the deeper meanings and mutual responses inherent in human interactions. While RGB cameras are informative for interaction recognition, privacy concerns arise. Depth sensors offer a privacy-conscious alternative by analyzing skeletal movements. This study compares five skeleton-based interaction recognition algorithms on a dataset of 12 dyadic interactions. Unlike single-person datasets, these interactions, categorized into communication types like emblems and affect displays, offer insights into the cultural and emotional aspects of human interactions.
AIJun 13, 2025
Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context LearningLiying Wang, Ph. D., Daffodil Carrington et al. · uw
Family caregivers often face substantial mental health challenges due to their multifaceted roles and limited resources. This study explored the potential of a large language model (LLM)-powered conversational agent to deliver evidence-based mental health support for caregivers, specifically Problem-Solving Therapy (PST) integrated with Motivational Interviewing (MI) and Behavioral Chain Analysis (BCA). A within-subject experiment was conducted with 28 caregivers interacting with four LLM configurations to evaluate empathy and therapeutic alliance. The best-performing models incorporated Few-Shot and Retrieval-Augmented Generation (RAG) prompting techniques, alongside clinician-curated examples. The models showed improved contextual understanding and personalized support, as reflected by qualitative responses and quantitative ratings on perceived empathy and therapeutic alliances. Participants valued the model's ability to validate emotions, explore unexpressed feelings, and provide actionable strategies. However, balancing thorough assessment with efficient advice delivery remains a challenge. This work highlights the potential of LLMs in delivering empathetic and tailored support for family caregivers.
ROFeb 27, 2022
Feasibility and Acceptability of Remote Neuromotor Rehabilitation Interactions Using Social Robot Augmented Telepresence: A Case StudyMichael J. Sobrepera, Vera G. Lee, Suveer Garg et al.
There is a growing need to deliver rehabilitation care to patients remotely. Long term demographic changes, geographic shortages of care providers, and now a global pandemic contribute to this need. Telepresence provides an option for delivering this care. However, telepresence using video and audio alone does not provide an interaction of the same quality as in-person. To bridge this gap, we propose the use of social robot augmented telepresence (SRAT). We have constructed a demonstration SRAT system for upper extremity rehab, in which a humanoid, with a head, body, face, and arms, is attached to a mobile telepresence system, to collaborate with the patient and clinicians as an independent social entity. The humanoid can play games with the patient and demonstrate activities.These activities could be used both to perform assessments in support of self-directed rehab and to perform exercises. In this paper, we present a case series with six subjects who completed interactions with the robot, three subjects who have previously suffered a stroke and three pediatric subjects who are typically developing. Subjects performed a Simon Says activity and a target touch activity in person, using classical telepresence (CT), and using SRAT. Subjects were able to effectively work with the social robot guiding interactions and 5 of 6 rated SRAT better than CT. This study demonstrates the feasibility of SRAT and some of its benefits.
SDDec 10, 2021
An Ensemble 1D-CNN-LSTM-GRU Model with Data Augmentation for Speech Emotion RecognitionMd. Rayhan Ahmed, Salekul Islam, Ph. D et al.
In this paper, we propose an ensemble of deep neural networks along with data augmentation (DA) learned using effective speech-based features to recognize emotions from speech. Our ensemble model is built on three deep neural network-based models. These neural networks are built using the basic local feature acquiring blocks (LFAB) which are consecutive layers of dilated 1D Convolutional Neural networks followed by the max pooling and batch normalization layers. To acquire the long-term dependencies in speech signals further two variants are proposed by adding Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) layers respectively. All three network models have consecutive fully connected layers before the final softmax layer for classification. The ensemble model uses a weighted average to provide the final classification. We have utilized five standard benchmark datasets: TESS, EMO-DB, RAVDESS, SAVEE, and CREMA-D for evaluation. We have performed DA by injecting Additive White Gaussian Noise, pitch shifting, and stretching the signal level to generalize the models, and thus increasing the accuracy of the models and reducing the overfitting as well. We handcrafted five categories of features: Mel-frequency cepstral coefficients, Log Mel-Scaled Spectrogram, Zero-Crossing Rate, Chromagram, and statistical Root Mean Square Energy value from each audio sample. These features are used as the input to the LFAB blocks that further extract the hidden local features which are then fed to either fully connected layers or to LSTM or GRU based on the model type to acquire the additional long-term contextual representations. LFAB followed by GRU or LSTM results in better performance compared to the baseline model. The ensemble model achieves the state-of-the-art weighted average accuracy in all the datasets.
CVNov 18, 2020
Extracting and Learning Fine-Grained Labels from Chest RadiographsTanveer Syeda-Mahmood, Ph. D, K. C. L Wong et al.
Chest radiographs are the most common diagnostic exam in emergency rooms and intensive care units today. Recently, a number of researchers have begun working on large chest X-ray datasets to develop deep learning models for recognition of a handful of coarse finding classes such as opacities, masses and nodules. In this paper, we focus on extracting and learning fine-grained labels for chest X-ray images. Specifically we develop a new method of extracting fine-grained labels from radiology reports by combining vocabulary-driven concept extraction with phrasal grouping in dependency parse trees for association of modifiers with findings. A total of 457 fine-grained labels depicting the largest spectrum of findings to date were selected and sufficiently large datasets acquired to train a new deep learning model designed for fine-grained classification. We show results that indicate a highly accurate label extraction process and a reliable learning of fine-grained labels. The resulting network, to our knowledge, is the first to recognize fine-grained descriptions of findings in images covering over nine modifiers including laterality, location, severity, size and appearance.
CLFeb 2, 2020
Assessment of Amazon Comprehend Medical: Medication Information ExtractionBenedict Guzman, MS, Isabel Metzger et al.
In November 27, 2018, Amazon Web Services (AWS) released Amazon Comprehend Medical (ACM), a deep learning based system that automatically extracts clinical concepts (which include anatomy, medical conditions, protected health information (PH)I, test names, treatment names, and medical procedures, and medications) from clinical text notes. Uptake and trust in any new data product relies on independent validation across benchmark datasets and tools to establish and confirm expected quality of results. This work focuses on the medication extraction task, and particularly, ACM was evaluated using the official test sets from the 2009 i2b2 Medication Extraction Challenge and 2018 n2c2 Track 2: Adverse Drug Events and Medication Extraction in EHRs. Overall, ACM achieved F-scores of 0.768 and 0.828. These scores ranked the lowest when compared to the three best systems in the respective challenges. To further establish the generalizability of its medication extraction performance, a set of random internal clinical text notes from NYU Langone Medical Center were also included in this work. And in this corpus, ACM garnered an F-score of 0.753.
IVOct 18, 2019
Intracranial Hemorrhage Segmentation Using Deep Convolutional ModelMurtadha D. Hssayeni, M. S., Muayad S. Croock et al.
Traumatic brain injuries could cause intracranial hemorrhage (ICH). ICH could lead to disability or death if it is not accurately diagnosed and treated in a time-sensitive procedure. The current clinical protocol to diagnose ICH is examining Computerized Tomography (CT) scans by radiologists to detect ICH and localize its regions. However, this process relies heavily on the availability of an experienced radiologist. In this paper, we designed a study protocol to collect a dataset of 82 CT scans of subjects with traumatic brain injury. Later, the ICH regions were manually delineated in each slice by a consensus decision of two radiologists. Recently, fully convolutional networks (FCN) have shown to be successful in medical image segmentation. We developed a deep FCN, called U-Net, to segment the ICH regions from the CT scans in a fully automated manner. The method achieved a Dice coefficient of 0.31 for the ICH segmentation based on 5-fold cross-validation. The dataset is publicly available online at PhysioNet repository for future analysis and comparison.
CVSep 19, 2018
Wearable-based Mediation State Detection in Individuals with Parkinson's DiseaseMurtadha D. Hssayeni, Michelle A. Burack, M. D. et al.
One of the most prevalent complaints of individuals with mid-stage and advanced Parkinson's disease (PD) is the fluctuating response to their medication (i.e., ON state with maximum benefit from medication and OFF state with no benefit from medication). In order to address these motor fluctuations, the patients go through periodic clinical examination where the treating physician reviews the patients' self-report about duration in different medication states and optimize therapy accordingly. Unfortunately, the patients' self-report can be unreliable and suffer from recall bias. There is a need to a technology-based system that can provide objective measures about the duration in different medication states that can be used by the treating physician to successfully adjust the therapy. In this paper, we developed a medication state detection algorithm to detect medication states using two wearable motion sensors. A series of significant features are extracted from the motion data and used in a classifier that is based on a support vector machine with fuzzy labeling. The developed algorithm is evaluated using a dataset with 19 PD subjects and a total duration of 1,052.24 minutes (17.54 hours). The algorithm resulted in an average classification accuracy of 90.5%, sensitivity of 94.2%, and specificity of 85.4%.
NEAug 3, 2018
Geared Rotationally Identical and Invariant Convolutional Neural Network SystemsShihChung B. Lo, Ph. D., Matthew T. Freedman et al.
Theorems and techniques to form different types of transformationally invariant processing and to produce the same output quantitatively based on either transformationally invariant operators or symmetric operations have recently been introduced by the authors. In this study, we further propose to compose a geared rotationally identical CNN system (GRI-CNN) with a small step angle by connecting networks of participated processes at the first flatten layer. Using an ordinary CNN structure as a base, requirements for constructing a GRI-CNN include the use of either symmetric input vector or kernels with an angle increment that can form a complete cycle as a "gearwheel". Four basic GRI-CNN structures were studied. Each of them can produce quantitatively identical output results when a rotation angle of the input vector is evenly divisible by the step angle of the gear. Our study showed when an input vector rotated with an angle does not match to a step angle, the GRI-CNN can also produce a highly consistent result. With a design of using an ultra-fine gear-tooth step angle (e.g., 1 degree or 0.1 degree), all four GRI-CNN systems can be constructed virtually isotropically.