AI | Aug 21, 2023 | Code
CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection
Songhui Yue, Xiaoyan Hong, Randy K. Smith
The automation of High-Level Context (HLC) reasoning across intelligent systems at scale is imperative because of the unceasing accumulation of contextual data, the trend toward fusing data from multiple sources (e.g., sensors, intelligent systems), and the intrinsic complexity and dynamism of context-based decision-making processes. To mitigate the challenges posed by these issues, we propose a novel Hierarchical Ontology-State Modeling (HOSM) framework, CSM-H-R, which programmatically combines ontologies and states in the modeling and runtime phases to recognize meaningful HLC. It builds on the model of our prior work on the Context State Machine (CSM) engine by incorporating the H (Hierarchy) and R (Relationship and tRansition) dimensions to handle the dynamic aspects of context. The design of the framework supports the sharing and interoperation of context among intelligent systems, and includes components for handling CSMs and managing hierarchy, relationship, and transition. Case studies are developed for IntellElevator and IntellRestaurant, two intelligent applications in a smart campus setting. The prototype implementation of the framework experiments with translating HLC reasoning into vector and matrix computing and shows the potential of using advanced probabilistic models to reach the next level of automation in integrating intelligent systems; meanwhile, privacy protection is supported in the application domain by anonymization through indexing and by reducing information correlation. An implementation of the framework is available at https://github.com/songhui01/CSM-H-R.
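As a rough illustration of what expressing state-based context reasoning as vector and matrix computing can look like, the sketch below encodes a current state as a one-hot vector and applies a transition matrix to find the states reachable after a number of steps. The state names, the transition rules, and the helper functions are hypothetical and are not taken from the CSM-H-R repository.

```python
# Hypothetical context states for an elevator-like application (assumed, not from the paper).
STATES = ["idle", "moving", "doors_open"]

# TRANSITION[i][j] == 1 means state i may move to state j in one step (assumed rules).
TRANSITION = [
    [0, 1, 1],  # idle -> moving, doors_open
    [1, 0, 0],  # moving -> idle
    [1, 0, 0],  # doors_open -> idle
]


def one_hot(state):
    """Encode a state name as a one-hot vector over STATES."""
    return [1 if s == state else 0 for s in STATES]


def step(vector, matrix):
    """Multiply a row vector by the matrix: entry j counts the paths ending in state j."""
    n = len(matrix)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def reachable(state, steps):
    """Return the states reachable from `state` after exactly `steps` transitions."""
    vec = one_hot(state)
    for _ in range(steps):
        vec = step(vec, TRANSITION)
    return [s for s, count in zip(STATES, vec) if count > 0]


print(reachable("moving", 1))
print(reachable("moving", 2))
```

Because states are referred to by index inside the vectors and matrix, the same computation could run without exposing the state labels, which loosely mirrors the indexing-based anonymization the abstract mentions.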
CL | Sep 11, 2023
Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature
Armando D. Diaz Gonzalez, Kevin S. Hughes, Songhui Yue et al.
Published biomedical information has increased rapidly and continues to do so. Recent advancements in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts to construct knowledge graphs of the immense body of work that has been done in this area for genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach to connect each entity with its data source and visualize them in a graph-based knowledge representation. Lastly, we discuss the knowledge graph applications, limitations, and challenges to inspire future research on germline corpora. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples. Graph-based visualizations are used to show the results.
SI | Aug 10, 2023
Using Twitter Data to Determine Hurricane Category: An Experiment
Songhui Yue, Jyothsna Kondari, Aibek Musaev et al.
Social media posts contain abundant information about public opinion on major events, especially natural disasters such as hurricanes. Posts related to an event are usually published by users who live near the place of the event at the time of the event. Special correlation between the social media data and the events can be obtained using data mining approaches. This paper presents research on finding the mappings between social media data and the severity level of a disaster. Specifically, we investigated the Twitter data posted during hurricanes Harvey and Irma and attempted to find the correlation between the Twitter data of a specific area and the hurricane level in that area. Our experimental results indicate a positive correlation between them. We also present a method to predict the hurricane category for a specific area using relevant Twitter data.
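To make the idea of correlating regional Twitter activity with hurricane level concrete, the following minimal sketch computes a Pearson correlation coefficient between per-area tweet counts and hurricane categories. The numbers are synthetic placeholders invented for illustration, and Pearson correlation is an assumption here; the abstract does not state which correlation measure or features the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic, made-up values: one entry per hypothetical area.
tweet_counts = [120, 340, 560, 910, 1500]   # hurricane-related tweets per area (assumed)
categories = [0, 1, 2, 3, 4]                # hurricane level observed in that area (assumed)

r = pearson(tweet_counts, categories)
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 would indicate that areas with more hurricane-related tweets also tended to experience a higher hurricane level, which is the kind of positive relationship the abstract reports.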
SE | Aug 29, 2025 | Code
LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards
Songhui Yue
Ontologies have supported knowledge representation and whitebox reasoning for decades; thus, automated ontology generation (AOG) plays a crucial role in scaling their use. Software engineering standards (SES) consist of long, unstructured text (with high noise) and paragraphs with domain-specific terms. In this setting, relation triple extraction (RTE), together with term extraction, constitutes the first stage toward AOG. This work proposes an open-source large language model (LLM)-assisted approach to RTE for SES. Instead of relying solely on prompt-engineering-based methods, this study promotes the use of LLMs as an aid in constructing ontologies and explores an effective AOG workflow that includes document segmentation, candidate term mining, LLM-based relation inference, term normalization, and cross-section alignment. Gold-standard benchmarks at three granularities are constructed and used to evaluate the ontology generated in the study. The results show that it is comparable and potentially superior to the OpenIE method of triple extraction.
SE | Apr 7, 2024
A Data-to-Product Multimodal Conceptual Framework to Achieve Automated Software Evolution for Context-rich Intelligent Applications
Songhui Yue
While AI is extensively transforming Software Engineering (SE) fields, SE still needs a framework that considers all phases as a whole to facilitate Automated Software Evolution (ASEv), particularly for intelligent applications that are context-rich, instead of conquering each division independently. The complexity of ASEv comes from the intricacy of intelligent applications, the heterogeneity of the data sources, and the constant changes in the context. This study proposes a conceptual framework for achieving automated software evolution, emphasizing the importance of multimodal learning. A Selective Sequential Scope (3S) model is developed based on the conceptual framework, and it can be used to categorize existing and future research according to the SE phases and multimodal learning tasks it covers. This research is a preliminary step toward the blueprint of a higher-level ASEv. The proposed conceptual framework can act as a practical guideline for practitioners preparing to enter this area. Although the study is about intelligent applications, the framework and analysis methods may be adapted for other types of software as AI brings more intelligence into their life cycles.
DB | Nov 10, 2025
OntoTune: Ontology-Driven Learning for Query Optimization with Convolutional Models
Songhui Yue, Yang Shao, Sean Hayes
Query optimization has been studied using machine learning, reinforcement learning, and, more recently, graph-based convolutional networks. Ontology, as a structured, information-rich knowledge representation, can provide context, particularly in learning problems. This paper presents OntoTune, an ontology-based platform for enhancing learning for query optimization. By connecting SQL queries, database metadata, and statistics, the ontology developed in this research shows promise in capturing relationships and important determinants of query performance. This research also develops a method to embed ontologies while preserving as much of the relationships and key information as possible, before feeding them into learning algorithms such as tree-based and graph-based convolutional networks. A case study shows how OntoTune's ontology-driven learning delivers performance gains compared with the database system's default query execution.