Cogan Shimizu

AI
h-index: 32
16 papers
170 citations
Novelty: 32%
AI Score: 42

16 Papers

AI · May 27, 2022
Ontology Design Facilitating Wikibase Integration -- and a Worked Example for Historical Data

Cogan Shimizu, Andrew Eells, Seila Gonzalez et al.

Wikibase -- which is the software underlying Wikidata -- is a powerful platform for knowledge graph creation and management. However, it has been developed with a crowd-sourced knowledge graph creation scenario in mind, which in particular means that it has not been designed for use case scenarios in which a tightly controlled high-quality schema, in the form of an ontology, is to be imposed, and indeed, independently developed ontologies do not necessarily map seamlessly to the Wikibase approach. In this paper, we provide the key ingredients needed in order to combine traditional ontology modeling with use of the Wikibase platform, namely a set of axiom patterns that bridge the paradigm gap, together with usage instructions and a worked example for historical data.

CL · Jul 31, 2023
A Modular Ontology for MODS -- Metadata Object Description Schema

Rushrukh Rayan, Cogan Shimizu, Heidi Sieverding et al.

The Metadata Object Description Schema (MODS) was developed to describe bibliographic concepts and metadata and is maintained by the Library of Congress. Its authoritative version is given as an XML schema based on an XML mindset, which means that it has significant limitations for use in a knowledge graph context. We have therefore developed the Modular MODS Ontology (MMODS-O), which incorporates all elements and attributes of the MODS XML schema. In designing the ontology, we adopt the recent Modular Ontology Design Methodology (MOMo) with the intention to strike a balance between modularity and quality ontology design on the one hand, and conservative backward compatibility with MODS on the other.

39.2 · AI · May 14
Small, Private Language Models as Teammates for Educational Assessment Design

Chris Davis Jaldi, Anmol Saini, Shan Zhang et al.

Generative AI increasingly supports educational design tasks, e.g., through Large Language Models (LLMs), demonstrating the capability to design assessment questions that are aligned with pedagogical frameworks (e.g., Bloom's taxonomy). However, existing studies often rely on subjective or limited evaluation methods; focus primarily on proprietary models; or rarely systematically examine generation, evaluation, or deployment constraints in real educational settings. Meanwhile, Small Language Models (SLMs) have emerged as local alternatives that better address privacy and resource limitations; yet their effectiveness for assessment tasks remains underexplored. To address this gap, we systematically compare LLMs and SLMs for assessment question design; evaluate generation quality across Bloom's taxonomy levels using reproducible, pedagogically grounded metrics; and further assess model-based judging against expert-informed evaluation by analyzing reliability and agreement patterns. Results show that SLMs achieve competitive performance across key pedagogically motivated quality dimensions while enabling local, privacy-sensitive deployment. However, model-based evaluations also exhibit systematic inconsistencies and bias relative to expert ratings. These findings provide evidence to posit language models as bounded assistants in assessment workflows; underscore the necessity of Human-in-the-Loop; and advance the automated educational question generation field by examining quality, reliability, and deployment-aware trade-offs.

AI · Nov 14, 2024
Accelerating Knowledge Graph and Ontology Engineering with Large Language Models

Cogan Shimizu, Pascal Hitzler

Large Language Models bear the promise of significant acceleration of key Knowledge Graph and Ontology Engineering tasks, including ontology modeling, extension, modification, population, alignment, as well as entity disambiguation. We lay out LLM-based Knowledge Graph and Ontology Engineering as a new and coming area of research, and argue that modular approaches to ontologies will be of central importance.

HC · Nov 16, 2024
Education in the Era of Neurosymbolic AI

Chris Davis Jaldi, Eleni Ilkou, Noah Schroeder et al.

Education is poised for a transformative shift with the advent of neurosymbolic artificial intelligence (NAI), which will redefine how we support deeply adaptive and personalized learning experiences. NAI-powered education systems will be capable of interpreting complex human concepts and contexts while employing advanced problem-solving strategies, all grounded in established pedagogical frameworks. This will enable a level of personalization in learning systems that to date has been largely unattainable at scale, providing finely tailored curricula that adapt to an individual's learning pace and accessibility needs, including the diagnosis of student understanding of subjects at a fine-grained level, identifying gaps in foundational knowledge, and adjusting instruction accordingly. In this paper, we propose a system that leverages the unique affordances of pedagogical agents -- embodied characters designed to enhance learning -- as critical components of a hybrid NAI architecture. These agents can simulate nuanced discussions, debates, and problem-solving exercises that push learners beyond rote memorization toward deep comprehension. We discuss the rationale for our system design and the preliminary findings of our work. We conclude that education in the era of NAI will make learning more accessible, equitable, and aligned with real-world skills. This is an era that will explore a new depth of understanding in educational tools.

AI · Nov 3, 2024
Ontology Population using LLMs

Sanaz Saki Norouzi, Adrita Barua, Antrea Christou et al.

Knowledge graphs (KGs) are increasingly utilized for data integration, representation, and visualization. While KG population is critical, it is often costly, especially when data must be extracted from unstructured text in natural language, which presents challenges such as ambiguity and complex interpretations. Large Language Models (LLMs) offer promising capabilities for such tasks, excelling in natural language understanding and content generation. However, their tendency to "hallucinate" can produce inaccurate outputs. Despite these limitations, LLMs offer rapid and scalable processing of natural language data, and with prompt engineering and fine-tuning, they can approximate human-level performance in extracting and structuring data for KGs. This study investigates LLM effectiveness for KG population, focusing on the Enslaved.org Hub Ontology. In this paper, we report that, compared to the ground truth, LLMs can extract ~90% of triples when provided a modular ontology as guidance in the prompts.

AI · Oct 17, 2024
The KnowWhereGraph Ontology

Cogan Shimizu, Shirly Stephe, Adrita Barua et al.

KnowWhereGraph is one of the largest fully publicly available geospatial knowledge graphs. It includes data from 30 layers on natural hazards (e.g., hurricanes, wildfires), climate variables (e.g., air temperature, precipitation), soil properties, crop and land-cover types, demographics, human health, and various place and region identifiers, among other themes. These have been leveraged through the graph by a variety of applications to address challenges in food security and agricultural supply chains; sustainability related to soil conservation practices and farm labor; and delivery of emergency humanitarian aid following a disaster. In this paper, we introduce the ontology that acts as the schema for KnowWhereGraph. This broad overview provides insight into the requirements and design specifications for the graph and its schema, including the development methodology (modular ontology modeling) and the resources utilized to implement, materialize, and deploy KnowWhereGraph with its end-user interfaces and public SPARQL query endpoint.

AI · Oct 18, 2024
The S2 Hierarchical Discrete Global Grid as a Nexus for Data Representation, Integration, and Querying Across Geospatial Knowledge Graphs

Shirly Stephen, Mitchell Faulk, Krzysztof Janowicz et al.

Geospatial Knowledge Graphs (GeoKGs) have become integral to the growing field of Geospatial Artificial Intelligence. Initiatives like the U.S. National Science Foundation's Open Knowledge Network program aim to create an ecosystem of nation-scale, cross-disciplinary GeoKGs that provide AI-ready geospatial data aligned with FAIR principles. However, building this infrastructure presents key challenges, including 1) managing large volumes of data, 2) the computational complexity of discovering topological relations via SPARQL, and 3) conflating multi-scale raster and vector data. Discrete Global Grid Systems (DGGS) help tackle these issues by offering efficient data integration and representation strategies. The KnowWhereGraph utilizes Google's S2 Geometry -- a DGGS framework -- to enable efficient multi-source data processing, qualitative spatial querying, and cross-graph integration. This paper outlines the implementation of S2 within KnowWhereGraph, emphasizing its role in topologically enriching and semantically compressing data. Ultimately, this work demonstrates the potential of DGGS frameworks, particularly S2, for building scalable GeoKGs.

AI · Feb 28, 2024
Commonsense Ontology Micropatterns

Andrew Eells, Brandon Dave, Pascal Hitzler et al.

The previously introduced Modular Ontology Modeling methodology (MOMo) attempts to mimic the human analogical process by using modular patterns to assemble more complex concepts. To support this, MOMo organizes ontology design patterns into design libraries, which are programmatically queryable, to support accelerated ontology development, for both human and automated processes. However, a major bottleneck to large-scale deployment of MOMo is the (to-date) limited availability of ready-to-use ontology design patterns. At the same time, Large Language Models have quickly become a source of common knowledge and, in some cases, are replacing search engines for answering questions. In this paper, we thus present a collection of 104 ontology design patterns representing often occurring nouns, curated from the common-sense knowledge available in LLMs, organized into a fully-annotated modular ontology design library ready for use with MOMo.

AI · Oct 27, 2025
OntoPret: An Ontology for the Interpretation of Human Behavior

Alexis Ellis, Stacie Severyn, Fjollë Novakazi et al.

As human-machine teaming becomes central to paradigms like Industry 5.0, a critical need arises for machines to safely and effectively interpret complex human behaviors. A research gap currently exists between techno-centric robotic frameworks, which often lack nuanced models of human behavior, and descriptive behavioral ontologies, which are not designed for real-time, collaborative interpretation. This paper addresses this gap by presenting OntoPret, an ontology for the interpretation of human behavior. Grounded in cognitive science and a modular engineering methodology, OntoPret provides a formal, machine-processable framework for classifying behaviors, including task deviations and deceptive actions. We demonstrate its adaptability across two distinct use cases, manufacturing and gameplay, and establish the semantic foundations necessary for advanced reasoning about human intentions.

AI · Jul 12, 2025
Knowledge Conceptualization Impacts RAG Efficacy

Chris Davis Jaldi, Anmol Saini, Elham Ghiasi et al.

Explainability and interpretability are cornerstones of frontier and next-generation artificial intelligence (AI) systems. This is especially true in recent systems, such as large language models (LLMs), and more broadly, generative AI. On the other hand, adaptability to new domains, contexts, or scenarios is also an important aspect for a successful system. As such, we are particularly interested in how we can merge these two efforts, that is, investigating the design of transferable and interpretable neurosymbolic AI systems. Specifically, we focus on a class of systems referred to as "Agentic Retrieval-Augmented Generation" systems, which actively select, interpret, and query knowledge sources in response to natural language prompts. In this paper, we systematically evaluate how different conceptualizations and representations of knowledge, particularly the structure and complexity, impact an AI agent (in this case, an LLM) in effectively querying a triplestore. We report our results, which show that there are impacts from both approaches, and we discuss their impact and implications.

AI · May 3, 2023
An Ontology Design Pattern for Role-Dependent Names

Rushrukh Rayan, Cogan Shimizu, Pascal Hitzler

We present an ontology design pattern for modeling Names as part of Roles, to capture scenarios where an Agent performs different Roles using different Names associated with the different Roles. Examples of an Agent performing a Role using different Names are rather ubiquitous, e.g., authors who write under different pseudonyms, or different legal names for citizens of more than one country. The proposed pattern is a modified merger of a standard Agent Role and a standard Name pattern stub.

AI · Sep 25, 2020
Towards a Modular Ontology for Space Weather Research

Cogan Shimizu, Ryan McGranaghan, Aaron Eberhart et al.

The interactions between the Sun, interplanetary space, the near-Earth space environment, the Earth's surface, and the power grid are, perhaps unsurprisingly, very complicated. Studying them requires collaboration among many different organizations spanning the public and private sectors. Thus, an important component of studying space weather is the integration and analysis of heterogeneous information. As such, we have developed a modular ontology to drive the core of the data integration and serve the needs of a highly interdisciplinary community. This paper presents our preliminary modular ontology for space weather research and demonstrates a method for adapting it to a particular use-case through the use of existential rules and explicit typing.

AI · Dec 11, 2019
Completion Reasoning Emulation for the Description Logic EL+

Aaron Eberhart, Monireh Ebrahimi, Lu Zhou et al.

We present a new approach to integrating deep learning with knowledge-based systems that we believe shows promise. Our approach seeks to emulate reasoning structure, which can be inspected part-way through, rather than simply learning reasoner answers, which is typical in many of the black-box systems currently in use. We demonstrate that this idea is feasible by training a long short-term memory (LSTM) artificial neural network to learn EL+ reasoning patterns with two different data sets. We also show that this trained system is resistant to noise by corrupting a percentage of the test data and comparing the reasoner's and LSTM's predictions on corrupt data with correct answers.

AI · Apr 10, 2019
MODL: A Modular Ontology Design Library

Cogan Shimizu, Quinn Hirt, Pascal Hitzler

Pattern-based, modular ontologies have several beneficial properties that lend themselves to FAIR data practices, especially as they pertain to Interoperability and Reusability. However, developing such ontologies has a high upfront cost, e.g., reusing a pattern is predicated upon being aware of its existence in the first place. Thus, to help overcome these barriers, we have developed MODL: a modular ontology design library. MODL is a curated collection of well-documented ontology design patterns, drawn from a wide variety of interdisciplinary use-cases. In this paper we present MODL as a resource, discuss its use, and provide some examples of its contents.

HC · Feb 8, 2018
Caregiver Assessment Using Smart Gaming Technology: A Preliminary Approach

Garrett Goodman, Tanvi Banerjee, William Romine et al.

As pre-diagnostic technologies are becoming increasingly accessible, using them to improve the quality of care available to dementia patients and their caregivers is of increasing interest. Specifically, we aim to develop a tool for non-invasively assessing task performance in a simple gaming application. To address this, we have developed Caregiver Assessment using Smart Gaming Technology (CAST), a mobile application that personalizes a traditional word scramble game. Its core functionality uses a Fuzzy Inference System (FIS) optimized via a Genetic Algorithm (GA) to provide customized performance measures for each user of the system. With CAST, we match the relative level of difficulty of play using the individual's ability to solve the word scramble tasks. We provide an analysis of the preliminary results for determining task difficulty, with respect to our current participant cohort.