Jarek Nabrzyski

CR
h-index23
4papers
40citations
Novelty30%
AI Score35

4 Papers

CLFeb 12Code
ADRD-Bench: A Preliminary LLM Benchmark for Alzheimer's Disease and Related Dementias

Guangxin Zhao, Jiahao Zheng, Malaz Boustani et al.

Large language models (LLMs) have shown great potential for healthcare applications. However, existing evaluation benchmarks provide minimal coverage of Alzheimer's Disease and Related Dementias (ADRD). To address this gap, we introduce ADRD-Bench, the first ADRD-specific benchmark dataset designed for rigorous evaluation of LLMs. ADRD-Bench has two components: 1) ADRD Unified QA, a synthesis of 1,352 questions consolidated from seven established medical benchmarks, providing a unified assessment of clinical knowledge; and 2) ADRD Caregiving QA, a novel set of 149 questions derived from the Aging Brain Care (ABC) program, a widely used, evidence-based brain health management program. Guided by a program with national expertise in comprehensive ADRD care, this new set was designed to mitigate the lack of practical caregiving context in existing benchmarks. We evaluated 33 state-of-the-art LLMs on the proposed ADRD-Bench. Results showed that the accuracy of open-weight general models ranged from 0.63 to 0.93 (mean: 0.78; std: 0.09). The accuracy of open-weight medical models ranged from 0.48 to 0.93 (mean: 0.82; std: 0.13). The accuracy of closed-source general models ranged from 0.83 to 0.91 (mean: 0.89; std: 0.03). While top-tier models achieved high accuracies (>0.9), case studies revealed that inconsistent reasoning quality and stability limit their reliability, highlighting a critical need for domain-specific improvement to enhance LLMs' knowledge and reasoning grounded in daily caregiving data. The entire dataset is available at https://github.com/IIRL-ND/ADRD-Bench.

LGMay 13, 2021
Providing Assurance and Scrutability on Shared Data and Machine Learning Models with Verifiable Credentials

Iain Barclay, Alun Preece, Ian Taylor et al.

Adopting shared data resources requires scientists to place trust in the originators of the data. When shared data is later used in the development of artificial intelligence (AI) systems or machine learning (ML) models, the trust lineage extends to the users of the system, typically practitioners in fields such as healthcare and finance. Practitioners rely on AI developers to have used relevant, trustworthy data, but may have limited insight and recourse. This paper introduces a software architecture and implementation of a system based on design patterns from the field of self-sovereign identity. Scientists can issue signed credentials attesting to qualities of their data resources. Data contributions to ML models are recorded in a bill of materials (BOM), which is stored with the model as a verifiable credential. The BOM provides a traceable record of the supply chain for an AI system, which facilitates on-going scrutiny of the qualities of the contributing components. The verified BOM, and its linkage to certified data qualities, is used in the AI Scrutineer, a web-based tool designed to offer practitioners insight into ML model constituents and highlight any problems with adopted datasets, should they be found to have biased data or be otherwise discredited.

CRApr 6, 2020
Certifying Provenance of Scientific Datasets with Self-sovereign Identity and Verifiable Credentials

Iain Barclay, Swapna Radha, Alun Preece et al.

In order to increase the value of scientific datasets and improve research outcomes, it is important that only trustworthy data is used. This paper presents mechanisms by which scientists and the organisations they represent can certify the authenticity of characteristics and provenance of any datasets they publish so that secondary users can inspect and gain confidence in the qualities of data they source. By drawing on data models and protocols used to provide self-sovereign ownership of identity and personal data to individuals, we conclude that providing self-sovereignty to data assets offers a viable approach for institutions to certify qualities of their datasets in a cryptography secure manner, and enables secondary data users to efficiently perform verification of the authenticity of such certifications. By building upon emerging standards for decentralized identification and cryptographically verifiable credentials, we envisage an infrastructure of tools being developed to foster adoption of metadata certification schemes, and improving the quality of information provided in support of shared data assets.

CRSep 22, 2019
Techniques and Applications for Crawling, Ingesting and Analyzing Blockchain Data

Evan Brinckman, Andrey Kuehlkamp, Jarek Nabrzyski et al.

As the public Ethereum network surpasses half a billion transactions and enterprise Blockchain systems becoming highly capable of meeting the demands of global deployments, production Blockchain applications are fast becoming commonplace across a diverse range of business and scientific verticals. In this paper, we reflect on work we have been conducting recently surrounding the ingestion, retrieval and analysis of Blockchain data. We describe the scaling and semantic challenges when extracting Blockchain data in a way that preserves the original metadata of each transaction by cross referencing the Smart Contract interface with the on-chain data. We then discuss a scientific use case in the area of Scientific workflows by describing how we can harvest data from tasks and dependencies in a generic way. We then discuss how crawled public blockchain data can be analyzed using two unsupervised machine learning algorithms, which are designed to identify outlier accounts or smart contracts in the system. We compare and contrast the two machine learning methods and cross correlate with public Websites to illustrate the effectiveness such approaches.