David Kim

HC
h-index27
9papers
92citations
Novelty45%
AI Score46

9 Papers

38.9HCMar 25Code
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

Ruofei Du, Benjamin Hersh, David Li et al.

While large language models have accelerated software development through "vibe coding", prototyping intelligent Extended Reality (XR) experiences remains inaccessible due to the friction of complex game engines and low-level sensor integration. To bridge this gap, we contribute XR Blocks, an open-source, modular WebXR framework that abstracts spatial computing complexities into high-level, human-centered primitives. Building upon this foundation, we present Vibe Coding XR, an end-to-end rapid prototyping workflow that leverages LLMs to translate natural language intent directly into functional XR software. Using a web-based interface, creators can transform high-level prompts (e.g., "create a dandelion that reacts to hand") into interactive WebXR applications in under a minute. We provide a preliminary technical evaluation on a pilot dataset (VCXR60) alongside diverse application scenarios highlighting mixed-reality realism, multi-modal interaction, and generative AI integrations. By democratizing spatial software creation, this work empowers practitioners to bypass low-level hurdles and rapidly move from "idea to reality." Code and live demos are available at https://xrblocks.github.io/gem and https://github.com/google/xrblocks.

HCApr 20, 2024Code
Augmented Object Intelligence with XR-Objects

Mustafa Doga Dogan, Eric J. Gonzalez, Karan Ahuja et al.

Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper explores Augmented Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to digital functionalities. Our approach utilizes real-time object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions without the need for object pre-registration. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in contextually relevant ways using object-based context menus. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through various use cases and a user study.

49.6CVMar 21
GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit

Chad Vanderbilt, Gabriele Campanella, Siddharth Singi et al.

Computational biomarkers (CBs) are histopathology-derived patterns extracted from hematoxylin-eosin (H&E) whole-slide images (WSIs) using artificial intelligence (AI) to predict therapeutic response or prognosis. Recently, slide-level multiple-instance learning (MIL) with pathology foundation models (PFMs) has become the standard baseline for CB development. While these methods have improved predictive performance, computational pathology lacks standardized intermediate data formats, provenance tracking, checkpointing conventions, and reproducible evaluation metrics required for clinical-grade deployment. We introduce GOLDMARK (https://artificialintelligencepathology.org), a standardized benchmarking framework built on a curated TCGA cohort with clinically actionable OncoKB level 1-3 biomarker labels. GOLDMARK releases structured intermediate representations, including tile coordinate maps, per-slide feature embeddings from canonical PFMs, quality-control metadata, predefined patient-level splits, trained slide-level models, and evaluation outputs. Models are trained on TCGA and evaluated on an independent MSKCC cohort with reciprocal testing. Across 33 tumor-biomarker tasks, mean AUROC was 0.689 (TCGA) and 0.630 (MSKCC). Restricting to the eight highest-performing tasks yielded mean AUROCs of 0.831 and 0.801, respectively. These tasks correspond to established morphologic-genomic associations (e.g., LGG IDH1, COAD MSI/BRAF, THCA BRAF/NRAS, BLCA FGFR3, UCEC PTEN) and showed the most stable cross-site performance. Differences between canonical encoders were modest relative to task-specific variability. GOLDMARK establishes a shared experimental substrate for computational pathology, enabling reproducible benchmarking and direct comparison of methods across datasets and models.

HCSep 29, 2025Code
XR Blocks: Accelerating Human-centered AI + XR Innovation

David Li, Nels Numan, Xun Qian et al.

We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like JAX and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks strives to provide a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "reducing frictions from idea to reality", thus accelerating rapid prototyping of AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive XR prototype. Site: https://xrblocks.github.io

HCDec 15, 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs

Zhongyi Zhou, Jing Jin, Vrushank Phadnis et al.

Visual programming has the potential of providing novice programmers with a low-code experience to build customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both technical and user evaluation (N=16) shows that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.

CVJun 5, 2025
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis

Neeraj Kumar, Swaraj Nanda, Siddharth Singi et al.

Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs). However, adapting these pretrained PFMs for specific clinical tasks presents considerable challenges, primarily due to the availability of only weak (WSI-level) labels for gigapixel images, necessitating multiple instance learning (MIL) paradigm for effective WSI analysis. This paper proposes a novel approach for single-GPU \textbf{T}ask \textbf{A}daptation of \textbf{PFM}s (TAPFM) that uses vision transformer (\vit) attention for MIL aggregation while optimizing both for feature representations and attention weights. The proposed approach maintains separate computational graphs for MIL aggregator and the PFM to create stable training dynamics that align with downstream task objectives during end-to-end adaptation. Evaluated on mutation prediction tasks for bladder cancer and lung adenocarcinoma across institutional and TCGA cohorts, TAPFM consistently outperforms conventional approaches, with H-Optimus-0 (TAPFM) outperforming the benchmarks. TAPFM effectively handles multi-label classification of actionable mutations as well. Thus, TAPFM makes adaptation of powerful pre-trained PFMs practical on standard hardware for various clinical applications.

CLJun 3, 2025
Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems

Michael E. Garcia-Alcoser, Mobina GhojoghNejad, Fakrul Islam Tushar et al.

Purpose: This study aims to evaluate the effectiveness of large language models (LLMs) in automating disease annotation of CT radiology reports. We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis (CAP) CT reports. Materials and Methods: This retrospective study analyzed 40,833 CT reports from 29,540 patients, with 1,789 CAP reports manually annotated across three organ systems. External validation was conducted using the CT-RATE dataset. Three open-weight LLMs were tested with zero-shot prompting. Performance was evaluated using Cohen's Kappa and micro/macro-averaged F1 scores. Results: In 12,197 Duke CAP reports from 8,854 patients, Llama-3.1 8B and Gemma-3 27B showed the highest agreement ($κ$ median: 0.87). On the manually annotated set, Gemma-3 27B achieved the top macro-F1 (0.82), followed by Llama-3.1 8B (0.79), while the RBA scored lowest (0.64). On the CT-RATE dataset (lungs/pleura only), Llama-3.1 8B performed best (0.91), with Gemma-3 27B close behind (0.89). Performance differences were mainly due to differing labeling practices, especially for lung atelectasis. Conclusion: Lightweight LLMs outperform rule-based methods for CT report annotation and generalize across organ systems with zero-shot prompting. However, binary labels alone cannot capture the full nuance of report language. LLMs can provide a flexible, efficient solution aligned with clinical judgment and user needs.

IVMay 18, 2024
XCAT-3.0: A Comprehensive Library of Personalized Digital Twins Derived from CT Scans

Lavsen Dahal, Mobina Ghojoghnejad, Dhrubajyoti Ghosh et al.

Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VITs. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers accurate assessment of imaging technologies across different patient groups. Traditionally, the more realistic computational phantoms were created by manual segmentation, which is a laborious and time-consuming task, impeding the expansion of phantom libraries. This study presents a framework for creating realistic computational phantoms using a suite of automatic segmentation models and performing three forms of automated quality control on the segmented organ masks. The result is the release of over 2500 new computational phantoms, so-named XCAT3.0 after the ubiquitous XCAT computational construct. This new formation embodies 140 structures and represents a comprehensive approach to detailed anatomical modeling. The developed computational phantoms are formatted in both voxelized and surface mesh formats. The framework is combined with an in-house CT scanner simulator to produce realistic CT images. The framework has the potential to advance virtual imaging trials, facilitating comprehensive and reliable evaluations of medical imaging technologies. Phantoms may be requested at https://cvit.duke.edu/resources/. Code, model weights, and sample CT images are available at https://xcat-3.github.io/.

CVDec 15, 2023
Multiscale Vision Transformer With Deep Clustering-Guided Refinement for Weakly Supervised Object Localization

David Kim, Sinhae Cha, Byeongkeun Kang

This work addresses the task of weakly-supervised object localization. The goal is to learn object localization using only image-level class labels, which are much easier to obtain compared to bounding box annotations. This task is important because it reduces the need for labor-intensive ground-truth annotations. However, methods for object localization trained using weak supervision often suffer from limited accuracy in localization. To address this challenge and enhance localization accuracy, we propose a multiscale object localization transformer (MOLT). It comprises multiple object localization transformers that extract patch embeddings across various scales. Moreover, we introduce a deep clustering-guided refinement method that further enhances localization accuracy by utilizing separately extracted image segments. These segments are obtained by clustering pixels using convolutional neural networks. Finally, we demonstrate the effectiveness of our proposed method by conducting experiments on the publicly available ILSVRC-2012 dataset.