Tyler Sorensen

22.7HCMar 27

Unlocking Open-Player-Modeling-enhanced Game-Based Learning: The Open Player Socially Analytical Intelligence Architecture

Zhiyu Lin, Boyd Fox, Devon Mckee et al. · gatech

Game-Based Learning (GBL) is a learner-engaging pedagogical methodology, yet adapting games to heterogeneous learners requires transparent, real-time Open Player Models (OPMs). We contribute to the community Open Player Socially Analytical Intelligence (OPSAI), an architecture implementing OPM beyond conceptual frameworks and validated in a GBL application. It decouples gameplay telemetry and analysis from the game engine and automatically derives pedagogically actionable insights, supporting the transparency of computational player models while making them accessible to players. OPSAI comprises three logical layers: a Frontend that both provides the GBL experience and collects information needed for analytics; a stateless Backend that hosts transparent analytics services producing reflective prompts, recommendations, and visualization guides; and a two-tier Log Storage that balances heavy raw gameplay data with lightweight reference indices for low-latency queries. By feeding analytics outputs back into the game interface, OPSAI closes the feedback loop between play and learning, empowering teachers, researchers, and learners alike. We further showcase OPSAI with a full deployment on the Parallel GBL environment, featuring live play traces, peer comparisons, and personalized suggestions, demonstrating a reusable blueprint for future educational games.

11.3DCMay 20

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

Reese Levine, Rithik Sharma, Nikhil Jain et al.

Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama.cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models and four model weight formats. We compare LlamaWeb against existing browser-based LLM frameworks and find that LlamaWeb requires 29-33% less memory across several combinations of device, browser, and operating system. We also evaluate LlamaWeb's performance against these frameworks and find that it increases decode throughput by 45-69% across four GPUs from separate vendors. In addition, we compare LlamaWeb's performance against other llama.cpp backends, where it is competitive with and even beats vendor-specific backend performance on some devices.

Tyler Sorensen

2 Papers