SE · Jul 7, 2023 · Code
Exploring and Characterizing Large Language Models For Embedded System Development and Debugging
Zachary Englhardt, Richard Li, Dilini Nissanka et al. · uw
Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software, has not been studied. In this paper we develop an extensible, open-source hardware-in-the-loop framework to systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) and assess their capabilities and limitations for embedded system development. We observe through our study that even when these tools fail to produce working code, they consistently generate helpful reasoning about embedded design tasks. We leverage this finding to study how human programmers interact with these tools, and develop a human-AI based software engineering workflow for building embedded systems. Our evaluation platform for verifying LLM-generated programs uses sensor-actuator pairs for physical evaluation. We compare all three models with N=450 experiments and find, surprisingly, that GPT-4 in particular shows an exceptional level of cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt. In N=50 trials, GPT-4 produces functional I2C interfaces 66% of the time. GPT-4 also produces register-level drivers, code for LoRa communication, and context-specific power optimizations for an nRF52 program, resulting in over 740x current reduction to 12.2 µA. We also characterize the models' limitations to develop a generalizable human-AI workflow for using LLMs in embedded system development. We evaluate our workflow with 15 users, including novice and expert programmers. We find that our workflow improves productivity for all users and increases the success rate for building a LoRa environmental sensor from 25% to 100%, including for users with zero hardware or C/C++ experience.
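As a rough reading aid, the headline figures in this abstract can be unpacked with simple arithmetic. The sketch below assumes the 740x factor applies directly to the final 12.2 µA figure and that 66% of 50 trials rounds to a whole number of successes. The baseline current it derives is an inference, not a value reported in the abstract, and all variable names are illustrative.

```python
# Figures quoted in the abstract.
optimized_current_ua = 12.2   # current after optimization, in microamps
reduction_factor = 740        # abstract says "over 740x", so this is a lower bound
trials = 50
success_rate = 0.66

# Implied baseline current (an inference, not a reported number).
baseline_current_ma = optimized_current_ua * reduction_factor / 1000.0

# Implied number of trials that produced a functional I2C interface.
successful_trials = round(trials * success_rate)

print(f"Implied baseline current: at least {baseline_current_ma:.2f} mA")
print(f"Implied successful I2C trials: {successful_trials} of {trials}")
```

Because the abstract states "over 740x", the derived baseline should be read as a lower bound of about 9 mA.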
98.0 · AI · May 21
Towards a General Intelligence and Interface for Wearable Health Data
Girish Narayanswamy, Maxwell A. Xu, A. Ali Heydari et al.
While ubiquitous wearable sensors capture a wealth of behavioral and physiological information, effectively transforming these signals into personalized health insights is challenging. Specifically, converting low-level sensor data into representations capable of characterizing higher-level states is difficult due to high phenotypic diversity and variation in individual baseline health, physiology, and lifestyle factors. Moreover, collecting wearable data paired with health outcome annotations is laborious and expensive, and retrospective annotation remains practically unfeasible, contributing to a scarcity of data with high-quality labels. To overcome these limitations, we propose a foundation model for wearable health that is pretrained on more than one trillion minutes of unlabeled sensor signals drawn from a large cohort of five million participants. We demonstrate that the joint scaling of model capacity and pretraining data volume leads to systematic improvements in performance, as evaluated on a diverse set of 35 health prediction tasks, spanning cardiovascular, metabolic, sleep, and mental health, as well as lifestyle choices and demographic factors. We find that this population-scale representation unlocks label-efficient few-shot learning and generative capabilities for robust daily metric estimation. To further leverage this learned representation, we deploy a classroom of LLM agents to autonomously search the space of downstream predictive heads built on the model embeddings, showing broad performance improvements that increase with LLM model capacity. Finally, we show how integrating these downstream predictors into a Personal Health Agent can support model responses that are more relevant, contextually aware, and safe, and we validate this via 1,860 ratings from a cohort of clinicians.
92.3 · AI · May 5
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
Joseph Breda, Fadi Yousif, Beszel Hawkins et al.
Language models excel at diagnostic assessments on curated medical case studies and vignettes, performing on par with, or better than, clinical professionals. However, existing studies focus on complex scenarios with rich context, making it difficult to draw conclusions about how these systems perform for patients reporting symptoms in everyday life. We deployed SymptomAI, a set of conversational AI agents for end-to-end patient interviewing and differential diagnosis (DDx), via the Fitbit app in a study that randomized participants (N=13,917) to interact with five AI agents. This corpus captures diverse communication and a realistic distribution of illnesses from a real-world population. A subset of 1,228 participants reported a clinician-provided diagnosis, and 517 of these were further evaluated by a panel of clinicians during over 250 hours of annotation. SymptomAI DDx were significantly more accurate (OR = 2.47, p < 0.001) than those from independent clinicians given the same dialogue in a blinded randomized comparison. Moreover, agentic strategies that conduct a dedicated symptom interview to elicit additional symptom information before providing a diagnosis perform substantially better than baseline, user-guided conversations (p < 0.001). An auxiliary analysis on 1,509 conversations from a general US population panel validated that these results generalize beyond wearable device users. We used SymptomAI diagnoses as labels for all 13,917 participants to analyze over 500,000 days of wearable metrics across nearly 400 unique conditions. We identified strong associations between acute infections and physiological shifts (e.g., OR > 7 for influenza). While limited by self-reported ground truth, these results demonstrate the benefits of a dedicated and complete symptom interview compared to a user-guided symptom discussion, which is the default of most consumer LLMs.
HC · May 27, 2021
Intuitive and Ubiquitous Fever Monitoring Using Smartphones and Smartwatches
Joseph Breda, Shwetak Patel
Inside all smart devices, such as smartphones or smartwatches, there are thermally sensitive resistors known as thermistors which are used to monitor the temperature of the device. These thermistors are sensitive to temperature changes near their location on-device. While they are designed to measure the temperature of device components such as the battery, they can also sense changes in the temperature of the ambient environment or thermal entities in contact with the device. We have developed a model to estimate core body temperature from signals sensed by these thermistors during a user interaction in which the user places the capacitive touchscreen of a smart device against a thermal site on their body such as their forehead. During the interaction, the device logs the temperature sensed by the thermistors as well as the raw capacitance seen by the touchscreen to capture features describing the rate of heat transfer from the body to the device and device-to-skin contact, respectively. These temperature and contact features are then used to model the rate of heat transferred from the user's body to the device and thus the core body temperature of the user, for ubiquitous and accessible fever monitoring using only a smart device. We validate this system in a lab environment on a simulated skin-like heat source with a temperature estimate mean absolute error of 0.743$^{\circ}$F (roughly 0.4$^{\circ}$C) and limit of agreement of $\pm2.374^{\circ}$F (roughly 1.3$^{\circ}$C), which is comparable to some off-the-shelf peripheral and tympanic thermometers. We found a Pearson's correlation $R^2$ of 0.837 between ground truth temperature and temperature estimated by our system. We also deploy this system in an ongoing clinical study on a population of 7 participants in a clinical environment to show the similarity between simulated and clinical trials.
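The Fahrenheit-to-Celsius figures quoted for the error metrics are conversions of temperature differences, so only the 5/9 scale factor applies and the 32-degree offset does not. A minimal sketch of that arithmetic follows; the function name is illustrative and not taken from the paper.

```python
def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference (not an absolute reading) from °F to °C."""
    return delta_f * 5.0 / 9.0


# Error metrics quoted in the abstract, in degrees Fahrenheit.
mae_f = 0.743
limit_of_agreement_f = 2.374

mae_c = fahrenheit_delta_to_celsius(mae_f)
limit_of_agreement_c = fahrenheit_delta_to_celsius(limit_of_agreement_f)

print(f"Mean absolute error: {mae_c:.2f} °C")
print(f"Limit of agreement: ±{limit_of_agreement_c:.2f} °C")
```

The results, about 0.41 °C and 1.32 °C, agree with the rounded values of 0.4 °C and 1.3 °C given in the abstract.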