43.8LGJun 3
Uncertainty-Aware (Un)Supervised Few-Shot User Adaptation for On-Device Personalized Human Activity RecognitionMaximilian Burzer, Till Riedel, Michael Beigl et al.
Sensor-based Human Activity Recognition (HAR) models often degrade on unseen users due to domain shifts caused by individual movement patterns and sensor placement. Practical wearable HAR systems therefore require personalization methods that are lightweight, applicable whether calibration data is labeled, unlabeled, or unavailable, and robust under limited calibration. We present a gradient-free framework that repurposes pretrained HAR classifiers as Prototypical Networks using using prior prototypes, which preserve zero-shot performance and regularize adaptation. For labeled calibration, we introduce closed-form Bayesian prototype estimation and extend the same principle to unlabeled calibration. With only 3 seconds of calibration data per activity (one shot), supervised adaptation improves macro-F1 by +2.76 to +33.44 percentage points across four datasets, while unsupervised adaptation improves by +0.56 to +32.13 points. Since adaptation requires only closed-form prototype updates, the framework enables efficient and robust on-device personalization of preexisting HAR classifiers.
60.9HCMay 31
pcbGPT: Automatic PCB Schematic Synthesis from Natural Language RequirementsTobias King, Steven Kehrberg, Michael Beigl et al.
Translating natural-language hardware requirements into correct printed circuit board (PCB) schematics remains difficult in embedded, IoT, and wearable development. Designers must choose compatible components, interpret datasheets, add support circuitry, and expose correct interfaces before layout and prototyping can begin, while many such circuits cannot be validated through straightforward simulation. We present pcbGPT, a grounded system for generating editable KiCad schematics from natural-language specifications. pcbGPT represents circuits in a Python DSL and combines tool-augmented synthesis with component-library search, datasheet-grounded design knowledge, execution-based checking, structural and semantic validation, and an interactive web workflow that supports iterative refinement and synchronization with KiCad projects. We evaluate the system on 20 embedded schematic-generation tasks with reference implementations, required components, and interface constraints that enable automatic comparison. The best model reaches overall pass@1 of 0.90 and pass@5 of 1.00; pass@1 is 1.00 on basic and easy tasks, 0.91 on medium tasks, and 0.72 on hard tasks. These results, together with failure analysis, show that pcbGPT can already generate useful, reviewable first-draft schematics for early prototyping, but is not yet reliable enough to replace expert review.
LGOct 27, 2023
MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on MicrocontrollersTobias King, Yexu Zhou, Tobias Röddiger et al.
Designing domain specific neural networks is a time-consuming, error-prone, and expensive task. Neural Architecture Search (NAS) exists to simplify domain-specific model development but there is a gap in the literature for time series classification on microcontrollers. Therefore, we adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs). We introduce MicroNAS, a domain-specific HW-NAS system integration of DNAS, Latency Lookup Tables, dynamic convolutions and a novel search space specifically designed for time-series classification on MCUs. The resulting system is hardware-aware and can generate neural network architectures that satisfy user-defined limits on the execution latency and peak memory consumption. Our extensive studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures that achieve performance (F1-score) near to state-of-the-art desktop models. We also show that our approach is superior in adhering to memory and latency constraints compared to domain-independent NAS baselines such as DARTS.
HCAug 12, 2025Code
WHAR Datasets: An Open Source Library for Wearable Human Activity RecognitionMaximilian Burzer, Tobias King, Till Riedel et al.
The lack of standardization across Wearable Human Activity Recognition (WHAR) datasets limits reproducibility, comparability, and research efficiency. We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling through a standardized data format and a configuration-driven design, enabling reproducible and computationally efficient workflows with minimal manual intervention. The library currently supports 9 widely-used datasets, integrates with PyTorch and TensorFlow, and is easily extensible to new datasets. To demonstrate its utility, we trained two state-of-the-art models, TinyHar and MLP-HAR, on the included datasets, approximately reproducing published results and validating the library's effectiveness for experimentation and benchmarking. Additionally, we evaluated preprocessing performance and observed speedups of up to 3.8x using multiprocessing. We hope this library contributes to more efficient, reproducible, and comparable WHAR research.
LGSep 9, 2025
Feasibility of In-Ear Single-Channel ExG for Wearable Sleep Monitoring in Real-World SettingsPhilipp Lepold, Jonas Leichtle, Tobias Röddiger et al.
Automatic sleep staging typically relies on gold-standard EEG setups, which are accurate but obtrusive and impractical for everyday use outside sleep laboratories. This limits applicability in real-world settings, such as home environments, where continuous, long-term monitoring is needed. Detecting sleep onset is particularly relevant, enabling consumer applications (e.g. automatically pausing media playback when the user falls asleep). Recent research has shown correlations between in-ear EEG and full-scalp EEG for various phenomena, suggesting wearable, in-ear devices could allow unobtrusive sleep monitoring. We investigated the feasibility of using single-channel in-ear electrophysiological (ExG) signals for automatic sleep staging in a wearable device by conducting a sleep study with 11 participants (mean age: 24), using a custom earpiece with a dry eartip electrode (Dätwyler SoftPulse) as a measurement electrode in one ear and a reference in the other. Ground truth sleep stages were obtained from an Apple Watch Ultra, validated for sleep staging. Our system achieved 90.5% accuracy for binary sleep detection (Awake vs. Asleep) and 65.1% accuracy for four-class staging (Awake, REM, Core, Deep) using leave-one-subject-out validation. These findings demonstrate the potential of in-ear electrodes as a low-effort, comfortable approach to sleep monitoring, with applications such as stopping podcasts when users fall asleep.
CLJun 14, 2024
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic GradingTu Anh Dinh, Carlos Mullov, Leonard Bärmann et al.
With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4\% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading.