CV · LG · Apr 2, 2025

Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding

arXiv:2504.02878v1 · 9 citations · h-index: 5 · Proceedings of the 2nd International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things
Originality: Incremental advance
AI Analysis

This work addresses fine-grained activity recognition for applications like gesture control, though it is incremental as it builds on existing LLM methods.

The paper tackles fine-grained human activity recognition from IMU data, where pretrained LLMs perform at near-random accuracy, and reports up to a 129x improvement on 2D data and 78% accuracy on mid-air (3D) recognition of words with up to 5 letters.

Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities like walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. In this work, we first bridge this gap for flat-surface writing scenarios: by fine-tuning LLMs with a self-collected dataset and few-shot learning, we achieved up to a 129x improvement on 2D data. To extend this to 3D scenarios, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy on word recognition with up to 5 letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.

