SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
This work addresses the challenge of applying LLMs to sensor data for human activity recognition. The contribution is incremental, since it adapts existing LLM capabilities to a new domain.
The paper aims to enable Large Language Models (LLMs) to perform human activity recognition from motion sensor time-series data. It introduces SensorLLM, a two-stage framework that first aligns sensor inputs with language descriptions and then tunes the model for classification, achieving performance that matches or surpasses state-of-the-art methods.
We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor time-series data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data due to the lack of semantic context in time-series data, computational constraints, and challenges in processing numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, where the model aligns sensor inputs with trend descriptions. Special tokens are introduced to mark channel boundaries. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying durations, without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through human-intuitive Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis. Our code is available at https://github.com/zechenli03/SensorLLM.
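To make the idea of annotation-free trend descriptions more concrete, the following is a minimal sketch of how such text could be generated from a single sensor channel and wrapped in channel-boundary markers. It is an illustration under stated assumptions, not the SensorLLM implementation: the function names `trend_segments` and `describe_channel`, the tolerance `tol`, the description template, and the token strings of the form `<x_acc_start>` and `<x_acc_end>` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start index, end index, trend label)


def trend_segments(values: Sequence[float], tol: float = 1e-6) -> List[Segment]:
    """Split a series into maximal runs that rise, fall, or stay steady."""
    if len(values) < 2:
        return []

    def direction(a: float, b: float) -> str:
        # Classify one step; differences within tol count as steady.
        delta = b - a
        if delta > tol:
            return "increasing"
        if delta < -tol:
            return "decreasing"
        return "steady"

    segments: List[Segment] = []
    start = 0
    current = direction(values[0], values[1])
    for i in range(1, len(values) - 1):
        nxt = direction(values[i], values[i + 1])
        if nxt != current:
            # The trend changes at index i, so close the current run there.
            segments.append((start, i, current))
            start, current = i, nxt
    segments.append((start, len(values) - 1, current))
    return segments


def describe_channel(name: str, values: Sequence[float]) -> str:
    """Build a templated description wrapped in hypothetical channel tokens."""
    parts = [f"steps {s}-{e}: {label}" for s, e, label in trend_segments(values)]
    body = "; ".join(parts) if parts else "too short to describe"
    # Hypothetical boundary markers; the paper's actual token names may differ.
    return f"<{name}_start> {body} <{name}_end>"


if __name__ == "__main__":
    print(describe_channel("x_acc", [0.0, 1.0, 2.0, 2.0, 1.0]))
```

Because the descriptions are derived purely from the numerical values, text of this kind can be produced for windows of any length and for any channel without human labeling, which is the property the alignment stage relies on.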