CV · Oct 30, 2025
StrengthSense: A Dataset of IMU Signals Capturing Everyday Strength-Demanding Activities
Zeyu Yang, Clayton Souza Leite, Yu Xiao
Tracking strength-demanding activities with wearable sensors like IMUs is crucial for monitoring muscular strength, endurance, and power. However, there is a lack of comprehensive datasets capturing these activities. To fill this gap, we introduce StrengthSense, an open dataset of IMU signals capturing 11 strength-demanding activities, such as sit-to-stand, climbing stairs, and mopping. For comparative purposes, the dataset also includes 2 non-strength-demanding activities. The dataset was collected from 29 healthy subjects using 10 IMUs placed on the limbs and torso, and was annotated using video recordings as references. This paper provides a comprehensive overview of the data collection, pre-processing, and technical validation. To verify the accuracy and reliability of the sensor data, we compared the joint angles estimated from the IMUs with those extracted directly from video. Researchers and developers can use StrengthSense to advance the development of human activity recognition algorithms, create fitness and health monitoring applications, and more.
LG · Oct 17, 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Souza Leite, Henry Mauranen, Aziza Zhanabatyrova et al.
Transformers have excelled in natural language processing and computer vision, paving the way for their use in sensor-based Human Activity Recognition (HAR). Previous studies show that transformers outperform their counterparts only when they harness abundant data or employ compute-intensive optimization algorithms. However, neither of these scenarios is viable in sensor-based HAR due to the scarcity of data in this field and the frequent need to perform training and inference on resource-constrained devices. Our extensive investigation into various implementations of transformer-based versus non-transformer-based HAR using wearable sensors, encompassing more than 500 experiments, corroborates these concerns. We observe that transformer-based solutions pose higher computational demands, consistently yield inferior performance, and experience significant performance degradation when quantized to accommodate resource-constrained devices. Additionally, transformers demonstrate lower robustness to adversarial attacks, posing a potential threat to user trust in HAR.
CV · Sep 24, 2021
Automatic Map Update Using Dashcam Videos
Aziza Zhanabatyrova, Clayton Souza Leite, Yu Xiao
Autonomous driving requires 3D maps that provide accurate and up-to-date information about semantic landmarks. Due to the wider availability and lower cost of cameras compared with laser scanners, vision-based mapping solutions, especially those using crowdsourced visual data, have attracted much attention from academia and industry. However, previous works have mainly focused on creating 3D point clouds, leaving automatic change detection as an open issue. In this paper, we propose a pipeline for initiating and updating 3D maps with dashcam videos, with a focus on automatic change detection based on a comparison of metadata (e.g., the types and locations of traffic signs). To improve the performance of metadata generation, which depends on the accuracy of 3D object detection and localization, we introduce a novel deep learning-based pixel-wise 3D localization algorithm. The algorithm, trained directly with SfM point cloud data, can locate objects detected in 2D images in 3D space with high accuracy by estimating not only depth from monocular images but also lateral and height distances. In addition, we propose a point clustering and thresholding algorithm to improve the robustness of the system to errors. We performed experiments in two distinct areas - a campus and a residential area - with different types of cameras, lighting, and weather conditions. The changes were detected with 85% and 100% accuracy in the campus and residential areas, respectively. The errors in the campus area were mainly due to traffic signs that were far from the vehicle and intended for pedestrians and cyclists only. We also conducted a cause analysis of the detection and localization errors to measure the impact of the performance of the underlying technology.