IV Jun 1, 2023
Strengths and Weaknesses of 3D Pose Estimation and Inertial Motion Capture System for Movement Therapy
Shawan Mohammed, Hannah Siebers, Ted Preuß
3D pose estimation offers the opportunity for fast, non-invasive, and accurate motion analysis, which is of particular interest for clinical use. Currently, motion capture systems are used because they offer the robust and precise data acquisition that clinical applications require. In this study, we investigate the accuracy of the state-of-the-art 3D position estimation approach MeTrabs compared to the established inertial sensor system MTw Awinda for specific motion exercises. The study uses and provides an evaluation dataset of parallel recordings from 10 subjects during various movement therapy exercises. The information from the Awinda system and the frames for monocular pose estimation are synchronized. For the comparison, clinically relevant joint angle parameters for ankle, knee, back, and elbow flexion-extension were estimated and evaluated using the mean, median, and maximum deviation between the calculated joint angles for the different exercises, camera positions, and clothing items. The results indicate that the mean and median deviations can be kept below 5° for some of the studied angles. Even with maximum deviations of 15°, these joints could be considered for medical applications. However, caution should be applied to certain particularly problematic joints. This applies in particular to elbow flexion, which showed high maximum deviations of up to 50° in our analysis. Furthermore, the type of exercise plays a crucial role in the reliable and safe application of the 3D position estimation method. For example, all joint angles showed a significant deterioration in performance during exercises near the ground.
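The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a joint angle is taken as the angle between the two adjacent segments at a keypoint, and that the two systems deliver synchronized angle series in degrees. The function names `joint_angle_deg` and `deviation_stats` are hypothetical, and the exact angle definitions used in the study are not stated in the abstract.

```python
import math
from statistics import mean, median


def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a, b, c.

    Assumption: the angle is the one between segments b->a and b->c,
    e.g. hip-knee-ankle for the knee.
    """
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_t = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_t))


def deviation_stats(reference_deg, estimated_deg):
    """Mean, median, and maximum absolute deviation between two
    synchronized joint-angle series (both in degrees)."""
    errors = [abs(r - e) for r, e in zip(reference_deg, estimated_deg)]
    return {"mean": mean(errors), "median": median(errors), "max": max(errors)}
```

As a usage example, `deviation_stats(reference, estimated)` would be called once per joint, exercise, camera position, and clothing item, with `reference` holding the inertial-system angles and `estimated` the angles derived from the pose estimates.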
CV Sep 2, 2024
An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving
Shawan Mohammed, Alp Argun, Nicolas Bonnotte et al.
Our research investigates the challenges Deep Reinforcement Learning (DRL) faces in complex, Partially Observable Markov Decision Processes (POMDPs) such as autonomous driving (AD), and proposes a solution for vision-based navigation in these environments. Partial observability reduces RL performance significantly. This can be mitigated by augmenting sensor information and fusing data so that the observations reflect a more Markovian environment. However, this necessitates an increasingly complex perception module, whose training via RL is complicated by inherent limitations. As the neural network architecture becomes more complex, the reward function's effectiveness as an error signal diminishes, since the only source of supervision is the reward, which is often noisy, sparse, and delayed. Task-irrelevant elements in images, such as the sky or certain objects, pose additional complexities. Our research adopts an offline-trained encoder that leverages large video datasets through self-supervised learning to learn generalizable representations. We then train a head network on top of these representations through DRL to control an ego vehicle in the CARLA AD simulator. This study presents a broad investigation of the impact of different learning schemes for offline training of encoders on the performance of DRL agents in challenging AD tasks. Furthermore, we show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in the CARLA simulator in a zero-shot fashion. Finally, we explore the impact of various architectural decisions for the RL networks on how efficiently they utilize the transferred representations. In summary, this work introduces and validates an optimal way of obtaining suitable representations of the environment and transferring them to RL networks.
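The split between a fixed encoder and an RL-trained head can be sketched in a few lines. This is a toy illustration under stated assumptions, not the networks from the paper: the encoder is a fixed linear map with ReLU standing in for an offline-trained model, the head is a linear softmax policy, and the update is a plain REINFORCE step, which the abstract does not name as the algorithm used. The class names `FrozenEncoder` and `PolicyHead` are hypothetical.

```python
import math


class FrozenEncoder:
    """Stand-in for an offline-trained encoder; its weights are never updated."""

    def __init__(self, weights):
        # Store rows as tuples so they cannot be modified in place.
        self.weights = [tuple(row) for row in weights]

    def encode(self, obs):
        # Linear map followed by ReLU.
        return [max(0.0, sum(w * x for w, x in zip(row, obs)))
                for row in self.weights]


class PolicyHead:
    """Linear softmax policy head; the only part trained by RL in this sketch."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def probs(self, feats):
        logits = [sum(w * f for w, f in zip(row, feats)) for row in self.w]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def reinforce_step(self, feats, action, ret, lr=0.1):
        # For a linear softmax, d log pi(action) / d w_k = (1[k == action] - p_k) * feats.
        p = self.probs(feats)
        for k, row in enumerate(self.w):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for j, f in enumerate(feats):
                row[j] += lr * ret * coeff * f
```

In use, an observation is first passed through `encode`, and only the head sees the reward signal: `head.reinforce_step(encoder.encode(obs), action, ret)`. Keeping the encoder out of the update is what shields the representation from the noisy, sparse, and delayed reward.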