CVDec 20, 2022
Privacy-Protecting Behaviours of Risk Detection in People with Dementia using VideosPratik K. Mishra, Andrea Iaboni, Bing Ye et al. · utoronto
People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk to alert the staff to prevent potential injuries or death in some cases. However, these behaviours of risk events are heterogeneous and infrequent in comparison to normal events. Moreover, analyzing raw videos can also raise privacy concerns. In this paper, we present two novel privacy-protecting video-based anomaly detection approaches to detect behaviours of risks in people with dementia. We either extracted body pose information as skeletons or used semantic segmentation masks to replace multiple humans in the scene with their semantic boundaries. Our work differs from most existing approaches for video anomaly detection that focus on appearance-based features, which can put the privacy of a person at risk and is also susceptible to pixel-based noise, including illumination and viewing direction. We used anonymized videos of normal activities to train customized spatio-temporal convolutional autoencoders and identify behaviours of risk as anomalies. We showed our results on a real-world study conducted in a dementia care unit with patients with dementia, containing approximately 21 hours of normal activities data for training and 9 hours of data containing normal and behaviours of risk events for testing. We compared our approaches with the original RGB videos and obtained a similar area under the receiver operating characteristic curve performance of 0.807 for the skeleton-based approach and 0.823 for the segmentation mask-based approach.
LGFeb 7, 2023
Undersampling and Cumulative Class Re-decision Methods to Improve Detection of Agitation in People with DementiaZhidong Meng, Andrea Iaboni, Bing Ye et al. · utoronto
Agitation is one of the most prevalent symptoms in people with dementia (PwD) that can place themselves and the caregiver's safety at risk. Developing objective agitation detection approaches is important to support health and safety of PwD living in a residential setting. In a previous study, we collected multimodal wearable sensor data from 17 participants for 600 days and developed machine learning models for detecting agitation in one-minute windows. However, there are significant limitations in the dataset, such as imbalance problem and potential imprecise labelsas the occurrence of agitation is much rarer in comparison to the normal behaviours. In this paper, we first implemented different undersampling methods to eliminate the imbalance problem, and came to the conclusion that only 20% of normal behaviour data were adequate to train a competitive agitation detection model. Then, we designed a weighted undersampling method to evaluate the manual labeling mechanism given the ambiguous time interval assumption. After that, the postprocessing method of cumulative class re-decision (CCR) was proposed based on the historical sequential information and continuity characteristic of agitation, improving the decision-making performance for the potential application of agitation detection system. The results showed that a combination of undersampling and CCR improved F1-score and other metrics to varying degrees with less training time and data.
CVAug 28, 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using CamerasPratik K. Mishra, Irene Ballester, Andrea Iaboni et al. · utoronto
The behavioural and psychological symptoms of dementia, such as agitation and aggression, present a significant health and safety risk in residential care settings. Many care facilities have video cameras in place for digital monitoring of public spaces, which can be leveraged to develop an automated behaviours of risk detection system that can alert the staff to enable timely intervention and prevent the situation from escalating. However, one of the challenges in our previous study was the presence of false alarms due to disparate importance of events based on distance. To address this issue, we proposed a novel depth-weighted loss to enforce equivalent importance to the events happening both near and far from the cameras; thus, helping to reduce false alarms. We further propose to utilize the training outliers to determine the anomaly threshold. The data from nine dementia participants across three cameras in a specialized dementia unit were used for training. The proposed approach obtained the best area under receiver operating characteristic curve performance of 0.852, 0.81 and 0.768, respectively, for the three cameras. Ablation analysis was conducted for the individual components of the proposed approach and effect of frame size and frame rate. The performance of the proposed approach was investigated for cross-camera, participant-specific and sex-specific behaviours of risk detection. The proposed approach performed reasonably well in reducing false alarms. This motivates further research to make the system more suitable for deployment in care facilities.
CVAug 22, 2023
Pose2Gait: Extracting Gait Features from Monocular Video of Individuals with DementiaCaroline Malin-Mayor, Vida Adeli, Andrea Sabo et al.
Video-based ambient monitoring of gait for older adults with dementia has the potential to detect negative changes in health and allow clinicians and caregivers to intervene early to prevent falls or hospitalizations. Computer vision-based pose tracking models can process video data automatically and extract joint locations; however, publicly available models are not optimized for gait analysis on older adults or clinical populations. In this work we train a deep neural network to map from a two dimensional pose sequence, extracted from a video of an individual walking down a hallway toward a wall-mounted camera, to a set of three-dimensional spatiotemporal gait features averaged over the walking sequence. The data of individuals with dementia used in this work was captured at two sites using a wall-mounted system to collect the video and depth information used to train and evaluate our model. Our Pose2Gait model is able to extract velocity and step length values from the video that are correlated with the features from the depth camera, with Spearman's correlation coefficients of .83 and .60 respectively, showing that three dimensional spatiotemporal features can be predicted from monocular video. Future work remains to improve the accuracy of other features, such as step time and step width, and test the utility of the predicted values for detecting meaningful changes in gait during longitudinal ambient monitoring.
CVJul 15, 2024
STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton SequencesSoroush Mehraban, Mohammad Javad Rajabi, Andrea Iaboni et al.
Self-supervised pretraining methods with masked prediction demonstrate remarkable within-dataset performance in skeleton-based action recognition. However, we show that, unlike contrastive learning approaches, they do not produce well-separated clusters. Additionally, these methods struggle with generalization in few-shot settings. To address these issues, we propose Self-supervised Tuning for 3D Action Recognition in Skeleton sequences (STARS). Specifically, STARS first uses a masked prediction stage using an encoder-decoder architecture. It then employs nearest-neighbor contrastive learning to partially tune the weights of the encoder, enhancing the formation of semantic clusters for different actions. By tuning the encoder for a few epochs, and without using hand-crafted data augmentations, STARS achieves state-of-the-art self-supervised results in various benchmarks, including NTU-60, NTU-120, and PKU-MMD. In addition, STARS exhibits significantly better results than masked prediction models in few-shot settings, where the model has not seen the actions throughout pretraining. Project page: https://soroushmehraban.github.io/stars/
CVMay 7, 2021Code
Estimating Parkinsonism Severity in Natural Gait Videos of Older Adults with DementiaAndrea Sabo, Sina Mehdizadeh, Andrea Iaboni et al.
Drug-induced parkinsonism affects many older adults with dementia, often causing gait disturbances. New advances in vision-based human pose-estimation have opened possibilities for frequent and unobtrusive analysis of gait in residential settings. This work leverages novel spatial-temporal graph convolutional network (ST-GCN) architectures and training procedures to predict clinical scores of parkinsonism in gait from video of individuals with dementia. We propose a two-stage training approach consisting of a self-supervised pretraining stage that encourages the ST-GCN model to learn about gait patterns before predicting clinical scores in the finetuning stage. The proposed ST-GCN models are evaluated on joint trajectories extracted from video and are compared against traditional (ordinal, linear, random forest) regression models and temporal convolutional network baselines. Three 2D human pose-estimation libraries (OpenPose, Detectron, AlphaPose) and the Microsoft Kinect (2D and 3D) are used to extract joint trajectories of 4787 natural walking bouts from 53 older adults with dementia. A subset of 399 walks from 14 participants is annotated with scores of parkinsonism severity on the gait criteria of the Unified Parkinson's Disease Rating Scale (UPDRS) and the Simpson-Angus Scale (SAS). Our results demonstrate that ST-GCN models operating on 3D joint trajectories extracted from the Kinect consistently outperform all other models and feature sets. Prediction of parkinsonism scores in natural walking bouts of unseen participants remains a challenging task, with the best models achieving macro-averaged F1-scores of 0.53 +/- 0.03 and 0.40 +/- 0.02 for UPDRS-gait and SAS-gait, respectively. Pre-trained model and demo code for this work is available: https://github.com/TaatiTeam/stgcn_parkinsonism_prediction.
CVMar 28, 2025
GAITGen: Disentangled Motion-Pathology Impaired Gait Generative Model -- Bringing Motion Generation to the Clinical DomainVida Adeli, Soroush Mehraban, Majid Mirmehdi et al.
Gait analysis is crucial for the diagnosis and monitoring of movement disorders like Parkinson's Disease. While computer vision models have shown potential for objectively evaluating parkinsonian gait, their effectiveness is limited by scarce clinical datasets and the challenge of collecting large and well-labelled data, impacting model accuracy and risk of bias. To address these gaps, we propose GAITGen, a novel framework that generates realistic gait sequences conditioned on specified pathology severity levels. GAITGen employs a Conditional Residual Vector Quantized Variational Autoencoder to learn disentangled representations of motion dynamics and pathology-specific factors, coupled with Mask and Residual Transformers for conditioned sequence generation. GAITGen generates realistic, diverse gait sequences across severity levels, enriching datasets and enabling large-scale model training in parkinsonian gait analysis. Experiments on our new PD-GaM (real) dataset demonstrate that GAITGen outperforms adapted state-of-the-art models in both reconstruction fidelity and generation quality, accurately capturing critical pathology-specific gait features. A clinical user study confirms the realism and clinical relevance of our generated sequences. Moreover, incorporating GAITGen-generated data into downstream tasks improves parkinsonian gait severity estimation, highlighting its potential for advancing clinical gait analysis.
CVOct 13, 2025
FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion DecodingSoroush Mehraban, Andrea Iaboni, Babak Taati
Recent transformer-based models for 3D Human Mesh Recovery (HMR) have achieved strong performance but often suffer from high computational cost and complexity due to deep transformer architectures and redundant tokens. In this paper, we introduce two HMR-specific merging strategies: Error-Constrained Layer Merging (ECLM) and Mask-guided Token Merging (Mask-ToMe). ECLM selectively merges transformer layers that have minimal impact on the Mean Per Joint Position Error (MPJPE), while Mask-ToMe focuses on merging background tokens that contribute little to the final prediction. To further address the potential performance drop caused by merging, we propose a diffusion-based decoder that incorporates temporal context and leverages pose priors learned from large-scale motion capture datasets. Experiments across multiple benchmarks demonstrate that our method achieves up to 2.3x speed-up while slightly improving performance over the baseline.
CVOct 5, 2025
CARE-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait AssessmentVida Adeli, Ivan Klabucar, Javad Rajabi et al.
Objective gait assessment in Parkinson's Disease (PD) is limited by the absence of large, diverse, and clinically annotated motion datasets. We introduce CARE-PD, the largest publicly available archive of 3D mesh gait data for PD, and the first multi-site collection spanning 9 cohorts from 8 clinical centers. All recordings (RGB video or motion capture) are converted into anonymized SMPL meshes via a harmonized preprocessing pipeline. CARE-PD supports two key benchmarks: supervised clinical score prediction (estimating Unified Parkinson's Disease Rating Scale, UPDRS, gait scores) and unsupervised motion pretext tasks (2D-to-3D keypoint lifting and full-body 3D reconstruction). Clinical prediction is evaluated under four generalization protocols: within-dataset, cross-dataset, leave-one-dataset-out, and multi-dataset in-domain adaptation. To assess clinical relevance, we compare state-of-the-art motion encoders with a traditional gait-feature baseline, finding that encoders consistently outperform handcrafted features. Pretraining on CARE-PD reduces MPJPE (from 60.8mm to 7.5mm) and boosts PD severity macro-F1 by 17 percentage points, underscoring the value of clinically curated, diverse training data. CARE-PD and all benchmark code are released for non-commercial research at https://neurips2025.care-pd.ca/.
LGApr 15, 2021
Tracking agitation in people living with dementia in a care environmentShehroz S. Khan, Thaejaesh Sooriyakumaran, Katherine Rich et al.
Agitation is a symptom that communicates distress in people living with dementia (PwD), and that can place them and others at risk. In a long term care (LTC) environment, care staff track and document these symptoms as a way to detect when there has been a change in resident status to assess risk, and to monitor for response to interventions. However, this documentation can be time-consuming, and due to staffing constraints, episodes of agitation may go unobserved. This brings into question the reliability of these assessments, and presents an opportunity for technology to help track and monitor behavioural symptoms in dementia. In this paper, we present the outcomes of a 2 year real-world study performed in a dementia unit, where a multi-modal wearable device was worn by $20$ PwD. In line with a commonly used clinical documentation tool, this large multi-modal time-series data was analyzed to track the presence of episodes of agitation in 8-hour nursing shifts. The development of a baseline classification model (AUC=0.717) on this dataset and subsequent improvement (AUC= 0.779) lays the groundwork for automating the process of annotating agitation events in nursing charts.