LGFeb 16, 2023
cGAN-Based High Dimensional IMU Sensor Data Generation for Enhanced Human Activity Recognition in Therapeutic ActivitiesMohammad Mohammadzadeh, Ali Ghadami, Alireza Taheri et al.
Human activity recognition is a core technology for applications such as rehabilitation, health monitoring, and human-computer interactions. Wearable devices, especially IMU sensors, provide rich features of human movements at a reasonable cost, which can be leveraged in activity recognition. Developing a robust classifier for activity recognition has always been of interest to researchers. One major problem is that there is usually a deficit of training data, which makes developing deep classifiers difficult and sometimes impossible. In this work, a novel GAN network called TheraGAN was developed to generate IMU signals associated with rehabilitation activities. The generated signal comprises data from a 6-channel IMU, i.e., angular velocities and linear accelerations. Also, introducing simple activities simplified the generation process for activities of varying lengths. To evaluate the generated signals, several qualitative and quantitative studies were conducted, including perceptual similarity analysis, comparing manually extracted features to those from real data, visual inspection, and an investigation into how the generated data affects the performance of three deep classifiers trained on the generated and real data. The results showed that the generated signals closely mimicked the real signals, and adding generated data resulted in a significant improvement in the performance of all tested networks. Among the tested networks, the LSTM classifier demonstrated the most significant improvement, achieving a 13.27% boost, effectively addressing the challenge of data scarcity. This shows the validity of the generated data as well as TheraGAN as a tool to build more robust classifiers in case of imbalanced and insufficient data problems.
HCFeb 11
Modeling of ASD/TD Children's Behaviors in Interaction with a Virtual Social Robot During a Music Education Program Using Deep Neural NetworksArmin Tandiseh, Morteza Memari, Alireza Taheri
This research aimed to develop an intelligent system to evaluate performance and extract behavioral models for children with ASD and neurotypical (TD) children by interacting with a virtual social robot in a music education program using deep neural networks. The system has two main features: 1) it distinguishes between neurotypical children and those with ASD based on their behavior, and 2) generates behaviors resembling those of neurotypical or ASD children in similar situations using deep learning. Intelligent systems that identify complex patterns and simulate behavior can aid in diagnosis, therapist training, and understanding the disorder. Using data from a previous study at the Social and Cognitive Robotics Laboratory of Sharif University of Technology (including the usable data of 9 ASD and 21 TD participants), the system achieved an accuracy of 81% and sensitivity of 96% in distinguishing neurotypical children from those with ASD using both impact data and motion signals. A transformer-based network was designed to reproduce children's behaviors. Experts in the field struggled to differentiate real behaviors from reproduced ones, with an accuracy of 53.5% and agreement of 68%, indicating the model's success in simulating realistic behaviors.
CVNov 25, 2025
Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching ActivitiesSeyede Niloofar Hosseini, Ali Mojibi, Mahdi Mohseni et al.
This study aimed to explore the application of deep neural networks for whole-body human posture prediction during dynamic load-reaching activities. Two time-series models were trained using bidirectional long short-term memory (BLSTM) and transformer architectures. The dataset consisted of 3D full-body plug-in gait dynamic coordinates from 20 normal-weight healthy male individuals each performing 204 load-reaching tasks from different load positions while adapting various lifting and handling techniques. The model inputs consisted of the 3D position of the hand-load position, lifting (stoop, full-squat and semi-squat) and handling (one- and two-handed) techniques, body weight and height, and the 3D coordinate data of the body posture from the first 25% of the task duration. These inputs were used by the models to predict body coordinates during the remaining 75% of the task period. Moreover, a novel method was proposed to improve the accuracy of the previous and present posture prediction networks by enforcing constant body segment lengths through the optimization of a new cost function. The results indicated that the new cost function decreased the prediction error of the models by approximately 8% and 21% for the arm and leg models, respectively. We indicated that utilizing the transformer architecture, with a root-mean-square-error of 47.0 mm, exhibited ~58% more accurate long-term performance than the BLSTM-based model. This study merits the use of neural networks that capture time series dependencies in 3D motion frames, providing a unique approach for understanding and predict motion dynamics during manual material handling activities.
AIMay 26, 2025
LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control EngineerRasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah et al.
This study presents the LLM-Agent-Controller, a multi-agent large language model (LLM) system developed to address a wide range of problems in control engineering (Control Theory). The system integrates a central controller agent with multiple specialized auxiliary agents, responsible for tasks such as controller design, model representation, control analysis, time-domain response, and simulation. A supervisor oversees high-level decision-making and workflow coordination, enhancing the system's reliability and efficiency. The LLM-Agent-Controller incorporates advanced capabilities, including Retrieval-Augmented Generation (RAG), Chain-of-Thought reasoning, self-criticism and correction, efficient memory handling, and user-friendly natural language communication. It is designed to function without requiring users to have prior knowledge of Control Theory, enabling them to input problems in plain language and receive complete, real-time solutions. To evaluate the system, we propose new performance metrics assessing both individual agents and the system as a whole. We test five categories of Control Theory problems and benchmark performance across three advanced LLMs. Additionally, we conduct a comprehensive qualitative conversational analysis covering all key services. Results show that the LLM-Agent-Controller successfully solved 83% of general tasks, with individual agents achieving an average success rate of 87%. Performance improved with more advanced LLMs. This research demonstrates the potential of multi-agent LLM architectures to solve complex, domain-specific problems. By integrating specialized agents, supervisory control, and advanced reasoning, the LLM-Agent-Controller offers a scalable, robust, and accessible solution framework that can be extended to various technical domains.
CVMar 16, 2025
ISLR101: an Iranian Word-Level Sign Language Recognition DatasetHossein Ranjbar, Alireza Taheri
Sign language recognition involves modeling complex multichannel information, such as hand shapes and movements while relying on sufficient sign language-specific data. However, sign languages are often under-resourced, posing a significant challenge for research and development in this field. To address this gap, we introduce ISLR101, the first publicly available Iranian Sign Language dataset for isolated sign language recognition. This comprehensive dataset includes 4,614 videos covering 101 distinct signs, recorded by 10 different signers (3 deaf individuals, 2 sign language interpreters, and 5 L2 learners) against varied backgrounds, with a resolution of 800x600 pixels and a frame rate of 25 frames per second. It also includes skeleton pose information extracted using OpenPose. We establish both a visual appearance-based and a skeleton-based framework as baseline models, thoroughly training and evaluating them on ISLR101. These models achieve 97.01% and 94.02% accuracy on the test set, respectively. Additionally, we publish the train, validation, and test splits to facilitate fair comparisons.
CLJun 27, 2024
A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language RecognitionAli Ghadami, Alireza Taheri, Ali Meghdari
Sign language is an essential means of communication for millions of people around the world and serves as their primary language. However, most communication tools are developed for spoken and written languages which can cause problems and difficulties for the deaf and hard of hearing community. By developing a sign language recognition system, we can bridge this communication gap and enable people who use sign language as their main form of expression to better communicate with people and their surroundings. This recognition system increases the quality of health services, improves public services, and creates equal opportunities for the deaf community. This research aims to recognize Iranian Sign Language words with the help of the latest deep learning tools such as transformers. The dataset used includes 101 Iranian Sign Language words frequently used in academic environments such as universities. The network used is a combination of early fusion and late fusion transformer encoder-based networks optimized with the help of genetic algorithm. The selected features to train this network include hands and lips key points, and the distance and angle between hands extracted from the sign videos. Also, in addition to the training model for the classes, the embedding vectors of words are used as multi-task learning to have smoother and more efficient training. This model was also tested on sentences generated from our word dataset using a windowing technique for sentence translation. Finally, the sign language training software that provides real-time feedback to users with the help of the developed model, which has 90.2% accuracy on test data, was introduced, and in a survey, the effectiveness and efficiency of this type of sign language learning software and the impact of feedback were investigated.
CVJun 26, 2024
Continuous Sign Language Recognition Using Intra-inter Gloss AttentionHossein Ranjbar, Alireza Taheri
Many continuous sign language recognition (CSLR) studies adopt transformer-based architectures for sequence modeling due to their powerful capacity for capturing global contexts. Nevertheless, vanilla self-attention, which serves as the core module of the transformer, calculates a weighted average over all time steps; therefore, the local temporal semantics of sign videos may not be fully exploited. In this study, we introduce a novel module in sign language recognition studies, called intra-inter gloss attention module, to leverage the relationships among frames within glosses and the semantic and grammatical dependencies between glosses in the video. In the intra-gloss attention module, the video is divided into equally sized chunks and a self-attention mechanism is applied within each chunk. This localized self-attention significantly reduces complexity and eliminates noise introduced by considering non-relative frames. In the inter-gloss attention module, we first aggregate the chunk-level features within each gloss chunk by average pooling along the temporal dimension. Subsequently, multi-head self-attention is applied to all chunk-level features. Given the non-significance of the signer-environment interaction, we utilize segmentation to remove the background of the videos. This enables the proposed model to direct its focus toward the signer. Experimental results on the PHOENIX-2014 benchmark dataset demonstrate that our method can effectively extract sign language features in an end-to-end manner without any prior knowledge, improve the accuracy of CSLR, and achieve the word error rate (WER) of 20.4 on the test set which is a competitive result compare to the state-of-the-art which uses additional supervisions.
RODec 12, 2023
Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced SociabilityAli Ghadami, Mohammadreza Taghimohammadi, Mohammad Mohammadzadeh et al.
Robots' acceptability among humans and their sociability can be significantly enhanced by incorporating human-like reactions. Humans can react to environmental events very quickly and without thinking. An instance where humans show natural reactions is when they encounter a sudden and loud sound that startles or frightens them. During such moments, individuals may instinctively move their hands, turn toward the origin of the sound, and try to determine the event's cause. This inherent behavior motivated us to explore this less-studied part of social robotics. In this work, a multi-modal system composed of an action generator, sound classifier, and YOLO object detector was designed to sense the environment and, in the presence of sudden loud sounds, show natural human fear reactions; and finally, locate the fear-causing sound source in the environment. These valid generated motions and inferences could imitate intrinsic human reactions and enhance the sociability of robots. For motion generation, a model based on LSTM and MDN networks was proposed to synthesize various motions. Also, in the case of sound detection, a transfer learning model was preferred that used the spectrogram of the sound signals as its input. After developing individual models for sound detection, motion generation, and image recognition, they were integrated into a comprehensive "fear" module implemented on the NAO robot. Finally, the fear module was tested in practical application and two groups of experts and non-experts (in the robotics area) filled out a questionnaire to evaluate the performance of the robot. We indicated that the proposed module could convince the participants that the Nao robot acts and reasons like a human when a sudden and loud sound is in the robot's peripheral environment, and additionally showed that non-experts have higher expectations about social robots and their performance.