Mo Han

RO
8 papers
146 citations
Novelty: 46%
AI Score: 24

8 Papers

RO · Apr 19, 2021
Inference of Upcoming Human Grasp Using EMG During Reach-to-Grasp Movement

Mo Han, Mehrshad Zandigohar, Sezen Yagmur Gunay et al.

Electromyography (EMG) data has been extensively adopted as an intuitive interface for instructing human-robot collaboration. A major challenge in the real-time detection of human grasp intent is the identification of dynamic EMG from hand movements. Previous studies mainly applied steady-state EMG classification with a small number of grasp patterns to dynamic situations, which is insufficient to generate differentiated control given the variation in muscular activity in practice. To better detect dynamic movements, more EMG variability could be integrated into the model. However, only limited research has concentrated on such detection of dynamic grasp motions, and most existing assessments of non-static EMG classification either require supervised ground-truth timestamps of the movement status or contain only limited kinematic variations. In this study, we propose a framework for classifying dynamic EMG signals into gestures, and examine the impact of different movement phases, using an unsupervised method to segment and label the action transitions. We collected and utilized data from large gesture vocabularies with multiple dynamic actions to encode the transitions from one grasp intent to another based on common sequences of the grasp movements. The classifier for identifying the gesture label was then constructed from the dynamic EMG signal, with no supervised annotation of kinematic movements required. Finally, we evaluated the performance of several training strategies using EMG data from different movement phases, and explored the information revealed by each phase. All experiments were evaluated in a real-time style, with the performance transitions over time presented.

RO · Apr 8, 2021
Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif et al.

Objective: For transradial amputees, robotic prosthetic hands promise to restore the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and other factors. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion and lighting changes. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy during the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
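
As a rough illustration of what Bayesian fusion of per-modality classifier outputs can look like, here is a minimal sketch. It is not the paper's implementation: it assumes the modalities are conditionally independent given the grasp type, that each model outputs a posterior over the same grasp classes, and that the class prior is uniform unless supplied. The function name `fuse_posteriors` and the example numbers are hypothetical.

```python
import numpy as np

def fuse_posteriors(posteriors, prior=None):
    """Fuse per-modality class posteriors under a conditional-independence assumption.

    Uses p(c | x_1..x_M) proportional to p(c) * prod_m [ p(c | x_m) / p(c) ].
    `posteriors` has shape (M, C): one row per modality, one column per class.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    n_classes = posteriors.shape[1]
    if prior is None:
        # Assumption: uniform prior over grasp types.
        prior = np.full(n_classes, 1.0 / n_classes)
    prior = np.asarray(prior, dtype=float)

    eps = 1e-12  # guards against log(0)
    # Work in log space for numerical stability.
    log_fused = np.log(prior) + np.sum(np.log(posteriors + eps) - np.log(prior), axis=0)
    log_fused -= log_fused.max()
    fused = np.exp(log_fused)
    return fused / fused.sum()

# Hypothetical posteriors over three grasp types from two modalities.
emg_posterior = [0.5, 0.3, 0.2]
vision_posterior = [0.4, 0.5, 0.1]
fused = fuse_posteriors([emg_posterior, vision_posterior])
print(fused)
```

With a uniform prior this reduces to a normalized elementwise product of the two posteriors, so a class has to be supported by both modalities to keep a high fused probability.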

CV · Mar 8, 2021
From Hand-Perspective Visual Information to Grasp Type Probabilities: Deep Learning via Ranking Labels

Mo Han, Sezen Yağmur Günay, İlkay Yıldız et al.

Limb deficiency severely affects the daily lives of amputees and drives efforts to provide functional robotic prosthetic hands to compensate for this deprivation. Convolutional neural network-based computer vision control of the prosthetic hand has received increased attention as a method to replace or complement physiological signals owing to its reliability, with models trained on visual information to predict the hand gesture. Mounting a camera in the palm of a prosthetic hand has proved to be a promising approach to collecting visual data. However, the grasp type labelled from the eye and hand perspectives may differ, as object shapes are not always symmetric. Thus, to represent this difference in a realistic way, we employed a dataset containing synchronous images from eye- and hand-view, where the hand-perspective images are used for training while the eye-view images are used only for manual labelling. Electromyogram (EMG) activity and movement kinematics data from the upper arm are also collected for multi-modal information fusion in future work. Moreover, in order to include human-in-the-loop control and combine computer vision with physiological signal inputs, instead of making absolute positive or negative predictions, we build a novel probabilistic classifier according to the Plackett-Luce model. To predict the probability distribution over grasps, we exploit the statistical model over label rankings to solve the permutation domain problems via maximum likelihood estimation, utilizing the manually ranked lists of grasps as a new form of label. We show that the proposed model is applicable to the most popular and productive convolutional neural network frameworks.
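
For readers unfamiliar with the Plackett-Luce model, the sketch below shows the standard log-likelihood of a ranked list given one real-valued score per item, and the top-choice probabilities those scores imply. It is a generic illustration, not the paper's training code: the function names and the example scores are hypothetical, and in a setting like the one described the scores would presumably come from a network's output layer.

```python
import numpy as np

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (item indices, most preferred first) under Plackett-Luce.

    P(ranking) = prod_i exp(s_{r_i}) / sum_{j >= i} exp(s_{r_j})
    """
    scores = np.asarray(scores, dtype=float)
    ordered = scores[np.asarray(ranking)]
    log_lik = 0.0
    for i in range(len(ordered)):
        remaining = ordered[i:]
        m = remaining.max()  # log-sum-exp shift for numerical stability
        log_lik += ordered[i] - (m + np.log(np.sum(np.exp(remaining - m))))
    return log_lik

def top_choice_probabilities(scores):
    """Probability that each item is ranked first, which is a softmax of the scores."""
    scores = np.asarray(scores, dtype=float)
    z = np.exp(scores - scores.max())
    return z / z.sum()

# Hypothetical scores for three grasp types.
scores = [2.0, 0.5, -1.0]
print(plackett_luce_log_likelihood(scores, [0, 1, 2]))  # ranking that agrees with the scores
print(plackett_luce_log_likelihood(scores, [2, 1, 0]))  # reversed ranking, lower likelihood
print(top_choice_probabilities(scores))
```

Maximum likelihood estimation then amounts to adjusting the scores (or the model that produces them) to maximize this log-likelihood summed over the observed rankings.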

RO · Mar 8, 2021
HANDS: A Multimodal Dataset for Modeling Towards Human Grasp Intent Inference in Prosthetic Hands

Mo Han, Sezen Yağmur Günay, Gunar Schirner et al.

Upper limb and hand functionality is critical to many activities of daily living, and the amputation of one can lead to significant functionality loss for individuals. From this perspective, advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user, but more importantly from the improved capability to infer human intent from multimodal sensor data to provide the robotic hand with perception abilities regarding the operational context. Such multimodal sensor data may include various environment sensors including vision, as well as human physiology and behavior sensors including electromyography and inertial measurement units. A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to help prosthetic hand motion planning and control. In this paper, we present a dataset of this type that was gathered in anticipation of cameras being built into prosthetic hands, for which computer vision methods will need to assess this hand-view visual evidence in order to estimate human intent. Specifically, paired images from human eye-view and hand-view of various objects placed at different orientations have been captured at the initial state of grasping trials, followed by paired video, EMG and IMU from the arm of the human during a grasp, lift, put-down, and retract style trial structure. For each trial, based on eye-view images of the scene showing the hand and object on a table, multiple humans were asked to sort, in decreasing order of preference, five grasp types appropriate for the object in its given configuration relative to the hand. The potential utility of paired eye-view and hand-view images was illustrated by training a convolutional neural network to process hand-view images in order to predict eye-view labels assigned by humans.

LG · Jan 13, 2021
Towards Creating a Deployable Grasp Type Probability Estimator for a Prosthetic Hand

Mehrshad Zandigohar, Mo Han, Deniz Erdogmus et al.

For lower arm amputees, prosthetic hands promise to restore most physical interaction capabilities. This requires accurately predicting hand gestures capable of grabbing varying objects and executing them in a timely manner as intended by the user. Current approaches often rely on physiological signal inputs such as electromyography (EMG) signals from residual limb muscles to infer the intended motion. However, limited signal quality, user diversity and high variability adversely affect the system robustness. Instead of solely relying on EMG signals, our work enables augmenting EMG intent inference with physical state probability through machine learning and computer vision methods. To this end, we: (1) study state-of-the-art deep neural network architectures to select a performant source of knowledge transfer for the prosthetic hand, (2) use a dataset containing object images and probability distributions of grasp types as a new form of labeling where, instead of using absolute values of zero and one as the conventional classification labels, our labels are a set of probabilities whose sum is 1. The proposed method generates probabilistic predictions which could be fused with EMG prediction of probabilities over grasps by using the visual information from the palm camera of a prosthetic hand. Our results demonstrate that InceptionV3 achieves the highest accuracy with 0.95 angular similarity, followed by 1.4 MobileNetV2 with 0.93 at ~20% of the number of operations.
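
The abstract reports accuracy as angular similarity between predicted and labelled probability vectors but does not define the metric. The sketch below uses one common definition for non-negative vectors, 1 - (2/pi) * arccos(cosine similarity), which maps identical directions to 1 and orthogonal vectors to 0. Whether the paper normalizes in exactly this way is an assumption, and the function name and example vectors are hypothetical.

```python
import numpy as np

def angular_similarity(p, q):
    """Angular similarity between two non-negative vectors, in [0, 1].

    Assumed definition: 1 - (2 / pi) * arccos(cosine_similarity(p, q)).
    For non-negative vectors the angle lies in [0, pi / 2], hence the factor 2 / pi.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cos = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
    cos = np.clip(cos, -1.0, 1.0)  # guard against rounding slightly outside [-1, 1]
    return 1.0 - (2.0 / np.pi) * np.arccos(cos)

# Hypothetical label and prediction: probabilities over five grasp types, each summing to 1.
label = [0.40, 0.30, 0.15, 0.10, 0.05]
prediction = [0.35, 0.35, 0.10, 0.15, 0.05]
print(angular_similarity(label, prediction))
```

Unlike top-1 accuracy, a metric of this kind rewards a prediction for matching the whole label distribution rather than only its most likely class.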

SP · Sep 28, 2020
Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders

Mo Han, Ozan Ozdenizci, Toshiaki Koike-Akino et al.

Human-computer interaction (HCI) involves a multidisciplinary fusion of technologies, through which the control of external devices can be achieved by monitoring the physiological status of users. However, physiological biosignals often vary across users and recording sessions due to unstable physical/mental conditions and task-irrelevant activities. To deal with this challenge, we propose a method of adversarial feature encoding with the concept of a Rateless Autoencoder (RAE), in order to exploit disentangled, nuisance-robust, and universal representations. We achieve a good trade-off between user-specific and task-relevant features by stochastically disentangling the latent representations with additional adversarial networks. The proposed model is applicable to a wider range of unknown users and tasks as well as different classifiers. Results on cross-subject transfer evaluations show the advantages of the proposed framework, with up to an 11.6% improvement in the average subject-transfer classification accuracy.

LG · Aug 26, 2020
Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in biosignal processing have enabled users to exploit their physiological status for manipulating devices in a reliable and safe manner. One major challenge of physiological sensing lies in the variability of biosignals across different users and tasks. To address this issue, we propose an adversarial feature extractor for transfer learning to exploit disentangled universal representations. We consider the trade-off between task-relevant features and user-discriminative information by introducing additional adversary and nuisance networks in order to manipulate the latent representations such that the learned feature extractor is applicable to unknown users and various tasks. Results on cross-subject transfer evaluations exhibit the benefits of the proposed framework, with up to an 8.8% improvement in average classification accuracy, and demonstrate adaptability to a broader range of subjects.

SP · Apr 15, 2020
Disentangled Adversarial Transfer Learning for Physiological Biosignals

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in wearable sensors demonstrate promising results for monitoring physiological status in effective and comfortable ways. One major challenge of physiological status assessment is the problem of transfer learning caused by the domain inconsistency of biosignals across users or across different recording sessions from the same user. We propose an adversarial inference approach for transfer learning to extract disentangled nuisance-robust representations from physiological biosignal data in stress status level assessment. We exploit the trade-off between task-related features and person-discriminative information by using both an adversary network and a nuisance network to jointly manipulate and disentangle the latent representations learned by the encoder, which are then input to a discriminative classifier. Results on cross-subject transfer evaluations demonstrate the benefits of the proposed adversarial framework, and thus show its capability to adapt to a broader range of subjects. Finally, we highlight that our proposed adversarial transfer learning approach is also applicable to other deep feature learning frameworks.