MyoSem: Aligning Electromyography to Natural-Language Action Semantics for Hand Action Understanding
For researchers in gesture recognition and prosthetic control, MyoSem shifts EMG-based hand action understanding from fixed-label classification to queryable semantic retrieval, enabling more flexible and generalizable interaction.
MyoSem introduces an EMG-to-text semantic alignment framework that enables bidirectional retrieval between EMG signals and natural-language action descriptions. It outperforms most baselines on the EMG2Pose and NinaPro datasets and generalizes to unseen users and amputee scenarios.
Electromyography (EMG) directly reflects muscle activation and is a key sensing modality for gesture recognition, prosthetic control, and wearable interaction. Existing EMG methods, however, commonly formulate hand action understanding as classification over a fixed label set, which makes it difficult to support querying, retrieval, and generalization based on action descriptions. We present MyoSem, an EMG-action semantic alignment framework that maps low-level EMG signals into a shared semantic space constructed from multi-view action descriptions. MyoSem combines multi-view action-semantic construction, activation-aware EMG encoding, and semantic query alignment, enabling bidirectional retrieval between EMG signals and text descriptions. We systematically evaluate MyoSem on EMG2Pose and NinaPro-series datasets. Results show that MyoSem performs well on EMG-text bidirectional retrieval, outperforms most baselines, and generalizes favorably to unseen users, held-out action classes, and amputee-user transfer scenarios. Ablations and visualizations further support the contribution of each module. Overall, MyoSem advances EMG-based hand action understanding from fixed-label recognition toward queryable bidirectional semantic retrieval, providing a new modeling paradigm for language-mediated EMG action understanding.
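To make bidirectional retrieval in a shared semantic space concrete, the following minimal sketch scores toy EMG and text embeddings against each other and computes Recall@K in both directions. It is an illustration under stated assumptions, not MyoSem's implementation: the abstract does not specify the similarity function or the evaluation metric, so cosine similarity and Recall@K are assumed here, the embeddings are hand-written placeholders rather than encoder outputs, and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def similarity_matrix(emg_embs, text_embs):
    """Cosine similarity between every EMG embedding (rows) and text embedding (columns)."""
    emg_n = [l2_normalize(e) for e in emg_embs]
    text_n = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(e, t)) for t in text_n] for e in emg_n]


def transpose(matrix):
    """Swap rows and columns so the text descriptions become the queries."""
    return [list(col) for col in zip(*matrix)]


def recall_at_k(sim, k):
    """Fraction of queries (rows) whose paired item (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy data (assumption): three EMG windows and their three paired descriptions,
# already projected into a shared 3-dimensional space. Pair i matches pair i.
emg_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

sim = similarity_matrix(emg_embs, text_embs)
emg_to_text_r1 = recall_at_k(sim, k=1)             # EMG query, text gallery
text_to_emg_r1 = recall_at_k(transpose(sim), k=1)  # text query, EMG gallery
print(emg_to_text_r1, text_to_emg_r1)
```

Because both directions reuse one similarity matrix, text-to-EMG retrieval needs only a transpose. Unlike a fixed-label classifier, the set of candidate descriptions can change at query time without retraining the scoring step.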