Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction
This work addresses a domain-specific problem for visually impaired and blind individuals, but it appears incremental as it builds on existing action recognition methods.
The paper aims to help visually impaired and blind people understand and interact with the actions of people around them, as a step toward a tactile device. It recognizes actions by fusing a skeleton stream (MS-G3D) with a depth stream (CNN). No specific performance numbers are provided.
The aim of this work is to contribute to the development of a tactile device that lets visually impaired and blind persons understand the actions of surrounding people and interact with them. First, building on state-of-the-art methods for human action recognition from RGB-D sequences, we use the skeleton information provided by Kinect with the disentangled and unified multi-scale graph convolutional (MS-G3D) model to recognize the performed actions. We tested this model on real scenes and identified several constraints and limitations. Second, we fuse the skeleton modality (MS-G3D) with the depth modality (CNN) in order to overcome these limitations. Third, the recognized actions are labeled semantically and will be mapped to an output device perceivable through the sense of touch.
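The fusion and semantic labeling steps can be illustrated with a minimal sketch. The abstract does not say how the two modalities are combined, so the example below assumes score-level (late) fusion: each stream's class scores are turned into probabilities and averaged with a weight. The action names, the semantic messages, and the functions `fuse_scores` and `label_action` are all hypothetical and exist only for illustration; they are not taken from the paper.

```python
import math

# Hypothetical action vocabulary; the actual label set is not given in the abstract.
ACTIONS = ["waving", "handshake", "sitting down"]

# Hypothetical semantic labels that a tactile output device could render.
SEMANTIC_LABELS = {
    "waving": "someone is greeting you",
    "handshake": "someone offers a handshake",
    "sitting down": "someone is sitting down",
}


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(skeleton_logits, depth_logits, skeleton_weight=0.5):
    """Assumed late fusion: weighted average of per-class probabilities."""
    if len(skeleton_logits) != len(depth_logits):
        raise ValueError("both streams must score the same classes")
    p_skeleton = softmax(skeleton_logits)
    p_depth = softmax(depth_logits)
    w = skeleton_weight
    return [w * s + (1.0 - w) * d for s, d in zip(p_skeleton, p_depth)]


def label_action(skeleton_logits, depth_logits, skeleton_weight=0.5):
    """Return (action, semantic label, fused confidence) for one clip."""
    fused = fuse_scores(skeleton_logits, depth_logits, skeleton_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    action = ACTIONS[best]
    return action, SEMANTIC_LABELS[action], fused[best]


# The skeleton stream alone slightly prefers "waving"; the depth stream
# is confident about "handshake", so the fused decision is "handshake".
action, message, confidence = label_action([2.0, 1.9, 0.1], [0.2, 3.0, 0.1])
```

In this sketch, `skeleton_weight` controls how much the skeleton stream is trusted. Lowering it when skeleton tracking is unreliable is one plausible way to address limitations of a skeleton-only model, though the abstract does not state that the authors do this.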