Shuo Ma

CV
h-index16
4papers
52citations
Novelty49%
AI Score36

4 Papers

CVFeb 16, 2023
Robust Human Motion Forecasting using Transformer-based Model

Esteve Valls Mascaro, Shuo Ma, Hyemin Ahn et al.

Comprehending human motion is a fundamental challenge for developing Human-Robot Collaborative applications. Computer vision researchers have addressed this field by only focusing on reducing error in predictions, but not taking into account the requirements to facilitate its implementation in robots. In this paper, we propose a new model based on Transformer that simultaneously deals with the real time 3D human motion forecasting in the short and long term. Our 2-Channel Transformer (2CH-TR) is able to efficiently exploit the spatio-temporal information of a shortly observed sequence (400ms) and generates a competitive accuracy against the current state-of-the-art. 2CH-TR stands out for the efficient performance of the Transformer, being lighter and faster than its competitors. In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment. Our experiment results show that the proposed 2CH-TR outperforms the ST-Transformer, which is another state-of-the-art model based on the Transformer, in terms of reconstruction and prediction under the same conditions of input prefix. Our model reduces in 8.89% the mean squared error of ST-Transformer in short-term prediction, and 2.57% in long-term prediction in Human3.6M dataset with 400ms input prefix. Webpage: https://evm7.github.io/2CHTR-page/

CVMay 25, 2023Code
Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning

Xin Qin, Jindong Wang, Shuo Ma et al.

Human activity recognition (HAR) is a time series classification task that focuses on identifying the motion patterns from human sensor readings. Adequate data is essential but a major bottleneck for training a generalizable HAR model, which assists customization and optimization of online web applications. However, it is costly in time and economy to collect large-scale labeled data in reality, i.e., the low-resource challenge. Meanwhile, data collected from different persons have distribution shifts due to different living habits, body shapes, age groups, etc. The low-resource and distribution shift challenges are detrimental to HAR when applying the trained model to new unseen subjects. In this paper, we propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR. DDLearn simultaneously considers diversity and discrimination learning. With the constructed self-supervised learning task, DDLearn enlarges the data diversity and explores the latent activity properties. Then, we propose a diversity preservation module to preserve the diversity of learned features by enlarging the distribution divergence between the original and augmented domains. Meanwhile, DDLearn also enhances semantic discrimination by learning discriminative representations with supervised contrastive learning. Extensive experiments on three public HAR datasets demonstrate that our method significantly outperforms state-of-art methods by an average accuracy improvement of 9.5% under the low-resource distribution shift scenarios, while being a generic, explainable, and flexible framework. Code is available at: https://github.com/microsoft/robustlearn.

LGJul 12, 2025
XiChen: An observation-scalable fully AI-driven global weather forecasting system with 4D variational knowledge

Wuxin Wang, Weicheng Ni, Lilan Huang et al.

Recent advancements in Artificial Intelligence (AI) demonstrate significant potential to revolutionize weather forecasting. However, most AI-driven models rely on Numerical Weather Prediction (NWP) systems for initial condition preparation, which often consumes hours on supercomputers. Here we introduce XiChen, the first observation-scalable fully AI-driven global weather forecasting system, whose entire pipeline, from Data Assimilation (DA) to medium-range forecasting, can be accomplished within only 17 seconds. XiChen is built upon a foundation model that is pre-trained for weather forecasting. Meanwhile, this model is subsequently fine-tuned to serve as both observation operators and DA models, thereby scalably assimilating conventional and raw satellite observations. Furthermore, the integration of four-dimensional variational knowledge ensures that XiChen's DA and medium-range forecasting accuracy rivals that of operational NWP systems, amazingly achieving a skillful forecasting lead time exceeding 8.25 days. These findings demonstrate that XiChen holds strong potential toward fully AI-driven weather forecasting independent of NWP systems.

SDJul 21, 2020
Guided multi-branch learning systems for sound event detection with sound separation

Yuxin Huang, Liwei Lin, Shuo Ma et al.

In this paper, we describe in detail our systems for DCASE 2020 Task 4. The systems are based on the 1st-place system of DCASE 2019 Task 4, which adopts weakly-supervised framework with an attention-based embedding-level pooling module and a semi-supervised learning approach named guided learning. This year, we incorporate multi-branch learning (MBL) into the original system to further improve its performance. MBL uses different branches with different pooling strategies (including instance-level and embedding-level strategies) and different pooling modules (including attention pooling, global max pooling or global average pooling modules), which share the same feature encoder of the model. Therefore, multiple branches pursuing different purposes and focusing on different characteristics of the data can help the feature encoder model the feature space better and avoid over-fitting. To better exploit the strongly-labeled synthetic data, inspired by multi-task learning, we also employ a sound event detection branch. To combine sound separation (SS) with sound event detection (SED), we fuse the results of SED systems with SS-SED systems which are trained using separated sound output by an SS system. The experimental results prove that MBL can improve the model performance and using SS has great potential to improve the performance of SED ensemble system.