Dong Ma

h-index: 45

Papers: 5

Citations: 65

Novelty: 50%

AI Score: 38

Ranked #87,926 of 194,257 authors (top 45%); #19,500 in LG (top 49%)

5 Papers

1.8 · LG · Apr 26, 2022

Improving Feature Generalizability with Multitask Learning in Class Incremental Learning

Dong Ma, Chi Ian Tang, Cecilia Mascolo

Many deep learning applications, like keyword spotting, require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is catastrophic forgetting: the model must preserve as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior work primarily focuses on the incremental learning step, while ignoring the optimization during base model training. We hypothesize that a more transferable and generalizable feature representation from the base model would be beneficial to incremental learning. In this work, we adopt multitask learning during base model training to improve feature generalizability. Specifically, instead of training a single model with all the base classes, we decompose the base classes into multiple subsets and regard each of them as a task. These tasks are trained concurrently and a shared feature extractor is obtained for incremental learning. We evaluate our approach on two datasets under various configurations. The results show that our approach enhances the average incremental learning accuracy by up to 5.5%, which enables more reliable and accurate keyword spotting over time. Moreover, the proposed approach can be combined with many existing techniques and provides additional performance gains.
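The decomposition of base classes into tasks described in this abstract can be illustrated with a minimal sketch. It covers only the label bookkeeping, not the training itself, and it assumes a random, roughly equal-sized split; the abstract does not say how the subsets are chosen. The names `split_base_classes` and `to_task_label` are hypothetical.

```python
import random


def split_base_classes(base_classes, num_tasks, seed=0):
    """Partition base classes into num_tasks disjoint subsets, one per task.

    Assumption: a seeded random split; the paper may use another strategy.
    """
    classes = list(base_classes)
    random.Random(seed).shuffle(classes)
    # Round-robin slicing keeps subset sizes within one of each other.
    return [sorted(classes[i::num_tasks]) for i in range(num_tasks)]


def to_task_label(class_id, tasks):
    """Map a global class id to (task index, label inside that task's head)."""
    for task_idx, subset in enumerate(tasks):
        if class_id in subset:
            return task_idx, subset.index(class_id)
    raise ValueError(f"class {class_id} is not a base class")


# Example: 20 base classes decomposed into 4 tasks of 5 classes each.
tasks = split_base_classes(range(20), num_tasks=4)
```

In a multitask setup of this kind, each subset would typically get its own classification head on top of the shared feature extractor, and `to_task_label` tells a training loop which head and which local label a sample belongs to.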

4.0 · SD · Nov 21, 2025

Device-Guided Music Transfer

Manh Pham Hung, Changshuo Hu, Ting Dang et al.

Device-guided music transfer adapts playback across unseen devices for users who lack them. Existing methods mainly focus on modifying the timbre, rhythm, harmony, or instrumentation to mimic genres or artists, overlooking the diverse hardware properties of the playback device (i.e., speaker). Therefore, we propose DeMT, which processes a speaker's frequency response curve as a line graph using a vision-language model to extract device embeddings. These embeddings then condition a hybrid transformer via feature-wise linear modulation. Fine-tuned on a self-collected dataset, DeMT enables effective speaker-style transfer and robust few-shot adaptation for unseen devices, supporting applications like device-style augmentation and quality enhancement.
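Feature-wise linear modulation (FiLM), the conditioning mechanism named in this abstract, scales and shifts each feature channel using parameters predicted from a conditioning vector. The sketch below shows the generic operation in plain Python; it is not DeMT's implementation, and the function names, weight shapes, and toy values are assumptions for illustration.

```python
def linear(weights, bias, vec):
    """Dense layer: one output per weight row, out[j] = weights[j] . vec + bias[j]."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, device_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: out[t][c] = gamma[c] * features[t][c] + beta[c].

    gamma and beta are predicted from the device embedding, so the same
    feature sequence is modulated differently for each device.
    """
    gamma = linear(w_gamma, b_gamma, device_embedding)
    beta = linear(w_beta, b_beta, device_embedding)
    return [[g * x + b for x, g, b in zip(frame, gamma, beta)]
            for frame in features]


# Toy example: 2 frames with 2 channels each, and a 2-dimensional embedding.
features = [[1.0, 2.0], [3.0, 4.0]]
embedding = [1.0, 0.0]
out = film(
    features, embedding,
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],  # gamma = [2.0, 1.0]
    w_beta=[[0.5, 0.0], [0.0, 0.0]], b_beta=[0.0, -1.0],   # beta = [0.5, -1.0]
)
# out == [[2.5, 1.0], [6.5, 3.0]]
```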

2.6 · LG · Mar 14, 2024 · Code

DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

Xiao Ma, Shengfeng He, Hezhe Qiao et al.

Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methods primarily focus on compressing larger models, at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture, where the selector routes each input sample to the appropriate classifier for classification. DiTMoS is grounded in a key insight: a composition of weak models can exhibit high diversity, and their union can significantly boost the accuracy upper bound. To approach the upper bound, DiTMoS introduces three strategies: diverse training data splitting to increase the classifiers' diversity, adversarial selector-classifiers training to ensure synergistic interactions and thereby maximize their complementarity, and heterogeneous feature aggregation to improve the capacity of the classifiers. We further propose a network slicing technique to alleviate the extra memory overhead incurred by feature aggregation. We deploy DiTMoS on the Nucleo STM32F767ZI board and evaluate it on three time-series datasets for human activity recognition, keyword spotting, and emotion recognition, respectively. The experimental results show that: (a) DiTMoS achieves up to 13.4% accuracy improvement compared to the best baseline; (b) network slicing almost completely eliminates the memory overhead incurred by feature aggregation with a marginal increase in latency.
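The "accuracy upper bound" of a union of weak models can be made concrete with a small sketch: a sample counts as solvable if at least one classifier gets it right, and a selector approaches that bound to the extent that it routes each sample to a classifier that is correct on it. This illustrates the idea only and is not the DiTMoS code; the function names and toy predictions are hypothetical.

```python
def oracle_accuracy(predictions, labels):
    """Upper bound: fraction of samples that at least one classifier gets right.

    predictions[m][i] is classifier m's predicted label for sample i.
    """
    hits = sum(any(preds[i] == y for preds in predictions)
               for i, y in enumerate(labels))
    return hits / len(labels)


def routed_accuracy(predictions, labels, routes):
    """Accuracy when sample i is handled only by classifier routes[i]."""
    hits = sum(predictions[routes[i]][i] == y for i, y in enumerate(labels))
    return hits / len(labels)


labels = [0, 1, 1, 0]
predictions = [
    [0, 1, 0, 1],  # classifier A: correct on samples 0 and 1 (accuracy 0.5)
    [1, 0, 1, 1],  # classifier B: correct on sample 2 only (accuracy 0.25)
]

upper_bound = oracle_accuracy(predictions, labels)                # 0.75
good_selector = routed_accuracy(predictions, labels, [0, 0, 1, 0])  # 0.75
bad_selector = routed_accuracy(predictions, labels, [1, 1, 0, 0])   # 0.0
```

In this toy case the two classifiers are individually weak but err on different samples, so their union reaches 0.75, and the quality of the routing decides how much of that bound is realized.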

12.0 · HC · Aug 20, 2021

hEARt: Motion-resilient Heart Rate Monitoring with In-ear Microphones

Kayla-Jade Butkow, Ting Dang, Andrea Ferlini et al.

With the soaring adoption of in-ear wearables, the research community has started investigating suitable in-ear heart rate (HR) detection systems. HR is a key physiological marker of cardiovascular health and physical fitness. Continuous and reliable HR monitoring with wearable devices has therefore gained increasing attention in recent years. Existing HR detection systems in wearables mainly rely on photoplethysmography (PPG) sensors; however, these are notorious for poor performance in the presence of human motion. In this work, leveraging the occlusion effect that enhances low-frequency bone-conducted sounds in the ear canal, we investigate for the first time in-ear audio-based motion-resilient HR monitoring. We first collected HR-induced sounds in the ear canal using an in-ear microphone while subjects were stationary and during three different activities (i.e., walking, running, and speaking). Then, we devised a novel deep learning based motion artefact (MA) mitigation framework to denoise the in-ear audio signals, followed by an HR estimation algorithm to extract HR. With data collected from 20 subjects over four activities, we demonstrate that hEARt, our end-to-end approach, achieves a mean absolute error (MAE) of 3.02 ± 2.97 BPM, 8.12 ± 6.74 BPM, 11.23 ± 9.20 BPM and 9.39 ± 6.97 BPM for stationary, walking, running and speaking, respectively, opening the door to new non-invasive and affordable HR monitoring with usable performance for daily activities. Not only does hEARt outperform previous in-ear HR monitoring work, but it also outperforms reported in-ear PPG performance.

3.8 · CR · Jun 14, 2021

A Novel Variable K-Pseudonym Scheme Applied to 5G Anonymous Access Authentication

Dong Ma, Xixiang Lyu, Renpeng Zou

Anonymous access authentication schemes provide users with massive application services while protecting the privacy of users' identities. The identity protection schemes in 3G and 4G are not suitable for 5G anonymous access authentication due to complex computation and pseudonym asynchrony. In this paper, we consider mobile devices with limited resources in the 5G network and propose an anonymous access authentication scheme without the Public Key Infrastructure. The anonymous access authentication scheme provides users with variable shared pseudonyms to protect users' identities asynchronously. With the variable shared pseudonym, our scheme can ensure user anonymity and resist the mark attack, a novel attack aimed at the basic k-pseudonym scheme. Finally, we analyze the scheme with BAN logic analysis and verify the user anonymity.