LGJul 30, 2024Code
PIP: Prototypes-Injected Prompt for Federated Class Incremental LearningMuhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.
Federated Class Incremental Learning (FCIL) is a new direction in continual learning (CL) for addressing catastrophic forgetting and non-IID data distribution simultaneously. Existing FCIL methods call for high communication costs and exemplars from previous classes. We propose a novel rehearsal-free method for FCIL named prototypes-injected prompt (PIP) that involves 3 main ideas: a) prototype injection on prompt learning, b) prototype augmentation, and c) weighted Gaussian aggregation on the server side. Our experiment result shows that the proposed method outperforms the current state of the arts (SOTAs) with a significant improvement (up to 33%) in CIFAR100, MiniImageNet and TinyImageNet datasets. Our extensive analysis demonstrates the robustness of PIP in different task sizes, and the advantage of requiring smaller participating local clients, and smaller global rounds. For further study, source codes of PIP, baseline, and experimental logs are shared publicly in https://github.com/anwarmaxsum/PIP.
LGDec 12, 2023Code
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of ExpertsGiang Do, Khiem Le, Quang Pham et al.
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces \HyperRout, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of \HyperRouter compared to existing routing methods. Our implementation is publicly available at {\url{https://github.com/giangdip2410/HyperRouter}}.
LGJun 17, 2025Code
Multi-Scale Finetuning for Encoder-based Time Series Foundation ModelsZhongzheng Qiao, Chenghao Liu, Yiming Zhang et al.
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. Given the diverse temporal patterns across sampling scales and the inherent multi-scale forecasting capabilities of TSFMs, we adopt a causal perspective to analyze finetuning process, through which we highlight the critical importance of explicitly modeling multiple scales and reveal the shortcomings of naive approaches. Focusing on encoder-based TSFMs, we propose Multiscale finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. Experimental results on three different backbones (Moirai, Moment and Units) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter efficient finetuning methods but also surpass state-of-the-art deep learning methods. Codes are available at https://github.com/zqiao11/MSFT.
AIMay 19, 2025Code
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via CompetitionNam V. Nguyen, Huy Nguyen, Quang Pham et al.
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the mean of increasing the network's depth or width. However, we argue that effective SMoE training remains challenging because of the suboptimal routing process where experts that perform computation do not directly contribute to the routing process. In this work, we propose competition, a novel mechanism to route tokens to experts with the highest neural response. Theoretically, we show that the competition mechanism enjoys a better sample efficiency than the traditional softmax routing. Furthermore, we develop CompeteSMoE, a simple yet effective algorithm to train large language models by deploying a router to learn the competition policy, thus enjoying strong performances at a low training overhead. Our extensive empirical evaluations on both the visual instruction tuning and language pre-training tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies. We have made the implementation available at: https://github.com/Fsoft-AIC/CompeteSMoE. This work is an improved version of the previous study at arXiv:2402.02526
LGJul 16, 2025Code
PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online LearningM. Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.
The data privacy constraint in online continual learning (OCL), where the data can be seen only once, complicates the catastrophic forgetting problem in streaming data. A common approach applied by the current SOTAs in OCL is with the use of memory saving exemplars or features from previous classes to be replayed in the current task. On the other hand, the prompt-based approach performs excellently in continual learning but with the cost of a growing number of trainable parameters. The first approach may not be applicable in practice due to data openness policy, while the second approach has the issue of throughput associated with the streaming data. In this study, we propose a novel prompt-based method for online continual learning that includes 4 main components: (1) single light-weight prompt generator as a general knowledge, (2) trainable scaler-and-shifter as specific knowledge, (3) pre-trained model (PTM) generalization preserving, and (4) hard-soft updates mechanism. Our proposed method achieves significantly higher performance than the current SOTAs in CIFAR100, ImageNet-R, ImageNet-A, and CUB dataset. Our complexity analysis shows that our method requires a relatively smaller number of parameters and achieves moderate training time, inference time, and throughput. For further study, the source code of our method is available at https://github.com/anwarmaxsum/PROL.
LGJan 25, 2024Code
Dynamic Long-Term Time-Series Forecasting via Meta Transformer NetworksMuhammad Anwar Ma'sum, MD Rasel Sarkar, Mahardhika Pratama et al.
A reliable long-term time-series forecaster is highly demanded in practice but comes across many challenges such as low computational and memory footprints as well as robustness against dynamic learning environments. This paper proposes Meta-Transformer Networks (MANTRA) to deal with the dynamic long-term time-series forecasting tasks. MANTRA relies on the concept of fast and slow learners where a collection of fast learners learns different aspects of data distributions while adapting quickly to changes. A slow learner tailors suitable representations to fast learners. Fast adaptations to dynamic environments are achieved using the universal representation transformer layers producing task-adapted representations with a small number of parameters. Our experiments using four datasets with different prediction lengths demonstrate the advantage of our approach with at least $3\%$ improvements over the baseline algorithms for both multivariate and univariate settings. Source codes of MANTRA are publicly available in \url{https://github.com/anwarmaxsum/MANTRA}.
LGJan 24, 2022Code
PaRT: Parallel Learning Towards Robust and Transparent AIMahsa Paknezhad, Hamsawardhini Rengarajan, Chenghao Yuan et al.
This paper takes a parallel learning approach for robust and transparent AI. A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources. Each subset consists of network segments, that can be combined and shared across specific tasks. Tasks can share resources with other tasks, while having independent task-related network resources. Therefore, the trained network can share similar representations across various tasks, while also enabling independent task-related representations. The above allows for some crucial outcomes. (1) The parallel nature of our approach negates the issue of catastrophic forgetting. (2) The sharing of segments uses network resources more efficiently. (3) We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations. (4) Through examination of individual task-related and shared representations, the model offers transparency in the network and in the relationships across tasks in a multi-task setting. Evaluation of the proposed approach against complex competing approaches such as Continual Learning, Neural Architecture Search, and Multi-task learning shows that it is capable of learning robust representations. This is the first effort to train a DL model on multiple tasks in parallel. Our code is available at https://github.com/MahsaPaknezhad/PaRT
LGFeb 4, 2024
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via CompetitionQuang Pham, Giang Do, Huy Nguyen et al.
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the mean of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potentials. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computation overheads. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.
LGMay 21, 2024
Prompt-Based Spatio-Temporal Graph Transfer LearningJunfeng Hu, Xu Liu, Zhencheng Fan et al.
Spatio-temporal graph neural networks have proven efficacy in capturing complex dependencies for urban computing tasks such as forecasting and kriging. Yet, their performance is constrained by the reliance on extensive data for training on a specific task, thereby limiting their adaptability to new urban domains with varied task demands. Although transfer learning has been proposed to remedy this problem by leveraging knowledge across domains, the cross-task generalization still remains under-explored in spatio-temporal graph transfer learning due to the lack of a unified framework. To bridge the gap, we propose Spatio-Temporal Graph Prompting (STGP), a prompt-based framework capable of adapting to multi-diverse tasks in a data-scarce domain. Specifically, we first unify different tasks into a single template and introduce a task-agnostic network architecture that aligns with this template. This approach enables capturing dependencies shared across tasks. Furthermore, we employ learnable prompts to achieve domain and task transfer in a two-stage prompting pipeline, facilitating the prompts to effectively capture domain knowledge and task-specific properties. Our extensive experiments demonstrate that STGP outperforms state-of-the-art baselines in three tasks-forecasting, kriging, and extrapolation-achieving an improvement of up to 10.7%.
ROMay 2, 2024
Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone RacingZhongzheng Qiao, Xuan Huy Pham, Savitha Ramasamy et al.
In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with capabilities for continual learning. The envisaged approach amalgamates predictions of the gates' positional coordinates, distance, and orientation, encapsulating them into a cohesive pose tuple. A comprehensive number of tests serve to underscore the efficacy of this approach in confronting diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness in the face of illumination variations, thereby substantiating its effectiveness.
LGFeb 10, 2025
Sequence Transferability and Task Order Selection in Continual LearningThinh Nguyen, Cuong N. Nguyen, Quang Pham et al.
In continual learning, understanding the properties of task sequences and their relationships to model performance is important for developing advanced algorithms with better accuracy. However, efforts in this direction remain underdeveloped despite encouraging progress in methodology development. In this work, we investigate the impacts of sequence transferability on continual learning and propose two novel measures that capture the total transferability of a task sequence, either in the forward or backward direction. Based on the empirical properties of these measures, we then develop a new method for the task order selection problem in continual learning. Our method can be shown to offer a better performance than the conventional strategy of random task selection.
LGFeb 8, 2022
Contrastive predictive coding for Anomaly Detection in Multi-variate Time Series DataTheivendiram Pranavan, Terence Sim, Arulmurugan Ambikapathi et al.
Anomaly detection in multi-variate time series (MVTS) data is a huge challenge as it requires simultaneous representation of long term temporal dependencies and correlations across multiple variables. More often, this is solved by breaking the complexity through modeling one dependency at a time. In this paper, we propose a Time-series Representational Learning through Contrastive Predictive Coding (TRL-CPC) towards anomaly detection in MVTS data. First, we jointly optimize an encoder, an auto-regressor and a non-linear transformation function to effectively learn the representations of the MVTS data sets, for predicting future trends. It must be noted that the context vectors are representative of the observation window in the MTVS. Next, the latent representations for the succeeding instants obtained through non-linear transformations of these context vectors, are contrasted with the latent representations of the encoder for the multi-variables such that the density for the positive pair is maximized. Thus, the TRL-CPC helps to model the temporal dependencies and the correlations of the parameters for a healthy signal pattern. Finally, fitting the latent representations are fit into a Gaussian scoring function to detect anomalies. Evaluation of the proposed TRL-CPC on three MVTS data sets against SOTA anomaly detection methods shows the superiority of TRL-CPC.
CYJan 7, 2022
Incremental Knowledge Tracing from Multiple SchoolsSujanya Suresh, Savitha Ramasamy, P. N. Suganthan et al.
Knowledge tracing is the task of predicting a learner's future performance based on the history of the learner's performance. Current knowledge tracing models are built based on an extensive set of data that are collected from multiple schools. However, it is impossible to pool learner's data from all schools, due to data privacy and PDPA policies. Hence, this paper explores the feasibility of building knowledge tracing models while preserving the privacy of learners' data within their respective schools. This study is conducted using part of the ASSISTment 2009 dataset, with data from multiple schools being treated as separate tasks in a continual learning framework. The results show that learning sequentially with the Self Attentive Knowledge Tracing (SAKT) algorithm is able to achieve considerably similar performance to that of pooling all the data together.
LGSep 23, 2021
An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time SeriesAstha Garg, Wenyu Zhang, Jules Samaran et al.
Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This paper presents a systematic and comprehensive evaluation of unsupervised and semi-supervised deep-learning based methods for anomaly detection and diagnosis on multivariate time series data from cyberphysical systems. Unlike previous works, we vary the model and post-processing of model errors, i.e. the scoring functions independently of each other, through a grid of 10 models and 4 scoring functions, comparing these variants to state of the art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time-points. Through experiments, we find that the existing evaluation metrics either do not take events into account, or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. We propose a new metric to overcome these drawbacks, namely, the composite F-score ($Fc_1$), for evaluating time-series anomaly detection. Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and the choice of scoring functions often matters more than the choice of the underlying model. We also find that a simple, channel-wise model - the Univariate Fully-Connected Auto-Encoder, with the dynamic Gaussian scoring function emerges as a winning candidate for both anomaly detection and diagnosis, beating state of the art algorithms.
LGDec 12, 2020
Knowledge Capture and Replay for Continual LearningSaisubramaniam Gopalakrishnan, Pranshu Ranjan Singh, Haytham Fayek et al.
Deep neural networks have shown promise in several domains, and the learned data (task) specific information is implicitly stored in the network parameters. Extraction and utilization of encoded knowledge representations are vital when data is no longer available in the future, especially in a continual learning scenario. In this work, we introduce {\em flashcards}, which are visual representations that {\em capture} the encoded knowledge of a network as a recursive function of predefined random image patterns. In a continual learning scenario, flashcards help to prevent catastrophic forgetting and consolidating knowledge of all the previous tasks. Flashcards need to be constructed only before learning the subsequent task, and hence, independent of the number of tasks trained before. We demonstrate the efficacy of flashcards in capturing learned knowledge representation (as an alternative to the original dataset) and empirically validate on a variety of continual learning tasks: reconstruction, denoising, task-incremental learning, and new-instance learning classification, using several heterogeneous benchmark datasets. Experimental evidence indicates that: (i) flashcards as a replay strategy is { \em task agnostic}, (ii) performs better than generative replay, and (iii) is on par with episodic replay without additional memory overhead.
LGNov 18, 2019
Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time SeriesYang Guo, Zhengyuan Liu, Pavitra Krishnswamy et al.
Real-world clinical time series data sets exhibit a high prevalence of missing values. Hence, there is an increasing interest in missing data imputation. Traditional statistical approaches impose constraints on the data-generating process and decouple imputation from prediction. Recent works propose recurrent neural network based approaches for missing data imputation and prediction with time series data. However, they generate deterministic outputs and neglect the inherent uncertainty. In this work, we introduce a unified Bayesian recurrent framework for simultaneous imputation and prediction on time series data sets. We evaluate our approach on two real-world mortality prediction tasks using the MIMIC-III and PhysioNet benchmark datasets. We demonstrate strong performance gains over state-of-the-art (SOTA) methods, and provide strategies to use the resulting probability distributions to better assess reliability of the imputations and predictions.
NEMar 21, 2019
Efficient single input-output layer spiking neural classifier with time-varying weight modelAbeegithan Jeyasothy, Savitha Ramasamy, Suresh Sundaram
This paper presents a supervised learning algorithm, namely, the Synaptic Efficacy Function with Meta-neuron based learning algorithm (SEF-M) for a spiking neural network with a time-varying weight model. For a given pattern, SEF-M uses the learning algorithm derived from meta-neuron based learning algorithm to determine the change in weights corresponding to each presynaptic spike times. The changes in weights modulate the amplitude of a Gaussian function centred at the same presynaptic spike times. The sum of amplitude modulated Gaussian functions represents the synaptic efficacy functions (or time-varying weight models). The performance of SEF-M is evaluated against state-of-the-art spiking neural network learning algorithms on 10 benchmark datasets from UCI machine learning repository. Performance studies show superior generalization ability of SEF-M. An ablation study on time-varying weight model is conducted using JAFFE dataset. The results of the ablation study indicate that using a time-varying weight model instead of single weight model improves the classification accuracy by 14%. Thus, it can be inferred that a single input-output layer spiking neural network with time-varying weight model is computationally more efficient than a multi-layer spiking neural network with long-term or short-term weight model.
CLMar 8, 2019
Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom MonitoringZhengyuan Liu, Hazel Lim, Nur Farah Ain Binte Suhaimi et al.
Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging on minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions like thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving more than 80% F1 score on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility for efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations.
NEFeb 28, 2019
A novel method for extracting interpretable knowledge from a spiking neural classifier with time-varying synaptic weightsAbeegithan Jeyasothy, Suresh Sundaram, Savitha Ramasamy et al.
This paper presents a novel method for information interpretability in an MC-SEFRON classifier. To develop a method to extract knowledge stored in a trained classifier, first, the binary-class SEFRON classifier developed earlier is extended to handle multi-class problems. MC-SEFRON uses the population encoding scheme to encode the real-valued input data into spike patterns. MC-SEFRON is trained using the same supervised learning rule used in the SEFRON. After training, the proposed method extracts the knowledge for a given class stored in the classifier by mapping the weighted postsynaptic potential in the time domain to the feature domain as Feature Strength Functions (FSFs). A set of FSFs corresponding to each output class represents the extracted knowledge from the classifier. This knowledge encoding method is derived to maintain consistency between the classification in the time domain and the feature domain. The correctness of the FSF is quantitatively measured by using FSF directly for classification tasks. For a given input, each FSF is sampled at the input value to obtain the corresponding feature strength value (FSV). Then the aggregated FSVs obtained for each class are used to determine the output class labels during classification. FSVs are also used to interpret the predictions during the classification task. Using ten UCI datasets and the MNIST dataset, the knowledge extraction method, interpretation and the reliability of the FSF are demonstrated. Based on the studies, it can be seen that on an average, the difference in the classification accuracies using the FSF directly and those obtained by MC-SEFRON is only around 0.9% & 0.1\% for the UCI datasets and the MNIST dataset respectively. This clearly shows that the knowledge represented by the FSFs has acceptable reliability and the interpretability of classification using the classifier's knowledge has been justified.
COMP-PHNov 15, 2018
Predicting thermoelectric properties from crystal graphs and material descriptors - first application for functional materialsLeo Laugier, Daniil Bash, Jose Recatala et al.
We introduce the use of Crystal Graph Convolutional Neural Networks (CGCNN), Fully Connected Neural Networks (FCNN) and XGBoost to predict thermoelectric properties. The dataset for the CGCNN is independent of Density Functional Theory (DFT) and only relies on the crystal and atomic information, while that for the FCNN is based on a rich attribute list mined from Materialsproject.org. The results show that the optimized FCNN is three layer deep and is able to predict the scattering-time independent thermoelectric powerfactor much better than the CGCNN (or XGBoost), suggesting that bonding and density of states descriptors informed from materials science knowledge obtained partially from DFT are vital to predict functional properties.
LGSep 24, 2018
Autonomous Deep Learning: Incremental Learning of Denoising Autoencoder for Evolving Data StreamsMahardhika Pratama, Andri Ashfahani, Yew Soon Ong et al.
The generative learning phase of Autoencoder (AE) and its successor Denosing Autoencoder (DAE) enhances the flexibility of data stream method in exploiting unlabelled samples. Nonetheless, the feasibility of DAE for data stream analytic deserves in-depth study because it characterizes a fixed network capacity which cannot adapt to rapidly changing environments. An automated construction of a denoising autoeconder, namely deep evolving denoising autoencoder (DEVDAN), is proposed in this paper. DEVDAN features an open structure both in the generative phase and in the discriminative phase where input features can be automatically added and discarded on the fly. A network significance (NS) method is formulated in this paper and is derived from the bias-variance concept. This method is capable of estimating the statistical contribution of the network structure and its hidden units which precursors an ideal state to add or prune input features. Furthermore, DEVDAN is free of the problem- specific threshold and works fully in the single-pass learning fashion. The efficacy of DEVDAN is numerically validated using nine non-stationary data stream problems simulated under the prequential test-then-train protocol where DEVDAN is capable of delivering an improvement of classification accuracy to recently published online learning works while having flexibility in the automatic extraction of robust input features and in adapting to rapidly changing environments.
NEMar 6, 2018
Online Deep Learning: Growing RBM on the flySavitha Ramasamy, Kanagasabai Rajaraman, Pavitra Krishnaswamy et al.
We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representation at the hidden layer and (2) a discriminative phase for classification. The online generative training begins with zero neurons in the hidden layer, adds and updates the neurons to adapt to statistics of streaming data in a single pass unsupervised manner, resulting in a feature representation best suited to the data. The discriminative phase is based on stochastic gradient descent and associates the represented features to the class labels. We demonstrate the OGD-RBM on a set of multi-category and binary classification problems for data sets having varying degrees of class-imbalance. We first apply the OGD-RBM algorithm on the multi-class MNIST dataset to characterize the network evolution. We demonstrate that the online generative phase converges to a stable, concise network architecture, wherein individual neurons are inherently discriminative to the class labels despite unsupervised training. We then benchmark OGD-RBM performance to other machine learning, neural network and ClassRBM techniques for credit scoring applications using 3 public non-stationary two-class credit datasets with varying degrees of class-imbalance. We report that OGD-RBM improves accuracy by 2.5-3% over batch learning techniques while requiring at least 24%-70% fewer neurons and fewer training samples. This online generative training approach can be extended greedily to multiple layers for training Deep Belief Networks in non-stationary data mining applications without the need for a priori fixed architectures.