Junn Yong Loo

LG
h-index42
19papers
40citations
Novelty52%
AI Score57

19 Papers

LGMay 28Code
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

Hwa Hui Tew, Junn Yong Loo, Fang Yu Leong et al.

Functional Magnetic Resonance Imaging (fMRI) provides non-invasive access to dynamic brain activity by measuring blood oxygen level-dependent (BOLD) signals over time. However, the resource-intensive nature of fMRI acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often remain challenging in replicating their inherent non-stationarity, intricate spatiotemporal dynamics, and physiological variations of raw BOLD signals. To address these challenges, we propose Dual-Spectral Flow Matching (DSFM), a novel fMRI generative framework that cascades dual frequency representation of BOLD signals with spectral flow matching. Specifically, our framework first converts BOLD signals into a wavelet decomposition map via a discrete wavelet transform (DWT) to capture globalized transient and multi-scale variations, and projects into the discrete cosine transform (DCT) space across brain regions and time to exploit localized energy compaction of low-frequency dominant BOLD coefficients. Subsequently, a spectral flow matching model is trained to generate class-conditioned cosine-frequency representation. The generated samples are reconstructed through inverse DCT and inverse DWT operations to recover physiologically plausible time-domain BOLD signals. This dual-transform approach imposes structured frequency priors and preserves key physiological brain dynamics. Ultimately, we demonstrate the efficacy of our approach through improved downstream fMRI-based brain network classification. The code is available at https://github.com/htew0001/DSFM.git .

LGFeb 14, 2023Code
A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification

Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting et al.

Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.

ROMar 3, 2023Code
Cross-domain Transfer Learning and State Inference for Soft Robots via a Semi-supervised Sequential Variational Bayes Framework

Shageenderan Sapai, Junn Yong Loo, Ze Yang Ding et al.

Recently, data-driven models such as deep neural networks have shown to be promising tools for modelling and state inference in soft robots. However, voluminous amounts of data are necessary for deep models to perform effectively, which requires exhaustive and quality data collection, particularly of state labels. Consequently, obtaining labelled state data for soft robotic systems is challenged for various reasons, including difficulty in the sensorization of soft robots and the inconvenience of collecting data in unstructured environments. To address this challenge, in this paper, we propose a semi-supervised sequential variational Bayes (DSVB) framework for transfer learning and state inference in soft robots with missing state labels on certain robot configurations. Considering that soft robots may exhibit distinct dynamics under different robot configurations, a feature space transfer strategy is also incorporated to promote the adaptation of latent features across multiple configurations. Unlike existing transfer learning approaches, our proposed DSVB employs a recurrent neural network to model the nonlinear dynamics and temporal coherence in soft robot data. The proposed framework is validated on multiple setup configurations of a pneumatic-based soft robot finger. Experimental results on four transfer scenarios demonstrate that DSVB performs effective transfer learning and accurate state inference amidst missing state labels. The data and code are available at https://github.com/shageenderan/DSVB.

LGJun 8, 2023
A Deep Probabilistic Flow-Based Framework for Unsupervised Cross-Domain Soft Sensing

Junn Yong Loo, Hwa Hui Tew, Fang Yu Leong et al.

Industrial soft sensing is crucial for accurate process monitoring through reliable inference of dominant sensor variables. However, developing effective data-driven soft sensor models presents challenges, such as achieving domain adaptability, addressing incomplete sensor labels, and learning stochastic data variability. To overcome these challenges, we propose a Deep Variational Potential Flow (DVPF) framework for cross-domain soft sensor modeling, taking into account the lack of sensor labels in the target domain. Our framework introduces sequential variational Bayes with recurrent neural network (RNN) parameterization to address the maximum likelihood estimation problem that characterizes cross-domain soft sensing. Central to the framework is a potential flow that performs unsupervised Bayesian inference on the RNN-extracted features to obtain an exact representation of the intractable posterior distribution. Together, these DVPF components learn domain-adaptable features that effectively capture complex cross-domain process dynamics and data variability. We validate the proposed DVPF on a real industrial multiphase flow process across varying operating modes. The results show that the DVPF demonstrates superior performance in cross-domain soft sensing compared to existing deep feature-based domain adaptation methods.

LGJul 21, 2024
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling

Junn Yong Loo, Michelle Adeline, Arghya Pal et al.

Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MCMC sampling and does not rely on complementary latent models or cooperative training. The VAPO framework aims to learn a potential energy function whose gradient (flow) guides the prior samples, so that their density evolution closely follows an approximate data likelihood homotopy. An energy loss function is then formulated to minimize the Kullback-Leibler divergence between density evolution of the flow-driven prior and the data likelihood homotopy. Images can be generated after training the potential energy, by initializing the samples from Gaussian prior and solving the ODE governing the potential flow on a fixed time interval using generic ODE solvers. Experiment results show that the proposed VAPO framework is capable of generating realistic images on various image datasets. In particular, our proposed framework achieves competitive FID scores for unconditional image generation on the CIFAR-10 and CelebA datasets.

LGJan 2, 2025Code
ST-HCSS: Deep Spatio-Temporal Hypergraph Convolutional Neural Network for Soft Sensing

Hwa Hui Tew, Fan Ding, Gaoxuan Li et al.

Higher-order sensor networks are more accurate in characterizing the nonlinear dynamics of sensory time-series data in modern industrial settings by allowing multi-node connections beyond simple pairwise graph edges. In light of this, we propose a deep spatio-temporal hypergraph convolutional neural network for soft sensing (ST-HCSS). In particular, our proposed framework is able to construct and leverage a higher-order graph (hypergraph) to model the complex multi-interactions between sensor nodes in the absence of prior structural knowledge. To capture rich spatio-temporal relationships underlying sensor data, our proposed ST-HCSS incorporates stacked gated temporal and hypergraph convolution layers to effectively aggregate and update hypergraph information across time and nodes. Our results validate the superiority of ST-HCSS compared to existing state-of-the-art soft sensors, and demonstrates that the learned hypergraph feature representations aligns well with the sensor data correlations. The code is available at https://github.com/htew0001/ST-HCSS.git

SYJun 21, 2023
Sigma-point Kalman Filter with Nonlinear Unknown Input Estimation via Optimization and Data-driven Approach for Dynamic Systems

Junn Yong Loo, Ze Yang Ding, Vishnu Monn Baskaran et al.

Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) where the SPKF is interconnected with a general nonlinear UI estimator that can be implemented via nonlinear optimization and data-driven approaches. The nonlinear UI estimator uses the posterior state estimate which is less susceptible to state prediction error. In addition, we introduce a joint sigma-point transformation scheme to incorporate both the state and UI uncertainties in the estimation of SPKF-nUI. An in-depth stochastic stability analysis proves that the proposed SPKF-nUI yields exponentially converging estimation error bounds under reasonable assumptions. Finally, two case studies are carried out on a simulation-based rigid robot and a physical soft robot, i.e., robots made of soft materials with complex dynamics to validate effectiveness of the proposed filter on nonlinear dynamic systems. Our results demonstrate that the proposed SPKF-nUI achieves the lowest state and UI estimation errors when compared to the existing nonlinear state-UI filters.

CVJul 15, 2025Code
SpaRTAN: Spatial Reinforcement Token-based Aggregation Network for Visual Recognition

Quan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo et al.

The resurgence of convolutional neural networks (CNNs) in visual recognition tasks, exemplified by ConvNeXt, has demonstrated their capability to rival transformer-based architectures through advanced training methodologies and ViT-inspired design principles. However, both CNNs and transformers exhibit a simplicity bias, favoring straightforward features over complex structural representations. Furthermore, modern CNNs often integrate MLP-like blocks akin to those in transformers, but these blocks suffer from significant information redundancies, necessitating high expansion ratios to sustain competitive performance. To address these limitations, we propose SpaRTAN, a lightweight architectural design that enhances spatial and channel-wise information processing. SpaRTAN employs kernels with varying receptive fields, controlled by kernel size and dilation factor, to capture discriminative multi-order spatial features effectively. A wave-based channel aggregation module further modulates and reinforces pixel interactions, mitigating channel-wise redundancies. Combining the two modules, the proposed network can efficiently gather and dynamically contextualize discriminative features. Experimental results in ImageNet and COCO demonstrate that SpaRTAN achieves remarkable parameter efficiency while maintaining competitive performance. In particular, on the ImageNet-1k benchmark, SpaRTAN achieves 77. 7% accuracy with only 3.8M parameters and approximately 1.0 GFLOPs, demonstrating its ability to deliver strong performance through an efficient design. On the COCO benchmark, it achieves 50.0% AP, surpassing the previous benchmark by 1.2% with only 21.5M parameters. The code is publicly available at [https://github.com/henry-pay/SpaRTAN].

CVJul 15, 2025Code
Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction Detection

Quan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo et al.

Human-object interaction (HOI) detection is essential for accurately localizing and characterizing interactions between humans and objects, providing a comprehensive understanding of complex visual scenes across various domains. However, existing HOI detectors often struggle to deliver reliable predictions efficiently, relying on resource-intensive training methods and inefficient architectures. To address these challenges, we conceptualize a wavelet attention-like backbone and a novel ray-based encoder architecture tailored for HOI detection. Our wavelet backbone addresses the limitations of expressing middle-order interactions by aggregating discriminative features from the low- and high-order interactions extracted from diverse convolutional filters. Concurrently, the ray-based encoder facilitates multi-scale attention by optimizing the focus of the decoder on relevant regions of interest and mitigating computational overhead. As a result of harnessing the attenuated intensity of learnable ray origins, our decoder aligns query embeddings with emphasized regions of interest for accurate predictions. Experimental results on benchmark datasets, including ImageNet and HICO-DET, showcase the potential of our proposed architecture. The code is publicly available at [https://github.com/henry-pay/RayEncoder].

ROMar 11
KnowDiffuser: A Knowledge-Guided Diffusion Planner with LM Reasoning and Prior-Informed Trajectory Initialization

Fan Ding, Xuewen Luo, Fengze Yang et al.

Recent advancements in Language Models (LMs) have demonstrated strong semantic reasoning capabilities, enabling their application in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and lack the ability to generate continuous, physically feasible trajectories required for motion planning. Meanwhile, diffusion models have proven effective at generating reliable and dynamically consistent trajectories, but often lack semantic interpretability and alignment with scene-level understanding. To address these limitations, we propose \textbf{KnowDiffuser}, a knowledge-guided motion planning framework that tightly integrates the semantic understanding of language models with the generative power of diffusion models. The framework employs a language model to infer context-aware meta-actions from structured scene representations, which are then mapped to prior trajectories that anchor the subsequent denoising process. A two-stage truncated denoising mechanism refines these trajectories efficiently, preserving both semantic alignment and physical feasibility. Experiments on the nuPlan benchmark demonstrate that KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, establishing a robust and interpretable framework that effectively bridges the semantic-to-physical gap in AD systems.

ROJun 25, 2024Code
MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection

Michelle Adeline, Junn Yong Loo, Vishnu Monn Baskaran

Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-$k$ anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4\% mAP and 55.0\% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline where anchor proposals are modelled as learnable embeddings. Code is available at https://github.com/NaomiEX/MDHA.

RONov 9, 2024
Energy-efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles

Fan Ding, Xuewen Luo, Gaoxuan Li et al.

To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74\% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.

AINov 9, 2024
Cross-Domain Transfer Learning using Attention Latent Features for Multi-Agent Trajectory Prediction

Jia Quan Loh, Xuewen Luo, Fan Ding et al.

With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effectively to unseen networks. To address this, we proposed a novel spatial-temporal trajectory prediction framework that performs cross-domain adaption on the attention representation of a Transformer-based model. A graph convolutional network is also integrated to construct dynamic graph feature embeddings that accurately model the complex spatial-temporal interactions between the multi-agent vehicles across multiple traffic domains. The proposed framework is validated on two case studies involving the cross-city and cross-period settings. Experimental results show that our proposed framework achieves superior trajectory prediction and domain adaptation performances over the state-of-the-art models.

LGJan 2, 2025
KANS: Knowledge Discovery Graph Attention Network for Soft Sensing in Multivariate Industrial Processes

Hwa Hui Tew, Gaoxuan Li, Fan Ding et al.

Soft sensing of hard-to-measure variables is often crucial in industrial processes. Current practices rely heavily on conventional modeling techniques that show success in improving accuracy. However, they overlook the non-linear nature, dynamics characteristics, and non-Euclidean dependencies between complex process variables. To tackle these challenges, we present a framework known as a Knowledge discovery graph Attention Network for effective Soft sensing (KANS). Unlike the existing deep learning soft sensor models, KANS can discover the intrinsic correlations and irregular relationships between the multivariate industrial processes without a predefined topology. First, an unsupervised graph structure learning method is introduced, incorporating the cosine similarity between different sensor embedding to capture the correlations between sensors. Next, we present a graph attention-based representation learning that can compute the multivariate data parallelly to enhance the model in learning complex sensor nodes and edges. To fully explore KANS, knowledge discovery analysis has also been conducted to demonstrate the interpretability of the model. Experimental results demonstrate that KANS significantly outperforms all the baselines and state-of-the-art methods in soft sensing performance. Furthermore, the analysis shows that KANS can find sensors closely related to different process variables without domain knowledge, significantly improving soft sensing accuracy.

LGApr 22, 2025
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching

Junn Yong Loo, Michelle Adeline, Julia Kaiwen Lau et al.

Energy-based models (EBMs) are a powerful class of probabilistic generative models due to their flexibility and interpretability. However, relationships between potential flows and explicit EBMs remain underexplored, while contrastive divergence training via implicit Markov chain Monte Carlo (MCMC) sampling is often unstable and expensive in high-dimensional settings. In this paper, we propose Variational Potential Flow Bayes (VPFB), a new energy-based generative framework that eliminates the need for implicit MCMC sampling and does not rely on auxiliary networks or cooperative training. VPFB learns an energy-parameterized potential flow by constructing a flow-driven density homotopy that is matched to the data distribution through a variational loss minimizing the Kullback-Leibler divergence between the flow-driven and marginal homotopies. This principled formulation enables robust and efficient generative modeling while preserving the interpretability of EBMs. Experimental results on image generation, interpolation, out-of-distribution detection, and compositional generation confirm the effectiveness of VPFB, showing that our method performs competitively with existing approaches in terms of sample quality and versatility across diverse generative modeling tasks.

NCNov 23, 2025
Brain-MGF: Multimodal Graph Fusion Network for EEG-fMRI Brain Connectivity Analysis Under Psilocybin

Sin-Yee Yap, Fuad Noman, Junn Yong Loo et al.

Psychedelics, such as psilocybin, reorganise large-scale brain connectivity, yet how these changes are reflected across electrophysiological (electroencephalogram, EEG) and haemodynamic (functional magnetic resonance imaging, fMRI) networks remains unclear. We present Brain-MGF, a multimodal graph fusion network for joint EEG-fMRI connectivity analysis. For each modality, we construct graphs with partial-correlation edges and Pearson-profile node features, and learn subject-level embeddings via graph convolution. An adaptive softmax gate then fuses modalities with sample-specific weights to capture context-dependent contributions. Using the world's largest single-site psilocybin dataset, PsiConnect, Brain-MGF distinguishes psilocybin from no-psilocybin conditions in meditation and rest. Fusion improves over unimodal and non-adaptive variants, achieving 74.0% accuracy and 76.5% F1 score on meditation, and 76.0% accuracy with 85.8% ROC-AUC on rest. UMAP visualisations reveal clearer class separation for fused embeddings. These results indicate that adaptive graph fusion effectively integrates complementary EEG-fMRI information, providing an interpretable framework for characterising psilocybin-induced alterations in large-scale neural organisation.

LGSep 25, 2025
T2I-Diff: fMRI Signal Generation via Time-Frequency Image Transform and Classifier-Free Denoising Diffusion Models

Hwa Hui Tew, Junn Yong Loo, Yee-Fan Tan et al.

Functional Magnetic Resonance Imaging (fMRI) is an advanced neuroimaging method that enables in-depth analysis of brain activity by measuring dynamic changes in the blood oxygenation level-dependent (BOLD) signals. However, the resource-intensive nature of fMRI data acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often underperform because they overlook the complex non-stationarity and nonlinear BOLD dynamics. To address these challenges, we introduce T2I-Diff, an fMRI generation framework that leverages time-frequency representation of BOLD signals and classifier-free denoising diffusion. Specifically, our framework first converts BOLD signals into windowed spectrograms via a time-dependent Fourier transform, capturing both the underlying temporal dynamics and spectral evolution. Subsequently, a classifier-free diffusion model is trained to generate class-conditioned frequency spectrograms, which are then reverted to BOLD signals via inverse Fourier transforms. Finally, we validate the efficacy of our approach by demonstrating improved accuracy and generalization in downstream fMRI-based brain network classification.

ROAug 23, 2025
Drive As You Like: Strategy-Level Motion Planning Based on A Multi-Head Diffusion Model

Fan Ding, Xuewen Luo, Hwa Hui Tew et al.

Recent advances in motion planning for autonomous driving have led to models capable of generating high-quality trajectories. However, most existing planners tend to fix their policy after supervised training, leading to consistent but rigid driving behaviors. This limits their ability to reflect human preferences or adapt to dynamic, instruction-driven demands. In this work, we propose a diffusion-based multi-head trajectory planner(M-diffusion planner). During the early training stage, all output heads share weights to learn to generate high-quality trajectories. Leveraging the probabilistic nature of diffusion models, we then apply Group Relative Policy Optimization (GRPO) to fine-tune the pre-trained model for diverse policy-specific behaviors. At inference time, we incorporate a large language model (LLM) to guide strategy selection, enabling dynamic, instruction-aware planning without switching models. Closed-loop simulation demonstrates that our post-trained planner retains strong planning capability while achieving state-of-the-art (SOTA) performance on the nuPlan val14 benchmark. Open-loop results further show that the generated trajectories exhibit clear diversity, effectively satisfying multi-modal driving behavior requirements. The code and related experiments will be released upon acceptance of the paper.

LGAug 12, 2025
GSMT: Graph Fusion and Spatiotemporal TaskCorrection for Multi-Bus Trajectory Prediction

Fan Ding, Hwa Hui Tew, Junn Yong Loo et al.

Accurate trajectory prediction for buses is crucial in intelligent transportation systems, particularly within urban environments. In developing regions where access to multimodal data is limited, relying solely on onboard GPS data remains indispensable despite inherent challenges. To address this problem, we propose GSMT, a hybrid model that integrates a Graph Attention Network (GAT) with a sequence-to-sequence Recurrent Neural Network (RNN), and incorporates a task corrector capable of extracting complex behavioral patterns from large-scale trajectory data. The task corrector clusters historical trajectories to identify distinct motion patterns and fine-tunes the predictions generated by the GAT and RNN. Specifically, GSMT fuses dynamic bus information and static station information through embedded hybrid networks to perform trajectory prediction, and applies the task corrector for secondary refinement after the initial predictions are generated. This two-stage approach enables multi-node trajectory prediction among buses operating in dense urban traffic environments under complex conditions. Experiments conducted on a real-world dataset from Kuala Lumpur, Malaysia, demonstrate that our method significantly outperforms existing approaches, achieving superior performance in both short-term and long-term trajectory prediction tasks.