ROMar 3, 2023Code
Cross-domain Transfer Learning and State Inference for Soft Robots via a Semi-supervised Sequential Variational Bayes FrameworkShageenderan Sapai, Junn Yong Loo, Ze Yang Ding et al.
Recently, data-driven models such as deep neural networks have shown to be promising tools for modelling and state inference in soft robots. However, voluminous amounts of data are necessary for deep models to perform effectively, which requires exhaustive and quality data collection, particularly of state labels. Consequently, obtaining labelled state data for soft robotic systems is challenged for various reasons, including difficulty in the sensorization of soft robots and the inconvenience of collecting data in unstructured environments. To address this challenge, in this paper, we propose a semi-supervised sequential variational Bayes (DSVB) framework for transfer learning and state inference in soft robots with missing state labels on certain robot configurations. Considering that soft robots may exhibit distinct dynamics under different robot configurations, a feature space transfer strategy is also incorporated to promote the adaptation of latent features across multiple configurations. Unlike existing transfer learning approaches, our proposed DSVB employs a recurrent neural network to model the nonlinear dynamics and temporal coherence in soft robot data. The proposed framework is validated on multiple setup configurations of a pneumatic-based soft robot finger. Experimental results on four transfer scenarios demonstrate that DSVB performs effective transfer learning and accurate state inference amidst missing state labels. The data and code are available at https://github.com/shageenderan/DSVB.
CVFeb 22, 2023
HDR image watermarking using saliency detection and quantization index modulationAhmed Khan, Minoru Kuribayashi, KokSheik Wong et al.
High-dynamic range (HDR) images are circulated rapidly over the internet with risks of being exploited for unauthorized usage. To protect these images, some HDR image based watermarking (HDR-IW) methods were put forward. However, they inherited the same problem faced by conventional IW methods for standard dynamic range (SDR) images, where only trade-offs among conflicting requirements are managed instead of simultaneous improvement. In this paper, a novel saliency (eye-catching object) detection based trade-off independent HDR-IW is proposed, to simultaneously improve robustness, imperceptibility and payload. First, the host image goes through our proposed salient object detection model to produce a saliency map, which is, in turn, exploited to segment the foreground and background of the host image. Next, the binary watermark is partitioned into the foregrounds and backgrounds using the same mask and scrambled using a random permutation algorithm. Finally, the watermark segments are embedded into selected bit-plane of the corresponding host segments using quantized indexed modulation. Experimental results suggest that the proposed work outperforms state-of-the-art methods in terms of improving the conflicting requirements.
LGJun 8, 2023
A Deep Probabilistic Flow-Based Framework for Unsupervised Cross-Domain Soft SensingJunn Yong Loo, Hwa Hui Tew, Fang Yu Leong et al.
Industrial soft sensing is crucial for accurate process monitoring through reliable inference of dominant sensor variables. However, developing effective data-driven soft sensor models presents challenges, such as achieving domain adaptability, addressing incomplete sensor labels, and learning stochastic data variability. To overcome these challenges, we propose a Deep Variational Potential Flow (DVPF) framework for cross-domain soft sensor modeling, taking into account the lack of sensor labels in the target domain. Our framework introduces sequential variational Bayes with recurrent neural network (RNN) parameterization to address the maximum likelihood estimation problem that characterizes cross-domain soft sensing. Central to the framework is a potential flow that performs unsupervised Bayesian inference on the RNN-extracted features to obtain an exact representation of the intractable posterior distribution. Together, these DVPF components learn domain-adaptable features that effectively capture complex cross-domain process dynamics and data variability. We validate the proposed DVPF on a real industrial multiphase flow process across varying operating modes. The results show that the DVPF demonstrates superior performance in cross-domain soft sensing compared to existing deep feature-based domain adaptation methods.
SDMay 18
Sonalyzer-Moz: A Framework for Analyzing the Structure of Mozart's Sonata FormJing Zhao, KokSheik Wong, Vishnu Monn Baskaran et al.
The sonata form is a musically rich and hierarchically structured form that poses significant challenges for automatic analysis. While music structure analysis has seen strides of progress in recent years, sonata form analysis remains in its early stages. This is largely due to the time-consuming and high barrier of the music background requirement for annotating classical music structures. To advance research in this area, we curated SoSA-Moz, the first large-scale dataset featuring comprehensive hierarchical structure annotations. This work establishes a foundation for systematic sonata form analysis. Leveraging this newly contributed resource, we further propose Sonalyzer-Moz, a baseline model specifically designed for investigating complex sonata structures. This framework integrates feature aggregation with sequential modeling, enabling it to capture both local feature and upper-level structural dependencies. Experiment results show that Sonalyzer-Moz is capable of identifying the components' boundaries of the upper-level structure that are critical to understanding sonata form. Therefore, this method demonstrates, for the first time, the effectiveness of automatic upper-level analysis of sonata form, and provides a robust baseline for future research in the automatic understanding of sonata form while advancing the study of classical music structure analysis.
LGJul 21, 2024
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative ModellingJunn Yong Loo, Michelle Adeline, Arghya Pal et al.
Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MCMC sampling and does not rely on complementary latent models or cooperative training. The VAPO framework aims to learn a potential energy function whose gradient (flow) guides the prior samples, so that their density evolution closely follows an approximate data likelihood homotopy. An energy loss function is then formulated to minimize the Kullback-Leibler divergence between density evolution of the flow-driven prior and the data likelihood homotopy. Images can be generated after training the potential energy, by initializing the samples from Gaussian prior and solving the ODE governing the potential flow on a fixed time interval using generic ODE solvers. Experiment results show that the proposed VAPO framework is capable of generating realistic images on various image datasets. In particular, our proposed framework achieves competitive FID scores for unconditional image generation on the CIFAR-10 and CelebA datasets.
SYJun 21, 2023
Sigma-point Kalman Filter with Nonlinear Unknown Input Estimation via Optimization and Data-driven Approach for Dynamic SystemsJunn Yong Loo, Ze Yang Ding, Vishnu Monn Baskaran et al.
Most works on joint state and unknown input (UI) estimation require the assumption that the UIs are linear; this is potentially restrictive as it does not hold in many intelligent autonomous systems. To overcome this restriction and circumvent the need to linearize the system, we propose a derivative-free Unknown Input Sigma-point Kalman Filter (SPKF-nUI) where the SPKF is interconnected with a general nonlinear UI estimator that can be implemented via nonlinear optimization and data-driven approaches. The nonlinear UI estimator uses the posterior state estimate which is less susceptible to state prediction error. In addition, we introduce a joint sigma-point transformation scheme to incorporate both the state and UI uncertainties in the estimation of SPKF-nUI. An in-depth stochastic stability analysis proves that the proposed SPKF-nUI yields exponentially converging estimation error bounds under reasonable assumptions. Finally, two case studies are carried out on a simulation-based rigid robot and a physical soft robot, i.e., robots made of soft materials with complex dynamics to validate effectiveness of the proposed filter on nonlinear dynamic systems. Our results demonstrate that the proposed SPKF-nUI achieves the lowest state and UI estimation errors when compared to the existing nonlinear state-UI filters.
CVJul 15, 2025Code
SpaRTAN: Spatial Reinforcement Token-based Aggregation Network for Visual RecognitionQuan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo et al.
The resurgence of convolutional neural networks (CNNs) in visual recognition tasks, exemplified by ConvNeXt, has demonstrated their capability to rival transformer-based architectures through advanced training methodologies and ViT-inspired design principles. However, both CNNs and transformers exhibit a simplicity bias, favoring straightforward features over complex structural representations. Furthermore, modern CNNs often integrate MLP-like blocks akin to those in transformers, but these blocks suffer from significant information redundancies, necessitating high expansion ratios to sustain competitive performance. To address these limitations, we propose SpaRTAN, a lightweight architectural design that enhances spatial and channel-wise information processing. SpaRTAN employs kernels with varying receptive fields, controlled by kernel size and dilation factor, to capture discriminative multi-order spatial features effectively. A wave-based channel aggregation module further modulates and reinforces pixel interactions, mitigating channel-wise redundancies. Combining the two modules, the proposed network can efficiently gather and dynamically contextualize discriminative features. Experimental results in ImageNet and COCO demonstrate that SpaRTAN achieves remarkable parameter efficiency while maintaining competitive performance. In particular, on the ImageNet-1k benchmark, SpaRTAN achieves 77. 7% accuracy with only 3.8M parameters and approximately 1.0 GFLOPs, demonstrating its ability to deliver strong performance through an efficient design. On the COCO benchmark, it achieves 50.0% AP, surpassing the previous benchmark by 1.2% with only 21.5M parameters. The code is publicly available at [https://github.com/henry-pay/SpaRTAN].
CVJul 15, 2025Code
Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction DetectionQuan Bi Pay, Vishnu Monn Baskaran, Junn Yong Loo et al.
Human-object interaction (HOI) detection is essential for accurately localizing and characterizing interactions between humans and objects, providing a comprehensive understanding of complex visual scenes across various domains. However, existing HOI detectors often struggle to deliver reliable predictions efficiently, relying on resource-intensive training methods and inefficient architectures. To address these challenges, we conceptualize a wavelet attention-like backbone and a novel ray-based encoder architecture tailored for HOI detection. Our wavelet backbone addresses the limitations of expressing middle-order interactions by aggregating discriminative features from the low- and high-order interactions extracted from diverse convolutional filters. Concurrently, the ray-based encoder facilitates multi-scale attention by optimizing the focus of the decoder on relevant regions of interest and mitigating computational overhead. As a result of harnessing the attenuated intensity of learnable ray origins, our decoder aligns query embeddings with emphasized regions of interest for accurate predictions. Experimental results on benchmark datasets, including ImageNet and HICO-DET, showcase the potential of our proposed architecture. The code is publicly available at [https://github.com/henry-pay/RayEncoder].
ROJun 25, 2024Code
MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object DetectionMichelle Adeline, Junn Yong Loo, Vishnu Monn Baskaran
Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-$k$ anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4\% mAP and 55.0\% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline where anchor proposals are modelled as learnable embeddings. Code is available at https://github.com/NaomiEX/MDHA.
LGApr 22, 2025
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy MatchingJunn Yong Loo, Michelle Adeline, Julia Kaiwen Lau et al.
Energy-based models (EBMs) are a powerful class of probabilistic generative models due to their flexibility and interpretability. However, relationships between potential flows and explicit EBMs remain underexplored, while contrastive divergence training via implicit Markov chain Monte Carlo (MCMC) sampling is often unstable and expensive in high-dimensional settings. In this paper, we propose Variational Potential Flow Bayes (VPFB), a new energy-based generative framework that eliminates the need for implicit MCMC sampling and does not rely on auxiliary networks or cooperative training. VPFB learns an energy-parameterized potential flow by constructing a flow-driven density homotopy that is matched to the data distribution through a variational loss minimizing the Kullback-Leibler divergence between the flow-driven and marginal homotopies. This principled formulation enables robust and efficient generative modeling while preserving the interpretability of EBMs. Experimental results on image generation, interpolation, out-of-distribution detection, and compositional generation confirm the effectiveness of VPFB, showing that our method performs competitively with existing approaches in terms of sample quality and versatility across diverse generative modeling tasks.
CVJun 15, 2018
A Survey of Automatic Facial Micro-expression Analysis: Databases, Methods and ChallengesYee-Hui Oh, John See, Anh Cat Le Ngo et al.
Over the last few years, automatic facial micro-expression analysis has garnered increasing attention from experts across different disciplines because of its potential applications in various fields such as clinical diagnosis, forensic investigation and security systems. Advances in computer algorithms and video acquisition technology have rendered machine analysis of facial micro-expressions possible today, in contrast to decades ago when it was primarily the domain of psychiatrists where analysis was largely manual. Indeed, although the study of facial micro-expressions is a well-established field in psychology, it is still relatively new from the computational perspective with many interesting problems. In this survey, we present a comprehensive review of state-of-the-art databases and methods for micro-expressions spotting and recognition. Individual stages involved in the automation of these tasks are also described and reviewed at length. In addition, we also deliberate on the challenges and future directions in this growing field of automatic facial micro-expression analysis.