LGFeb 14, 2023Code
A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder IdentificationSin-Yee Yap, Junn Yong Loo, Chee-Ming Ting et al.
Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.
LGJul 21, 2024
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative ModellingJunn Yong Loo, Michelle Adeline, Arghya Pal et al.
Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MCMC sampling and does not rely on complementary latent models or cooperative training. The VAPO framework aims to learn a potential energy function whose gradient (flow) guides the prior samples, so that their density evolution closely follows an approximate data likelihood homotopy. An energy loss function is then formulated to minimize the Kullback-Leibler divergence between density evolution of the flow-driven prior and the data likelihood homotopy. Images can be generated after training the potential energy, by initializing the samples from Gaussian prior and solving the ODE governing the potential flow on a fixed time interval using generic ODE solvers. Experiment results show that the proposed VAPO framework is capable of generating realistic images on various image datasets. In particular, our proposed framework achieves competitive FID scores for unconditional image generation on the CIFAR-10 and CelebA datasets.
CVApr 9, 2024Code
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in VideosSharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy et al.
Human action or activity recognition in videos is a fundamental task in computer vision with applications in surveillance and monitoring, self-driving cars, sports analytics, human-robot interaction and many more. Traditional supervised methods require large annotated datasets for training, which are expensive and time-consuming to acquire. This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition. Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples. We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (VIT) are utilised to capture different aspects of action representations; hence we call it ActNetFormer. The 3D CNNs excel at capturing spatial features and local dependencies in the temporal domain, while VIT excels at capturing long-range dependencies across frames. By integrating these complementary architectures within the ActNetFormer framework, our approach can effectively capture both local and global contextual information of an action. This comprehensive representation learning enables the model to achieve better performance in semi-supervised action recognition tasks by leveraging the strengths of each of these architectures. Experimental results on standard action recognition datasets demonstrate that our approach performs better than the existing methods, achieving state-of-the-art performance with only a fraction of labeled data. The official website of this work is available at: https://github.com/rana2149/ActNetFormer.
CVMar 18, 2022
Transferable Class-Modelling for Decentralized Source Attribution of GAN-Generated ImagesBrandon B. G. Khoo, Chern Hong Lim, Raphael C. -W. Phan
GAN-generated deepfakes as a genre of digital images are gaining ground as both catalysts of artistic expression and malicious forms of deception, therefore demanding systems to enforce and accredit their ethical use. Existing techniques for the source attribution of synthetic images identify subtle intrinsic fingerprints using multiclass classification neural nets limited in functionality and scalability. Hence, we redefine the deepfake detection and source attribution problems as a series of related binary classification tasks. We leverage transfer learning to rapidly adapt forgery detection networks for multiple independent attribution problems, by proposing a semi-decentralized modular design to solve them simultaneously and efficiently. Class activation mapping is also demonstrated as an effective means of feature localization for model interpretation. Our models are determined via experimentation to be competitive with current benchmarks, and capable of decent performance on human portraits in ideal conditions. Decentralized fingerprint-based attribution is found to retain validity in the presence of novel sources, but is more susceptible to type II errors that intensify with image perturbations and attributive uncertainty. We describe both our conceptual framework and model prototypes for further enhancement when investigating the technical limits of reactive deepfake attribution.
CVFeb 6, 2025Code
Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive LearningSharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy et al.
Action recognition in dark or low-light (under-exposed) videos is a challenging task due to visibility degradation, which can hinder critical spatiotemporal details. This paper proposes ActLumos, a teacher-student framework that attains single-stream inference while retaining multi-stream level accuracy. The teacher consumes dual stream inputs, which include original dark frames and retinex-enhanced frames, processed by weight-shared R(2+1)D-34 backbones and fused by a Dynamic Feature Fusion (DFF) module, which dynamically re-weights the two streams at each time step, emphasising the most informative temporal segments. The teacher is also included with a supervised contrastive loss (SupCon) that sharpens class margins. The student shares the R(2+1)D-34 backbone but uses only dark frames and no fusion at test time. The student is first pre-trained with self-supervision on dark clips of both datasets without their labels and then fine-tuned with knowledge distillation from the teacher, transferring the teacher's multi-stream knowledge into a single-stream model. Under single-stream inference, the distilled student attains state-of-the-art accuracy of 96.92% (Top-1) on ARID V1.0, 88.27% on ARID V1.5, and 48.96% on Dark48. Ablation studies further highlight the individual contributions of each component, i.e., DFF in the teacher outperforms single or static fusion, knowledge distillation (KD) transfers these gains to the single-stream student, and two-view spatio-temporal SSL surpasses spatial-only or temporal-only variants without increasing inference cost. The official website of this work is available at: https://github.com/HrishavBakulBarua/ActLumos
CVJan 13, 2025
SFC-GAN: A Generative Adversarial Network for Brain Functional and Structural Connectome TranslationYee-Fan Tan, Jun Lin Liow, Pei-Sze Tan et al.
Modern brain imaging technologies have enabled the detailed reconstruction of human brain connectomes, capturing structural connectivity (SC) from diffusion MRI and functional connectivity (FC) from functional MRI. Understanding the intricate relationships between SC and FC is vital for gaining deeper insights into the brain's functional and organizational mechanisms. However, obtaining both SC and FC modalities simultaneously remains challenging, hindering comprehensive analyses. Existing deep generative models typically focus on synthesizing a single modality or unidirectional translation between FC and SC, thereby missing the potential benefits of bi-directional translation, especially in scenarios where only one connectome is available. Therefore, we propose Structural-Functional Connectivity GAN (SFC-GAN), a novel framework for bidirectional translation between SC and FC. This approach leverages the CycleGAN architecture, incorporating convolutional layers to effectively capture the spatial structures of brain connectomes. To preserve the topological integrity of these connectomes, we employ a structure-preserving loss that guides the model in capturing both global and local connectome patterns while maintaining symmetry. Our framework demonstrates superior performance in translating between SC and FC, outperforming baseline models in similarity and graph property evaluations compared to ground truth data, each translated modality can be effectively utilized for downstream classification.
LGSep 25, 2025
T2I-Diff: fMRI Signal Generation via Time-Frequency Image Transform and Classifier-Free Denoising Diffusion ModelsHwa Hui Tew, Junn Yong Loo, Yee-Fan Tan et al.
Functional Magnetic Resonance Imaging (fMRI) is an advanced neuroimaging method that enables in-depth analysis of brain activity by measuring dynamic changes in the blood oxygenation level-dependent (BOLD) signals. However, the resource-intensive nature of fMRI data acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often underperform because they overlook the complex non-stationarity and nonlinear BOLD dynamics. To address these challenges, we introduce T2I-Diff, an fMRI generation framework that leverages time-frequency representation of BOLD signals and classifier-free denoising diffusion. Specifically, our framework first converts BOLD signals into windowed spectrograms via a time-dependent Fourier transform, capturing both the underlying temporal dynamics and spectral evolution. Subsequently, a classifier-free diffusion model is trained to generate class-conditioned frequency spectrograms, which are then reverted to BOLD signals via inverse Fourier transforms. Finally, we validate the efficacy of our approach by demonstrating improved accuracy and generalization in downstream fMRI-based brain network classification.
NCJul 27, 2021
Graph Autoencoders for Embedding Learning in Brain Networks and Major Depressive Disorder IdentificationFuad Noman, Chee-Ming Ting, Hakmook Kang et al.
Brain functional connectivity (FC) reveals biomarkers for identification of various neuropsychiatric disorders. Recent application of deep neural networks (DNNs) to connectome-based classification mostly relies on traditional convolutional neural networks using input connectivity matrices on a regular Euclidean grid. We propose a graph deep learning framework to incorporate the non-Euclidean information about graph structure for classifying functional magnetic resonance imaging (fMRI)-derived brain networks in major depressive disorder (MDD). We design a novel graph autoencoder (GAE) architecture based on the graph convolutional networks (GCNs) to embed the topological structure and node content of large-sized fMRI networks into low-dimensional latent representations. In network construction, we employ the Ledoit-Wolf (LDW) shrinkage method to estimate the high-dimensional FC metrics efficiently from fMRI data. We consider both supervised and unsupervised approaches for the graph embedding learning. The learned embeddings are then used as feature inputs for a deep fully-connected neural network (FCNN) to discriminate MDD from healthy controls. Evaluated on two resting-state fMRI (rs-fMRI) MDD datasets, results show that the proposed GAE-FCNN model significantly outperforms several state-of-the-art methods for brain connectome classification, achieving the best accuracy using the LDW-FC edges as node features. The graph embeddings of fMRI FC networks learned by the GAE also reveal apparent group differences between MDD and HC. Our new framework demonstrates feasibility of learning graph embeddings on brain networks to provide discriminative information for diagnosis of brain disorders.
CVJun 6, 2016
Less is More: Micro-expression Recognition from Video using Apex FrameSze-Teng Liong, John See, KokSheik Wong et al.
Despite recent interest and advances in facial micro-expression research, there is still plenty room for improvement in terms of micro-expression recognition. Conventional feature extraction approaches for micro-expression video consider either the whole video sequence or a part of it, for representation. However, with the high-speed video capture of micro-expressions (100-200 fps), are all frames necessary to provide a sufficiently meaningful representation? Is the luxury of data a bane to accurate recognition? A novel proposition is presented in this paper, whereby we utilize only two images per video: the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression changes among all frames, while the onset is the perfect choice of a reference frame with neutral expression. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF) is proposed to encode essential expressiveness of the apex frame. We evaluated the proposed method on five micro-expression databases: CAS(ME)$^2$, CASME II, SMIC-HS, SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with our proposed technique achieving a state-of-the-art F1-score recognition performance of 61% and 62% in the high frame rate CASME II and SMIC-HS databases respectively.
CROct 15, 2014
Higher Order Differentiation over Finite Fields with Applications to Generalising the Cube AttackAna Sălăgean, Matei Mandache-Sălăgean, Richard Winter et al.
Higher order differentiation was introduced in a cryptographic context by Lai. Several attacks can be viewed in the context of higher order differentiations, amongst them the cube attack and the AIDA attack. All of the above have been developed for the binary case. We examine differentiation in larger fields, starting with the field $GF(p)$ of integers modulo a prime $p$. We prove a number of results on differentiating polynomials over such fields and then apply these techniques to generalising the cube attack to $GF(p)$. The crucial difference is that now the degree in each variable can be higher than one, and our proposed attack will differentiate several times with respect to each variable (unlike the classical cube attack and its larger field version described by Dinur and Shamir, both of which differentiate at most once with respect to each variable). Finally we describe differentiation over finite fields $GF(p^m)$ with $p^m$ elements and prove that it can be reduced to differentiation over $GF(p)$, so a cube attack over $GF(p^m)$ would be equivalent to cube attacks over $GF(p)$.
CVMar 14, 2014
Spontaneous expression classification in the encrypted domainSegun Aina, Yogachandran Rahulamathavan, Raphael C. -W. Phan et al.
To date, most facial expression analysis have been based on posed image databases and is carried out without being able to protect the identity of the subjects whose expressions are being recognised. In this paper, we propose and implement a system for classifying facial expressions of images in the encrypted domain based on a Paillier cryptosystem implementation of Fisher Linear Discriminant Analysis and k-nearest neighbour (FLDA + kNN). We present results of experiments carried out on a recently developed natural visible and infrared facial expression (NVIE) database of spontaneous images. To the best of our knowledge, this is the first system that will allow the recog-nition of encrypted spontaneous facial expressions by a remote server on behalf of a client.