CVAug 18, 2023Code
NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual LearningTamasha Malepathirana, Damith Senanayake, Saman Halgamuge
Catastrophic forgetting; the loss of old knowledge upon acquiring new knowledge, is a pitfall faced by deep neural networks in real-world applications. Many prevailing solutions to this problem rely on storing exemplars (previously encountered data), which may not be feasible in applications with memory limitations or privacy constraints. Therefore, the recent focus has been on Non-Exemplar based Class Incremental Learning (NECIL) where a model incrementally learns about new classes without using any past exemplars. However, due to the lack of old data, NECIL methods struggle to discriminate between old and new classes causing their feature representations to overlap. We propose NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization, a framework that reduces this class overlap in NECIL. We draw inspiration from Neural Gas to learn the topological relationships in the feature space, identifying the neighboring classes that are most likely to get confused with each other. This neighborhood information is utilized to enforce strong separation between the neighboring classes as well as to generate old class representative prototypes that can better aid in obtaining a discriminative decision boundary between old and new classes. Our comprehensive experiments on CIFAR-100, TinyImageNet, and ImageNet-Subset demonstrate that NAPA-VQ outperforms the State-of-the-art NECIL methods by an average improvement of 5%, 2%, and 4% in accuracy and 10%, 3%, and 9% in forgetting respectively. Our code can be found in https://github.com/TamashaM/NAPA-VQ.git.
CVMay 11, 2023Code
Undercover Deepfakes: Detecting Fake Segments in VideosSanjay Saha, Rashindrie Perera, Sachith Seneviratne et al.
The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of the deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such generative techniques creates a new paradigm of deepfakes which are mostly real videos altered slightly to distort the truth. This paradigm has been under-explored by the current deepfake detection methods in the academic literature. In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels. To facilitate testing our method, we prepared a new benchmark dataset where videos have both real and fake frame sequences with very subtle transitions. We provide a benchmark on the proposed dataset with our detection method which utilizes the Vision Transformer based on Scaling and Shifting to learn spatial features, and a Timeseries Transformer to learn temporal features of the videos to help facilitate the interpretation of possible deepfakes. Extensive experiments on a variety of deepfake generation methods show excellent results by the proposed method on temporal segmentation and classical video-level predictions as well. In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes. All experiments can be reproduced at: github.com/rgb91/temporal-deepfake-segmentation.
LGJan 6, 2024
When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural NetworksHaihang Wu, Wei Wang, Tamasha Malepathirana et al.
Neural growth is the process of growing a small neural network to a large network and has been utilized to accelerate the training of deep neural networks. One crucial aspect of neural growth is determining the optimal growth timing. However, few studies investigate this systematically. Our study reveals that neural growth inherently exhibits a regularization effect, whose intensity is influenced by the chosen policy for growth timing. While this regularization effect may mitigate the overfitting risk of the model, it may lead to a notable accuracy drop when the model underfits. Yet, current approaches have not addressed this issue due to their lack of consideration of the regularization effect from neural growth. Motivated by these findings, we propose an under/over fitting risk-aware growth timing policy, which automatically adjusts the growth timing informed by the level of potential under/overfitting risks to address both risks. Comprehensive experiments conducted using CIFAR-10/100 and ImageNet datasets show that the proposed policy achieves accuracy improvements of up to 1.3% in models prone to underfitting while achieving similar accuracies in models suffering from overfitting compared to the existing methods.
SPMay 18, 2025
FRAME-C: A knowledge-augmented deep learning pipeline for classifying multi-electrode array electrophysiological signalsNisal Ranasinghe, Dzung Do-Ha, Simon Maksour et al.
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by motor neuron degeneration, with alterations in neural excitability serving as key indicators. Recent advancements in induced pluripotent stem cell (iPSC) technology have enabled the generation of human iPSC-derived neuronal cultures, which, when combined with multi-electrode array (MEA) electrophysiology, provide rich spatial and temporal electrophysiological data. Traditionally, MEA data is analyzed using handcrafted features based on potentially imperfect domain knowledge, which while useful may not fully capture all useful characteristics inherent in the data. Machine learning, particularly deep learning, has the potential to automatically learn relevant characteristics from raw data without solely relying on handcrafted feature extraction. However, handcrafted features remain critical for encoding domain knowledge and improving interpretability, especially with limited or noisy data. This study introduces FRAME-C, a knowledge-augmented machine learning pipeline that combines domain knowledge, raw spike waveform data, and deep learning techniques to classify MEA signals and identify ALS-specific phenotypes. FRAME-C leverages deep learning to learn important features from spike waveforms while incorporating handcrafted features such as spike amplitude, inter-spike interval, and spike duration, preserving key spatial and temporal information. We validate FRAME-C on both simulated and real MEA data from human iPSC-derived neuronal cultures, demonstrating superior performance over existing classification methods. FRAME-C shows over 11% improvement on real data and up to 25% on simulated data. We also show FRAME-C can evaluate handcrafted feature importance, providing insights into ALS phenotypes.
CVDec 10, 2024
TT-MPD: Test Time Model Pruning and DistillationHaihang Wu, Wei Wang, Tamasha Malepathirana et al.
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. However, access to the training data can be limited due to concerns such as data privacy and commercial confidentiality. Furthermore, with covariate shift (disparities between test and training data distributions), pruning and finetuning with training datasets can hinder the generalization of the pruned model to test data. To address these issues, pruning and finetuning the model with test time samples becomes essential. However, test-time model pruning and fine-tuning incur additional computation costs and slow down the model's prediction speed, thus posing efficiency issues. Existing pruning methods are not efficient enough for test time model pruning setting, since finetuning the pruned model is needed to evaluate the importance of removable components. To address this, we propose two variables to approximate the fine-tuned accuracy. We then introduce an efficient pruning method that considers the approximated finetuned accuracy and potential inference latency saving. To enhance fine-tuning efficiency, we propose an efficient knowledge distillation method that only needs to generate pseudo labels for a small set of finetuning samples one time, thereby reducing the expensive pseudo-label generation cost. Experimental results demonstrate that our method achieves a comparable or superior tradeoff between test accuracy and inference latency, with a 32% relative reduction in pruning and finetuning time compared to the best existing method.
CVJun 19, 2024
AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion ModelsKen Chen, Sachith Seneviratne, Wei Wang et al.
Animating stylized avatars with dynamic poses and expressions has attracted increasing attention for its broad range of applications. Previous research has made significant progress by training controllable generative models to synthesize animations based on reference characteristics, pose, and expression conditions. However, the mechanisms used in these methods to control pose and expression often inadvertently introduce unintended features from the target motion, while also causing a loss of expression-related details, particularly when applied to stylized animation. This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for animating stylized avatars. First, we propose a refined spatial conditioning approach by Facial Alignment to prevent the inclusion of identity characteristics from the target motion. Then, we introduce an Expression Adapter that incorporates additional cross-attention layers to address the potential loss of expression-related information. Our approach effectively preserves pose and expression from the target video while maintaining input image consistency. Extensive experiments demonstrate that our method achieves state-of-the-art results, showcasing superior image quality, preservation of reference features, and expression accuracy, particularly for out-of-domain animation across diverse styles, highlighting its versatility and strong generalization capabilities. This work aims to enhance the quality of virtual stylized animation for positive applications. To promote responsible use in virtual environments, we contribute to the advancement of detection for generative content by evaluating state-of-the-art detectors, highlighting potential areas for improvement, and suggesting solutions.