Adrian G. Bors

CV
h-index29
23papers
418citations
Novelty49%
AI Score50

23 Papers

CVOct 12, 2022
Task-Free Continual Learning via Online Discrepancy Distance Learning

Fei Ye, Adrian G. Bors

Learning from non-stationary data streams, also called Task-Free Continual Learning (TFCL) remains challenging due to the absence of explicit task information. Although recently some methods have been proposed for TFCL, they lack theoretical guarantees. Moreover, forgetting analysis during TFCL was not studied theoretically before. This paper develops a new theoretical analysis framework which provides generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model. This analysis gives new insights into the forgetting behaviour in classification tasks. Inspired by this theoretical model, we propose a new approach enabled by the dynamic component expansion mechanism for a mixture model, namely the Online Discrepancy Distance Learning (ODDL). ODDL estimates the discrepancy between the probabilistic representation of the current memory buffer and the already accumulated knowledge and uses it as the expansion signal to ensure a compact network architecture with optimal performance. We then propose a new sample selection approach that selectively stores the most relevant samples into the memory buffer through the discrepancy-based measure, further improving the performance. We perform several TFCL experiments with the proposed methodology, which demonstrate that the proposed approach achieves the state of the art performance.

LGJul 20, 2022
Continual Variational Autoencoder Learning via Online Cooperative Memorization

Fei Ye, Adrian G. Bors

Due to their inference, data representation and reconstruction properties, Variational Autoencoders (VAE) have been successfully used in continual learning classification tasks. However, their ability to generate images with specifications corresponding to the classes and databases learned during Continual Learning (CL) is not well understood and catastrophic forgetting remains a significant challenge. In this paper, we firstly analyze the forgetting behaviour of VAEs by developing a new theoretical framework that formulates CL as a dynamic optimal transport problem. This framework proves approximate bounds to the data likelihood without requiring the task information and explains how the prior knowledge is lost during the training process. We then propose a novel memory buffering approach, namely the Online Cooperative Memorization (OCM) framework, which consists of a Short-Term Memory (STM) that continually stores recent samples to provide future information for the model, and a Long-Term Memory (LTM) aiming to preserve a wide diversity of samples. The proposed OCM transfers certain samples from STM to LTM according to the information diversity selection criterion without requiring any supervised signals. The OCM framework is then combined with a dynamic VAE expansion mixture network for further enhancing its performance.

LGJul 11, 2022
Learning an evolved mixture model for task-free continual learning

Fei Ye, Adrian G. Bors

Recently, continual learning (CL) has gained significant interest because it enables deep learning models to acquire new knowledge without forgetting previously learnt information. However, most existing works require knowing the task identities and boundaries, which is not realistic in a real context. In this paper, we address a more challenging and realistic setting in CL, namely the Task-Free Continual Learning (TFCL) in which a model is trained on non-stationary data streams with no explicit task information. To address TFCL, we introduce an evolved mixture model whose network architecture is dynamically expanded to adapt to the data distribution shift. We implement this expansion mechanism by evaluating the probability distance between the knowledge stored in each mixture model component and the current memory buffer using the Hilbert Schmidt Independence Criterion (HSIC). We further introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload while preserving memory diversity. Empirical results demonstrate that the proposed approach achieves excellent performance.

CVSep 25, 2023
Masked Image Residual Learning for Scaling Deeper Vision Transformers

Guoxi Huang, Hongtao Fu, Adrian G. Bors

Deeper Vision Transformers (ViTs) are more challenging to train. We expose a degradation problem in deeper layers of ViT when using masked image modeling (MIM) for pre-training. To ease the training of deeper ViTs, we introduce a self-supervised learning framework called Masked Image Residual Learning (MIRL), which significantly alleviates the degradation problem, making scaling ViT along depth a promising direction for performance upgrade. We reformulate the pre-training objective for deeper layers of ViT as learning to recover the residual of the masked image. We provide extensive empirical evidence showing that deeper ViTs can be effectively optimized using MIRL and easily gain accuracy from increased depth. With the same level of computational complexity as ViT-Base and ViT-Large, we instantiate 4.5$\times$ and 2$\times$ deeper ViTs, dubbed ViT-S-54 and ViT-B-48. The deeper ViT-S-54, costing 3$\times$ less than ViT-Large, achieves performance on par with ViT-Large. ViT-B-48 achieves 86.2% top-1 accuracy on ImageNet. On one hand, deeper ViTs pre-trained with MIRL exhibit excellent generalization capabilities on downstream tasks, such as object detection and semantic segmentation. On the other hand, MIRL demonstrates high pre-training efficiency. With less pre-training time, MIRL yields competitive performance compared to other approaches.

CVNov 23, 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training

Guoxi Huang, Adrian G. Bors

Static appearance of video may impede the ability of a deep neural network to learn motion-relevant features in video action recognition. In this paper, we introduce a new concept, Dynamic Appearance (DA), summarizing the appearance information relating to movement in a video while filtering out the static information considered unrelated to motion. We consider distilling the dynamic appearance from raw video data as a means of efficient video understanding. To this end, we propose the Pixel-Wise Temporal Projection (PWTP), which projects the static appearance of a video into a subspace within its original vector space, while the dynamic appearance is encoded in the projection residual describing a special motion pattern. Moreover, we integrate the PWTP module with a CNN or Transformer into an end-to-end training framework, which is optimized by utilizing multi-objective optimization algorithms. We provide extensive experimental results on four action recognition benchmarks: Kinetics400, Something-Something V1, UCF101 and HMDB51.

CVJan 20
Two-Stream temporal transformer for video action classification

Nattapong Kurpukdee, Adrian G. Bors

Motion representation plays an important role in video understanding and has many applications including action recognition, robot and autonomous guidance or others. Lately, transformer networks, through their self-attention mechanism capabilities, have proved their efficiency in many applications. In this study, we introduce a new two-stream transformer video classifier, which extracts spatio-temporal information from content and optical flow representing movement information. The proposed model identifies self-attention features across the joint optical flow and temporal frame domain and represents their relationships within the transformer encoder mechanism. The experimental results show that our proposed methodology provides excellent classification results on three well-known video datasets of human activities.

CVJan 20
Unsupervised Video Class-Incremental Learning via Deep Embedded Clustering Management

Nattapong Kurpukdee, Adrian G. Bors

Unsupervised video class incremental learning (uVCIL) represents an important learning paradigm for learning video information without forgetting, and without considering any data labels. Prior approaches have focused on supervised class-incremental learning, relying on using the knowledge of labels and task boundaries, which is costly, requires human annotation, or is simply not a realistic option. In this paper, we propose a simple yet effective approach to address the uVCIL. We first consider a deep feature extractor network, providing a set of representative video features during each task without assuming any class or task information. We then progressively build a series of deep clusters from the extracted features. During the successive task learning, the model updated from the previous task is used as an initial state in order to transfer knowledge to the current learning task. We perform in-depth evaluations on three standard video action recognition datasets, including UCF101, HMDB51, and Something-to-Something V2, by ignoring the labels from the supervised setting. Our approach significantly outperforms other baselines on all datasets.

CVDec 25, 2025
Inference-based GAN Video Generation

Jingbo Yang, Adrian G. Bors

Video generation has seen remarkable progress thanks to advancements in generative deep learning. However, generating long sequences remains a significant challenge. Generated videos should not only display coherent and continuous movement but also meaningful movement in successions of scenes. Models such as GANs, VAEs, and Diffusion Networks have been used for generating short video sequences, typically up to 16 frames. In this paper, we first propose a new type of video generator by enabling adversarial-based unconditional video generators with a variational encoder, akin to a VAE-GAN hybrid structure. The proposed model, as in other video deep learning-based processing frameworks, incorporates two processing branches, one for content and another for movement. However, existing models struggle with the temporal scaling of the generated videos. Classical approaches often result in degraded video quality when attempting to increase the generated video length, especially for significantly long sequences. To overcome this limitation, our research study extends the initially proposed VAE-GAN video generation model by employing a novel, memory-efficient approach to generate long videos composed of hundreds or thousands of frames ensuring their temporal continuity, consistency and dynamics. Our approach leverages a Markov chain framework with a recall mechanism, where each state represents a short-length VAE-GAN video generator. This setup enables the sequential connection of generated video sub-sequences, maintaining temporal dependencies and resulting in meaningful long video sequences.

LGMar 25, 2022Code
Supplemental Material: Lifelong Generative Modelling Using Dynamic Expansion Graph Model

Fei Ye, Adrian G. Bors

In this article, we provide the appendix for Lifelong Generative Modelling Using Dynamic Expansion Graph Model. This appendix includes additional visual results as well as the numerical results on the challenging datasets. In addition, we also provide detailed proofs for the proposed theoretical analysis framework. The source code can be found in https://github.com/dtuzi123/Expansion-Graph-Model.

LGDec 15, 2021Code
Lifelong Generative Modelling Using Dynamic Expansion Graph Model

Fei Ye, Adrian G. Bors

Variational Autoencoders (VAEs) suffer from degenerated performance, when learning several successive tasks. This is caused by catastrophic forgetting. In order to address the knowledge loss, VAEs are using either Generative Replay (GR) mechanisms or Expanding Network Architectures (ENA). In this paper we study the forgetting behaviour of VAEs using a joint GR and ENA methodology, by deriving an upper bound on the negative marginal log-likelihood. This theoretical analysis provides new insights into how VAEs forget the previously learnt knowledge during lifelong learning. The analysis indicates the best performance achieved when considering model mixtures, under the ENA framework, where there are no restrictions on the number of components. However, an ENA-based approach may require an excessive number of parameters. This motivates us to propose a novel Dynamic Expansion Graph Model (DEGM). DEGM expands its architecture, according to the novelty associated with each new databases, when compared to the information already learnt by the network from previous tasks. DEGM training optimizes knowledge structuring, characterizing the joint probabilistic representations corresponding to the past and more recently learned tasks. We demonstrate that DEGM guarantees optimal performance for each task while also minimizing the required number of parameters. Supplementary materials (SM) and source code are available in https://github.com/dtuzi123/Expansion-Graph-Model.

LGAug 25, 2021Code
Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process

Fei Ye, Adrian G. Bors

Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks. The proposed methodology shows promising results in overcoming catastrophic forgetting. However, the theory behind these successful models is still not well understood. In this paper, we perform the theoretical analysis for lifelong learning models by deriving the risk bounds based on the discrepancy distance between the probabilistic representation of data generated by the model and that corresponding to the target dataset. Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model, which can automatically expand its network architectures or choose an appropriate component to adapt its parameters for learning a new task, while preserving its previously learnt information. We propose to incorporate the knowledge by means of Dirichlet processes by using a gating mechanism which computes the dependence between the knowledge learnt previously and stored in each component, and a new set of data. Besides, we train a compact Student model which can accumulate cross-domain representations over time and make quick inferences. The code is available at https://github.com/dtuzi123/Lifelong-infinite-mixture-model.

LGJul 9, 2021Code
Lifelong Teacher-Student Network Learning

Fei Ye, Adrian G. Bors

A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology by employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module would remind the Student about the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learn databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module as well as from the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training. The code is available~: \url{https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning}

CVAug 29, 2025
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering

Nattapong Kurpukdee, Adrian G. Bors

We propose a realistic scenario for the unsupervised video learning where neither task boundaries nor labels are provided when learning a succession of tasks. We also provide a non-parametric learning solution for the under-explored problem of unsupervised video continual learning. Videos represent a complex and rich spatio-temporal media information, widely used in many applications, but which have not been sufficiently explored in unsupervised continual learning. Prior studies have only focused on supervised continual learning, relying on the knowledge of labels and task boundaries, while having labeled data is costly and not practical. To address this gap, we study the unsupervised video continual learning (uVCL). uVCL raises more challenges due to the additional computational and memory requirements of processing videos when compared to images. We introduce a general benchmark experimental protocol for uVCL by considering the learning of unstructured video data categories during each task. We propose to use the Kernel Density Estimation (KDE) of deep embedded video features extracted by unsupervised video transformer networks as a non-parametric probabilistic representation of the data. We introduce a novelty detection criterion for the incoming new task data, dynamically enabling the expansion of memory clusters, aiming to capture new knowledge when learning a succession of tasks. We leverage the use of transfer learning from the previous tasks as an initial state for the knowledge transfer to the current learning task. We found that the proposed methodology substantially enhances the performance of the model when successively learning many tasks. We perform in-depth evaluations on three standard video action recognition datasets, including UCF101, HMDB51, and Something-to-Something V2, without using any labels or class boundaries.

CVJul 9, 2021
Lifelong Twin Generative Adversarial Networks

Fei Ye, Adrian G. Bors

In this paper, we propose a new continuously learning generative model, called the Lifelong Twin Generative Adversarial Networks (LT-GANs). LT-GANs learns a sequence of tasks from several databases and its architecture consists of three components: two identical generators, namely the Teacher and Assistant, and one Discriminator. In order to allow for the LT-GANs to learn new concepts without forgetting, we introduce a new lifelong training approach, namely Lifelong Adversarial Knowledge Distillation (LAKD), which encourages the Teacher and Assistant to alternately teach each other, while learning a new database. This training approach favours transferring knowledge from a more knowledgeable player to another player which knows less information about a previously given task.

LGJul 9, 2021
InfoVAEGAN : learning joint interpretable representations by information maximization and maximum likelihood

Fei Ye, Adrian G. Bors

Learning disentangled and interpretable representations is an important step towards accomplishing comprehensive data representations on the manifold. In this paper, we propose a novel representation learning algorithm which combines the inference abilities of Variational Autoencoders (VAE) with the generalization capability of Generative Adversarial Networks (GAN). The proposed model, called InfoVAEGAN, consists of three networks~: Encoder, Generator and Discriminator. InfoVAEGAN aims to jointly learn discrete and continuous interpretable representations in an unsupervised manner by using two different data-free log-likelihood functions onto the variables sampled from the generator's distribution. We propose a two-stage algorithm for optimizing the inference network separately from the generator training. Moreover, we enforce the learning of interpretable representations through the maximization of the mutual information between the existing latent variables and those created through generative and inference processes.

LGJul 9, 2021
Lifelong Mixture of Variational Autoencoders

Fei Ye, Adrian G. Bors

In this paper, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a Variational Autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture, control the contributions of each expert in the goal representation. These are sampled from a Dirichlet distribution whose parameters are determined through non-parametric estimation during lifelong learning. The model can learn new tasks fast when these are similar to those previously learnt. The proposed Lifelong mixture of VAE (L-MVAE) expands its architecture with new components when learning a completely new task. After the training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both the memory efficiency and the required computational cost as only one expert is used during the inference. The L-MVAE inference model is able to perform interpolation in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled learning representation.

CVMar 29, 2021
Busy-Quiet Video Disentangling for Video Classification

Guoxi Huang, Adrian G. Bors

In video data, busy motion details from moving regions are conveyed within a specific frequency bandwidth in the frequency domain. Meanwhile, the rest of the frequencies of video data are encoded with quiet information with substantial redundancy, which causes low processing efficiency in existing video models that take as input raw RGB frames. In this paper, we consider allocating intenser computation for the processing of the important busy information and less computation for that of the quiet information. We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data. By embedding the MBPM into a two-pathway CNN architecture, we define a Busy-Quiet Net (BQN). The efficiency of BQN is determined by avoiding redundancy in the feature space processed by the two pathways: one operating on Quiet features of low-resolution, while the other processes Busy features. The proposed BQN outperforms many recent video processing models on Something-Something V1, Kinetics400, UCF101 and HMDB51 datasets.

CVJul 20, 2020
Learning latent representations across multiple data domains using Lifelong VAEGAN

Fei Ye, Adrian G. Bors

The problem of catastrophic forgetting occurs in deep learning models trained on multiple databases in a sequential manner. Recently, generative replay mechanisms (GRM), have been proposed to reproduce previously learned knowledge aiming to reduce the forgetting. However, such approaches lack an appropriate inference model and therefore can not provide latent representations of data. In this paper, we propose a novel lifelong learning approach, namely the Lifelong VAEGAN (L-VAEGAN), which not only induces a powerful generative replay network but also learns meaningful latent representations, benefiting representation learning. L-VAEGAN can allow to automatically embed the information associated with different domains into several clusters in the latent space, while also capturing semantically meaningful shared latent variables, across different data domains. The proposed model supports many downstream tasks that traditional generative replay methods can not, including interpolation and inference across different data domains.

CVJul 17, 2020
Region-based Non-local Operation for Video Classification

Guoxi Huang, Adrian G. Bors

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes the optimizations difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms, which can directly capture long-range dependencies without using a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at a position by aggregating the information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain, which can be integrated into the off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. The experimental results of our method outperform other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.

CVMay 6, 2020
Generating Memorable Images Based on Human Visual Memory Schemas

Cameron Kyle-Davidson, Adrian G. Bors, Karla K. Evans

This research study proposes using Generative Adversarial Networks (GAN) that incorporate a two-dimensional measure of human memorability to generate memorable or non-memorable images of scenes. The memorability of the generated images is evaluated by modelling Visual Memory Schemas (VMS), which correspond to mental representations that human observers use to encode an image into memory. The VMS model is based upon the results of memory experiments conducted on human observers, and provides a 2D map of memorability. We impose a memorability constraint upon the latent space of a GAN by employing a VMS map prediction model as an auxiliary loss. We assess the difference in memorability between images generated to be memorable or non-memorable through an independent computational measure of memorability, and additionally assess the effect of memorability on the realness of the generated images.

CVFeb 11, 2020
Learning spatio-temporal representations with temporal squeeze pooling

Guoxi Huang, Adrian G. Bors

In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a set of few images, named Squeezed Images. By embedding the Temporal Squeeze pooling as a layer into off-the-shelf Convolution Neural Networks (CNN), we design a new video classification model, named Temporal Squeeze Network (TeSNet). The resulting Squeezed Images contain the essential movement information from the video frames, corresponding to the optimization of the video classification task. We evaluate our architecture on two video classification benchmarks, and the results achieved are compared to the state-of-the-art.

CVMar 5, 2019
Defining Image Memorability using the Visual Memory Schema

Erdem Akagunduz, Adrian G. Bors, Karla K. Evans

Memorability of an image is a characteristic determined by the human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independent of the observer. {The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations.} We propose a new concept called the Visual Memory Schema (VMS) referring to an organisation of image components human observers share when encoding and recognising images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.

CRFeb 23, 2017
Steganalysis of 3D Objects Using Statistics of Local Feature Sets

Zhenyu Li, Adrian G. Bors

3D steganalysis aims to identify subtle invisible changes produced in graphical objects through digital watermarking or steganography. Sets of statistical representations of 3D features, extracted from both cover and stego 3D mesh objects, are used as inputs into machine learning classifiers in order to decide whether any information was hidden in the given graphical object. According to previous studies, sets of local geometry features can be used to define the differences between stego and cover-objects. The features proposed in this paper include those representing the local object curvature, vertex normals, the local geometry representation in the spherical coordinate system and are considered in various combinations with others. We also analyze the effectiveness of various 3D feature sets applied for steganalysis based on the Pearson correlation coefficient. The classifiers proposed in this study for discriminating the 3D stego and cover-objects include Support Vector Machine and the Fisher Linear Discriminant ensemble. Three different watermarking and steganographic methods are used for hiding information in the 3D objects used for testing the performance of the proposed steganalysis methodology.