Santu Rana

LG
h-index: 40
70 papers
1,071 citations
Novelty: 56%
AI Score: 54

70 Papers

CV · Sep 21, 2022
Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation

Kien Do, Hung Le, Dung Nguyen et al.

Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to its appealing capability of transferring knowledge from a teacher network to a student network without using training data. The main idea is to use a generator to synthesize data for training the student. As the generator gets updated, the distribution of synthetic data will change. Such distribution shift could be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired at previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD) which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be considered as an ensemble of the generator's old versions and often undergoes a smaller change in updates compared to the generator, training on its synthetic samples can help the student recall the past knowledge and prevent the student from adapting too quickly to new updates of the generator. Our experiments on six benchmark datasets including big datasets like ImageNet and Places365 demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases.
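The EMA mechanism described above can be illustrated with a minimal sketch. It assumes generator parameters are a flat list of floats. The names `ema_update` and `mixed_batch` and the decay value are hypothetical and are not taken from the paper.

```python
def ema_update(ema_params, params, decay=0.99):
    """Move each EMA parameter a small step toward the current parameter."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def mixed_batch(generator_samples, ema_samples):
    """Combine samples from the current and the EMA generator for the student."""
    return list(generator_samples) + list(ema_samples)


# Toy run: the "generator" drifts by +1 per step, the EMA copy lags behind.
generator = [0.0, 0.0]
ema_generator = list(generator)
for _ in range(3):
    generator = [g + 1.0 for g in generator]  # stand-in for a training update
    ema_generator = ema_update(ema_generator, generator, decay=0.9)

batch = mixed_batch(["g1", "g2"], ["e1", "e2"])
```

Because the EMA copy changes more slowly than the generator, its samples stay closer to the earlier synthetic distributions, which is the property the abstract relies on.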

LG · Mar 15, 2022
Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization

Hung Tran-The, Sunil Gupta, Santu Rana et al.

The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty due to its simplicity and efficiency. Despite its popularity, the theoretical aspects of this algorithm have not been properly analyzed. In particular, whether in the noisy setting, the EI strategy with a standard incumbent converges is still an open question of the Gaussian process bandit optimization problem. We aim to answer this question by proposing a variant of EI with a standard incumbent defined via the GP predictive mean. We prove that our algorithm converges, and achieves a cumulative regret bound of $\mathcal O(γ_T\sqrt{T})$, where $γ_T$ is the maximum information gain between $T$ observations and the Gaussian process model. Based on this variant of EI, we further propose an algorithm called Improved GP-EI that converges faster than previous counterparts. In particular, our proposed variants of EI do not require the knowledge of the RKHS norm and the noise's sub-Gaussianity parameter as in previous works. Empirical validation in our paper demonstrates the effectiveness of our algorithms compared to several baselines.
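For reference, the standard closed-form EI acquisition (maximization convention) can be sketched as follows. Taking the incumbent as the largest GP predictive mean at the observed points is an assumption based on the abstract's wording. The function names are hypothetical, and this is not the paper's Improved GP-EI algorithm.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, incumbent):
    """Closed-form EI for a Gaussian posterior N(mu, sigma^2) at one point."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - incumbent, 0.0)
    z = (mu - incumbent) / sigma
    return (mu - incumbent) * normal_cdf(z) + sigma * normal_pdf(z)


def mean_incumbent(posterior_means_at_observed):
    """Assumed incumbent: best GP predictive mean over observed inputs."""
    return max(posterior_means_at_observed)


incumbent = mean_incumbent([0.2, 0.7, 0.5])
score = expected_improvement(mu=0.6, sigma=0.3, incumbent=incumbent)
```

Using the predictive mean rather than the best noisy observation keeps the incumbent from being inflated by a single lucky noisy sample.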

LG · Mar 3, 2023
BO-Muse: A human expert and AI teaming framework for accelerated experimental design

Sunil Gupta, Alistair Shilton, Arun Kumar A et al.

In this paper we introduce BO-Muse, a new approach to human-AI teaming for the optimization of expensive black-box functions. Inspired by the intrinsic difficulty of extracting expert knowledge and distilling it back into AI models and by observations of human behavior in real-world experimental design, our algorithm lets the human expert take the lead in the experimental process. The human expert can use their domain expertise to its full potential, while the AI plays the role of a muse, injecting novelty and searching for areas of weakness to break the human out of over-exploitation induced by cognitive entrenchment. With mild assumptions, we show that our algorithm converges sub-linearly, at a rate faster than the AI or human alone. We validate our algorithm using synthetic data and with human experts performing real-world experiments.

AI · Aug 21, 2023
LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying

Thommen George Karimpanal, Laknath Buddhika Semage, Santu Rana et al.

Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text. This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion. For example, by observing a partial stack of cubes, LLMs can predict the correct sequence in which the remaining cubes should be stacked by extrapolating the observed patterns (e.g., cube sizes, colors or other attributes) in the partial stack. In this work, we introduce LaGR (Language-Guided Reinforcement learning), which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent, in order to subsequently guide the latter's training. However, as RL training is generally not sample-efficient, deploying this approach would inherently imply that the LLM be repeatedly queried for solutions; a process that can be expensive and infeasible. To address this issue, we introduce SEQ (sample efficient querying), where we simultaneously train a secondary RL agent to decide when the LLM should be queried for solutions. Specifically, we use the quality of the solutions emanating from the LLM as the reward to train this agent. We show that our proposed framework LaGR-SEQ enables more efficient primary RL training, while simultaneously minimizing the number of queries to the LLM. We demonstrate our approach on a series of tasks and highlight the advantages of our approach, along with its limitations and potential future research directions.

LG · May 13, 2022
Fast Conditional Network Compression Using Bayesian HyperNetworks

Phuoc Nguyen, Truyen Tran, Ky Le et al.

We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g. a context involving only a subset of classes or a context where only limited compute resource is available. To solve this, we propose an efficient Bayesian framework to compress a given large network into much smaller size tailored to meet each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network sizes, we propose a new input-output group sparsity factorization of weights to encourage more sparseness in the generated weights. Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.

ML · Feb 1, 2023
Gradient Descent in Neural Networks as Sequential Learning in RKBS

Alistair Shilton, Sunil Gupta, Santu Rana et al.

The study of Neural Tangent Kernels (NTKs) has provided much needed insight into convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime, but a poor approximation for narrower networks as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel Banach space (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence where the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.
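As a reminder of the first-order approximation this abstract starts from, the standard NTK linearization can be written as below. The notation ($f$, $w_0$, $K$) is chosen here for illustration and is not taken from the paper.

```latex
% First-order Taylor expansion of the network output around the initial weights w_0
f(x; w) \approx f(x; w_0) + \nabla_w f(x; w_0)^{\top} (w - w_0)

% The induced (empirical) neural tangent kernel
K(x, x') = \nabla_w f(x; w_0)^{\top} \, \nabla_w f(x'; w_0)
```

The approximation is accurate only while $w$ stays close to $w_0$, which is why it degrades for narrower networks whose weights move further during training.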

LG · Mar 7, 2023
Controlled Diversity with Preference: Towards Learning a Diverse Set of Desired Skills

Maxence Hussonnois, Thommen George Karimpanal, Santu Rana

Autonomously learning diverse behaviors without an extrinsic reward signal has been a problem of interest in reinforcement learning. However, the nature of learning in such mechanisms is unconstrained, often resulting in the accumulation of several unusable, unsafe or misaligned skills. In order to avoid such issues and ensure the discovery of safe and human-aligned skills, it is necessary to incorporate humans into the unsupervised training process, which remains a largely unexplored research area. In this work, we propose Controlled Diversity with Preference (CDP), a novel, collaborative human-guided mechanism for an agent to learn a set of skills that is diverse as well as desirable. The key principle is to restrict the discovery of skills to those regions that are deemed to be desirable as per a preference model trained using human preference labels on trajectory pairs. We evaluate our approach on 2D navigation and Mujoco environments and demonstrate the ability to discover diverse, yet desirable skills.

CV · Jul 8, 2022
Defense Against Multi-target Trojan Attacks

Haripriya Harikumar, Santu Rana, Kien Do et al.

Adversarial attacks on deep learning-based models pose a significant threat to the current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variation of the Badnet type of attack that introduces Trojan backdoors to multiple target classes and allows triggers to be placed anywhere in the image. The former makes it more potent and the latter makes it extremely easy to carry out the attack in the physical space. The state-of-the-art Trojan detection methods fail with this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism that measures the transferability of such recovered triggers. A Trojan trigger has very high transferability, i.e., it sends other images to the same class as well. We study many practical advantages of our attack method and then demonstrate the detection performance using a variety of image datasets. The experimental results show the superior detection performance of our method over state-of-the-art methods.

LG · Aug 1, 2023
Predictive Modeling through Hyper-Bayesian Optimization

Manisha Senadeera, Santu Rana, Sunil Gupta et al.

Model selection is an integral problem of model based optimization techniques such as Bayesian optimization (BO). Current approaches often treat model selection as an estimation problem, to be periodically updated with observations coming from the optimization iterations. In this paper, we propose an alternative way to achieve both efficiently. Specifically, we propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster. The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured by a score function and fed back, capturing how well the model helped convergence in the function space. The score function is derived in such a way that it neutralizes the effect of the moving nature of the BO in the function space, thus keeping the model selection problem stationary. This back and forth leads to quick convergence for both model selection and BO in the function space. In addition to improved sample efficiency, the framework outputs information about the black-box function. Convergence is proved, and experimental results show significant improvement compared to standard BO.

RO · Sep 8, 2023
ECoDe: A Sample-Efficient Method for Co-Design of Robotic Agents

Kishan R. Nagiredla, Buddhika L. Semage, Arun Kumar A. et al.

Co-designing autonomous robotic agents involves simultaneously optimizing the controller and physical design of the agent. Its inherent bi-level optimization formulation necessitates an outer loop design optimization driven by an inner loop control optimization. This can be challenging when the design space is large and each design evaluation involves a data-intensive reinforcement learning process for control optimization. To improve the sample efficiency of co-design, we propose a multi-fidelity-based exploration strategy in which we tie the controllers learned across the design spaces through a universal policy learner for warm-starting subsequent controller learning problems. Experiments performed on a wide range of agent design problems demonstrate the superiority of our method compared to baselines. Additionally, analysis of the optimized designs shows interesting design alterations, including design simplifications and non-intuitive alterations.

AI · Jun 1, 2023
EMOTE: An Explainable architecture for Modelling the Other Through Empathy

Manisha Senadeera, Thommen Karimpanal George, Sunil Gupta et al.

We can usually assume others have goals analogous to our own. This assumption can also, at times, be applied to multi-agent games - e.g. Agent 1's attraction to green pellets is analogous to Agent 2's attraction to red pellets. This "analogy" assumption is tied closely to the cognitive process known as empathy. Inspired by empathy, we design a simple and explainable architecture to model another agent's action-value function. This involves learning an "Imagination Network" to transform the other agent's observed state in order to produce a human-interpretable "empathetic state" which, when presented to the learning agent, produces behaviours that mimic the other agent. Our approach is applicable to multi-agent scenarios consisting of a single learning agent and other (independent) agents acting according to fixed policies. This architecture is particularly beneficial for (but not limited to) algorithms using a composite value or reward function. We show our method produces better performance in multi-agent games, where it robustly estimates the other's model in different environment configurations. Additionally, we show that the empathetic states are human interpretable, and thus verifiable.

LG · Feb 8, 2023
Zero-shot Sim2Real Adaptation Across Environments

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana et al.

Simulation based learning often provides a cost-efficient recourse to reinforcement learning applications in robotics. However, simulators are generally incapable of accurately replicating real-world dynamics, and thus bridging the sim2real gap is an important problem in simulation based learning. Current solutions to bridge the sim2real gap involve hybrid simulators that are augmented with neural residual models. Unfortunately, they require a separate residual model for each individual environment configuration (i.e., a fixed setting of environment variables such as mass, friction etc.), and thus are not transferable to new environments quickly. To address this issue, we propose a Reverse Action Transformation (RAT) policy which learns to imitate simulated policies in the real-world. Once learnt from a single environment, RAT can then be deployed on top of a Universal Policy Network to achieve zero-shot adaptation to new environments. We empirically evaluate our approach in a set of continuous control tasks and observe its advantage as a few-shot and zero-shot learner over competing baselines.

LG · Dec 7, 2023 · Code
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection

Tuan Hoang, Santu Rana, Sunil Gupta et al.

Recent data-privacy laws have sparked interest in machine unlearning, which involves removing the effect of specific training samples from a learnt model as if they were never present in the original training dataset. The challenge of machine unlearning is to discard information about the "forget" data in the learnt model without altering the knowledge about the remaining dataset and to do so more efficiently than the naive retraining approach. To achieve this, we adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU), in which the model takes steps in the orthogonal direction to the gradient subspaces deemed unimportant for the retaining dataset, so that its knowledge is preserved. By utilizing Stochastic Gradient Descent (SGD) to update the model weights, our method can efficiently scale to any model and dataset size. We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics even when the training dataset is no longer accessible. Our code is available at https://github.com/hnanhtuan/projected_gradient_unlearning.
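The core operation behind a projected-gradient step can be sketched generically: remove from the update every component that lies in a protected subspace. How that subspace is estimated is not stated in this abstract, so `basis` below is a hypothetical input assumed to be orthonormal. The function names are illustrative and are not from the linked repository.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(grad, basis):
    """Return grad minus its components along each orthonormal basis vector.

    Assumes the vectors in `basis` are mutually orthogonal with unit norm.
    """
    out = list(grad)
    for u in basis:
        coeff = dot(grad, u)
        out = [o - coeff * ui for o, ui in zip(out, u)]
    return out


def sgd_step(weights, grad, basis, lr=0.1):
    """One SGD step restricted to the orthogonal complement of the subspace."""
    g = project_out(grad, basis)
    return [w - lr * gi for w, gi in zip(weights, g)]


protected = [[1.0, 0.0, 0.0]]  # hypothetical protected direction
new_weights = sgd_step([1.0, 1.0, 1.0], [2.0, 3.0, 4.0], protected, lr=0.5)
```

In the toy call, the first coordinate is untouched because the update has no component along the protected direction.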

CL · Mar 5, 2024 · Code
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al. · nvidia

In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent, compared to what is revealed by prompting the target model with the training data directly, which is the dominant approach of quantifying memorization in LLMs. We use an iterative rejection-sampling optimization process to find instruction-based prompts with two main characteristics: (1) minimal overlap with the training data to avoid presenting the solution directly to the model, and (2) maximal overlap between the victim model's output and the training data, aiming to induce the victim to spit out training data. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that (1) instruction-tuned models can expose pre-training data as much as their base-models, if not more so, (2) contexts other than the original training data can lead to leakage, and (3) using instructions proposed by other LLMs can open a new avenue of automated attacks that we should further study and explore. The code can be found at https://github.com/Alymostafa/Instruction_based_attack.

AI · Sep 30, 2024
Dynamic Policy Fusion for User Alignment Without Re-Interaction

Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana

Deep reinforcement learning (RL) policies, although optimal in terms of task rewards, may not align with the personal preferences of human users. To ensure this alignment, a naive solution would be to retrain the agent using a reward function that encodes the user's specific preferences. However, such a reward function is typically not readily available, and as such, retraining the agent from scratch can be prohibitively expensive. We propose a more practical approach - to adapt the already trained policy to user-specific needs with the help of human feedback. To this end, we infer the user's intent through trajectory-level feedback and combine it with the trained task policy via a theoretically grounded dynamic policy fusion approach. As our approach collects human feedback on the very same trajectories used to learn the task policy, it does not require any additional interactions with the environment, making it a zero-shot approach. We empirically demonstrate in a number of environments that our proposed dynamic policy fusion approach consistently achieves the intended task while simultaneously adhering to user-specific needs.

LG · Apr 27
Leveraging Human Feedback for Semantically-Relevant Skill Discovery

Maxence Hussonnois, Thommen George Karimpanal, Santu Rana

Unsupervised skill discovery in reinforcement learning aims to intrinsically motivate agents to discover diverse and useful behaviours. However, unconstrained approaches can produce unsafe, unethical, or misaligned behaviours. To mitigate these risks and improve the practical desirability of discovered skills, recent work grounds the discovery process by leveraging human preference feedback. However, preference-based approaches are feedback-inefficient and inherently ill-equipped to deal with skill spaces composed of a variety of different skills such as running, jumping, walking, etc. To overcome this limitation, we introduce semantic labelling, a novel and feedback-efficient approach that leverages human cognitive strengths to identify and label semantically meaningful behaviours. Based on semantic labelling, we propose Semantically Relevant Skill Discovery (SRSD), a novel human-in-the-loop approach that collects semantic labels from human feedback and learns a reward function to encourage skills to be more semantically diverse and relevant. Through our experiments in a 2D navigation environment and four locomotion environments, we demonstrate that SRSD can improve semantic diversity and discover relevant behaviours while scaling effectively to a large variety of behaviours.

LG · Feb 27, 2024
Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties

Arun Kumar A, Alistair Shilton, Sunil Gupta et al.

Experimental (design) optimization is a key driver in designing and discovering new products and processes. Bayesian Optimization (BO) is an effective tool for optimizing expensive and black-box experimental design processes. While Bayesian optimization is a principled data-driven approach to experimental optimization, it learns everything from scratch and could greatly benefit from the expertise of its human (domain) experts who often reason about systems at different abstraction levels using physical properties that are not necessarily directly measured (or measurable). In this paper, we propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into the surrogate modeling to further boost the performance of BO. We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments. We discuss the convergence behavior of our proposed framework. Our experimental results involving synthetic functions and real-world datasets show the superiority of our method against the baselines.

CL · May 19, 2025
Improving Multilingual Language Models by Aligning Representations through Steering

Omar Mahmoud, Buddhika Laknath Semage, Thommen George Karimpanal et al.

This paper investigates how Large Language Models (LLMs) represent non-English tokens -- a question that remains underexplored despite recent progress. We propose a lightweight intervention method using representation steering, where a learned vector is added to the residual stream at a single model layer to enhance multilingual performance. Through extensive experiments across seven competitive baselines -- including prompt optimization, supervised fine-tuning (SFT), in-context learning, cross-lingual transfer, and translation-based methods -- we show that our approach consistently outperforms most alternatives. In particular, it achieves performance on par with production-grade translation systems while requiring far fewer resources. We further explore the complementarity between our method and SFT, demonstrating that steering offers a direct, efficient way to realign internal representations. These findings underscore the potential of activation-level interventions as a powerful tool for improving the multilingual capabilities of LLMs.
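The intervention described above, adding one vector to the residual stream at a single layer, can be sketched on a toy model. Here hidden states are lists of per-token float vectors and layers are plain callables. The names `steer`, `forward`, `steer_layer`, and `scale` are hypothetical, and how the vector is learned is outside this sketch.

```python
def steer(hidden_states, steering_vector, scale=1.0):
    """Add the same steering vector to every token's hidden state."""
    return [
        [h + scale * v for h, v in zip(token, steering_vector)]
        for token in hidden_states
    ]


def forward(hidden_states, layers, steer_layer, steering_vector):
    """Run the layers in order, steering the output of exactly one layer."""
    for index, layer in enumerate(layers):
        hidden_states = layer(hidden_states)
        if index == steer_layer:
            hidden_states = steer(hidden_states, steering_vector)
    return hidden_states


# Toy model: three identity layers, two tokens with 2-dimensional states.
identity = lambda states: states
output = forward(
    [[1.0, 2.0], [0.0, 0.0]],
    [identity, identity, identity],
    steer_layer=1,
    steering_vector=[0.5, -0.5],
)
```

Only the vector is trained in such a setup, which is why the approach is far lighter than fine-tuning the model weights.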

AI · Apr 9
ASPECT: Analogical Semantic Policy Execution via Language Conditioned Transfer

Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana

Reinforcement Learning (RL) agents often struggle to generalize knowledge to new tasks, even those structurally similar to ones they have mastered. Although recent approaches have attempted to mitigate this issue via zero-shot transfer, they are often constrained by predefined, discrete class systems, limiting their adaptability to novel or compositional task variations. We propose a significantly more generalized approach, replacing discrete latent variables with natural language conditioning via a text-conditioned Variational Autoencoder (VAE). Our core innovation utilizes a Large Language Model (LLM) as a dynamic semantic operator at test time. Rather than relying on rigid rules, our agent queries the LLM to semantically remap the description of the current observation to align with the source task. This source-aligned caption conditions the VAE to generate an imagined state compatible with the agent's original training, enabling direct policy reuse. By harnessing the flexible reasoning capabilities of LLMs, our approach achieves zero-shot transfer across a broad spectrum of complex and truly novel analogous tasks, moving beyond the limitations of fixed category mappings. Code and videos are available at https://anonymous.4open.science/r/ASPECT-85C3/.

LG · Jan 29, 2025
Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment

Maxence Hussonnois, Thommen George Karimpanal, Santu Rana

Unsupervised skill discovery in Reinforcement Learning aims to mimic humans' ability to autonomously discover diverse behaviors. However, existing methods are often unconstrained, making it difficult to find useful skills, especially in complex environments, where discovered skills are frequently unsafe or impractical. We address this issue by proposing Human-aligned Skill Discovery (HaSD), a framework that incorporates human feedback to discover safer, more aligned skills. HaSD simultaneously optimises skill diversity and alignment with human values. This approach ensures that alignment is maintained throughout the skill discovery process, eliminating the inefficiencies associated with exploring unaligned skills. We demonstrate its effectiveness in both 2D navigation and SafetyGymnasium environments, showing that HaSD discovers diverse, human-aligned skills that are safe and useful for downstream tasks. Finally, we extend HaSD by learning a range of configurable skills with varying degrees of diversity alignment trade-offs that could be useful in practical scenarios.

LG · Nov 6, 2024
Efficient Symmetry-Aware Materials Generation via Hierarchical Generative Flow Networks

Tri Minh Nguyen, Sherif Abdulkader Tawfik, Truyen Tran et al.

Discovering new solid-state materials requires rapidly exploring the vast space of crystal structures and locating stable regions. Generating stable materials with desired properties and compositions is extremely difficult as we search for very small isolated pockets in the exponentially many possibilities, considering elements from the periodic table and their 3D arrangements in crystal lattices. Materials discovery necessitates both optimized solution structures and diversity in the generated material structures. Existing methods struggle to explore large material spaces and generate diverse samples with desired properties and requirements. We propose the Symmetry-aware Hierarchical Architecture for Flow-based Traversal (SHAFT), a novel generative model employing a hierarchical exploration strategy to efficiently exploit the symmetry of the materials space to generate crystal structures given desired properties. In particular, our model decomposes the exponentially large materials space into a hierarchy of subspaces consisting of symmetric space groups, lattice parameters, and atoms. We demonstrate that SHAFT significantly outperforms state-of-the-art iterative generative methods, such as Generative Flow Networks (GFlowNets) and Crystal Diffusion Variational AutoEncoders (CDVAE), in crystal structure generation tasks, achieving higher validity, diversity, and stability of generated structures optimized for target properties and requirements.

CL · Oct 9, 2025
The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs

Omar Mahmoud, Ali Khalil, Buddhika Laknath Semage et al.

Hallucination in large language models (LLMs) has been widely studied in recent years, with progress in both detection and mitigation aimed at improving truthfulness. Yet, a critical side effect remains largely overlooked: enhancing truthfulness can negatively impact safety alignment. In this paper, we investigate this trade-off and show that increasing factual accuracy often comes at the cost of weakened refusal behavior. Our analysis reveals that this arises from overlapping components in the model that simultaneously encode hallucination and refusal information, leading alignment methods to suppress factual knowledge unintentionally. We further examine how fine-tuning on benign datasets, even when curated for safety, can degrade alignment for the same reason. To address this, we propose a method that disentangles refusal-related features from hallucination features using sparse autoencoders, and preserves refusal behavior during fine-tuning through subspace orthogonalization. This approach prevents hallucinations from increasing while maintaining safety alignment. We evaluate our method on commonsense reasoning tasks and harmful benchmarks (AdvBench and StrongReject). Results demonstrate that our approach preserves refusal behavior and task utility, mitigating the trade-off between truthfulness and safety.

AI · Jun 2, 2025
MAGIK: Mapping to Analogous Goals via Imagination-enabled Knowledge Transfer

Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana

Humans excel at analogical reasoning - applying knowledge from one task to a related one with minimal relearning. In contrast, reinforcement learning (RL) agents typically require extensive retraining even when new tasks share structural similarities with previously learned ones. In this work, we propose MAGIK, a novel framework that enables RL agents to transfer knowledge to analogous tasks without interacting with the target environment. Our approach leverages an imagination mechanism to map entities in the target task to their analogues in the source domain, allowing the agent to reuse its original policy. Experiments on custom MiniGrid and MuJoCo tasks show that MAGIK achieves effective zero-shot transfer using only a small number of human-labelled examples. We compare our approach to related baselines and highlight how it offers a novel and effective mechanism for knowledge transfer via imagination-based analogy mapping.

CV · Jun 19, 2024
Composite Concept Extraction through Backdooring

Banibrata Ghosh, Haripriya Harikumar, Khoa D Doan et al.

Learning composite concepts, such as "red car", from individual examples -- like a white car representing the concept of "car" and a red strawberry representing the concept of "red" -- is inherently challenging. This paper introduces a novel method called Composite Concept Extractor (CoCE), which leverages techniques from traditional backdoor attacks to learn these composite concepts in a zero-shot setting, requiring only examples of individual concepts. By repurposing the trigger-based model backdooring mechanism, we create a strategic distortion in the manifold of the target object (e.g., "car") induced by example objects with the target property (e.g., "red") from objects "red strawberry", ensuring the distortion selectively affects the target objects with the target property. Contrastive learning is then employed to further refine this distortion, and a method is formulated for detecting objects that are influenced by the distortion. Extensive experiments with in-depth analysis across different datasets demonstrate the utility and applicability of our proposed approach.

MLMay 24, 2024
Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime

Alistair Shilton, Sunil Gupta, Santu Rana et al.

This paper presents two models of neural networks and their training applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an element in a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (i.e. a training step) in a reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, and noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples.

LGFeb 5, 2024
Revisiting the Dataset Bias Problem from a Statistical Perspective

Kien Do, Dung Nguyen, Hung Le et al.

In this paper, we study the "dataset bias" problem from a statistical standpoint, and identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b in the input x, represented by p(u|b) differing significantly from p(u). Since p(u|b) appears as part of the sampling distributions in the standard maximum log-likelihood (MLL) objective, a model trained on a biased dataset via MLL inherently incorporates such correlation into its parameters, leading to poor generalization to unbiased test data. From this observation, we propose to mitigate dataset bias via either weighting the objective of each sample n by $\frac{1}{p(u_{n}|b_{n})}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_{n}|b_{n})}$. While both methods are statistically equivalent, the former proves more stable and effective in practice. Additionally, we establish a connection between our debiasing approach and causal reasoning, reinforcing our method's theoretical foundation. However, when the bias label is unavailable, computing p(u|b) exactly is difficult. To overcome this challenge, we propose to approximate $\frac{1}{p(u|b)}$ using a biased classifier trained with "bias amplification" losses. Extensive experiments on various biased datasets demonstrate the superiority of our method over existing debiasing techniques in most settings, validating our theoretical analysis.
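
The weighting idea in the abstract above can be illustrated with a minimal sketch, assuming the bias label b is available so that p(u|b) can be estimated by simple counting. The function names and the toy data are hypothetical and are not taken from the paper; normalising the objective by the total weight is also our own choice.

```python
from collections import Counter


def inverse_conditional_weights(class_labels, bias_labels):
    """Return 1 / p(u_n | b_n) per sample, with p(u | b) estimated by counting."""
    pair_counts = Counter(zip(class_labels, bias_labels))
    bias_counts = Counter(bias_labels)
    weights = []
    for u, b in zip(class_labels, bias_labels):
        p_u_given_b = pair_counts[(u, b)] / bias_counts[b]
        weights.append(1.0 / p_u_given_b)
    return weights


def weighted_objective(log_likelihoods, weights):
    """Weighted average log-likelihood (dividing by the total weight is an assumption)."""
    return sum(w * ll for w, ll in zip(weights, log_likelihoods)) / sum(weights)


# Toy data (hypothetical): the class u mostly agrees with the bias attribute b.
u = [0, 0, 0, 1, 1, 1, 1, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
weights = inverse_conditional_weights(u, b)
# Bias-aligned samples have p(u|b) = 3/4, bias-conflicting samples have p(u|b) = 1/4,
# so the rare bias-conflicting samples receive the larger weight.
```

In this toy setting the two bias-conflicting samples are up-weighted relative to the six bias-aligned ones, which is the effect the abstract describes.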

LGMay 3, 2023
A Data-Driven Defense against Edge-case Model Poisoning Attacks on Federated Learning

Kiran Purohit, Soumi Das, Sourangshu Bhattacharya et al.

Federated Learning systems are increasingly subjected to a multitude of model poisoning attacks from clients. Among these, edge-case attacks that target a small fraction of the input space are nearly impossible to detect using existing defenses, leading to a high attack success rate. We propose an effective defense using an external defense dataset, which provides information about the attack target. The defense dataset contains a mix of poisoned and clean examples, with only a few known to be clean. The proposed method, DataDefense, uses this dataset to learn a poisoned data detector model which marks each example in the defense dataset as poisoned or clean. It also learns a client importance model that estimates the probability of a client update being malicious. The global model is then updated as a weighted average of the client models' updates. The poisoned data detector and the client importance model parameters are updated using an alternating minimization strategy over the Federated Learning rounds. Extensive experiments on standard attack scenarios demonstrate that DataDefense can defend against model poisoning attacks where other state-of-the-art defenses fail. In particular, DataDefense is able to reduce the attack success rate by at least ~40% on standard attack setups and by more than 80% on some setups. Furthermore, DataDefense requires very few defense examples (as few as five) to achieve a near-optimal reduction in attack success rate.
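
The aggregation step mentioned in the abstract above (a weighted average of client updates) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the weights are derived, so using `1 - p_malicious` as the importance score is our own guess, and the function name and toy numbers are hypothetical.

```python
def aggregate_updates(global_weights, client_updates, p_malicious):
    """Apply a weighted average of client updates to the global model.

    Assumption: each client's importance is 1 - (estimated probability
    that its update is malicious). The paper may weight clients differently.
    """
    scores = [1.0 - p for p in p_malicious]
    total = sum(scores)
    if total <= 0:
        raise ValueError("all clients are scored as malicious")
    new_weights = list(global_weights)
    for update, score in zip(client_updates, scores):
        for i, delta in enumerate(update):
            new_weights[i] += (score / total) * delta
    return new_weights


# Hypothetical round: the second client is judged certainly malicious.
new_global = aggregate_updates(
    global_weights=[0.0, 0.0],
    client_updates=[[1.0, 1.0], [10.0, -10.0]],
    p_malicious=[0.0, 1.0],
)
```

With these scores the second client's update is ignored entirely and the global model moves only along the first client's update.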

CRFeb 24, 2022
Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Kien Do, Haripriya Harikumar, Hung Le et al.

Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF) which leverage lossy data compression and adversarial learning respectively to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence property of triggers. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC) which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art attacks, while being quite robust to small amounts of training data and large-norm triggers.

LGFeb 11, 2022
Uncertainty Aware System Identification with Universal Policies

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana et al.

Sim2real transfer is primarily concerned with transferring policies trained in simulation to potentially noisy real world environments. A common problem associated with sim2real transfer is estimating the real-world environmental parameters to ground the simulated environment to. Although existing methods such as Domain Randomisation (DR) can produce robust policies by sampling from a distribution of parameters during training, there is no established method for identifying the parameters of the corresponding distribution for a given real-world setting. In this work, we propose Uncertainty-aware policy search (UncAPS), where we use a Universal Policy Network (UPN) to store simulation-trained task-specific policies across the full range of environmental parameters and then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a DR-like fashion. Such policy-driven grounding is expected to be more efficient as it estimates only task-relevant sets of parameters. Further, we also account for the estimation uncertainties in the search process to produce policies that are robust against both aleatoric and epistemic uncertainties. We empirically evaluate our approach in a range of noisy, continuous control environments, and show its improved performance compared to competing baselines.

LGFeb 11, 2022
Fast Model-based Policy Search for Universal Policy Networks

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana et al.

Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics based reinforcement learning. Although recent approaches such as universal policy networks partially address this issue by enabling the storage of multiple policies trained in simulation on a wide range of dynamic/latent factors, efficiently identifying the most appropriate policy for a given environment remains a challenge. In this work, we propose a Gaussian Process-based prior learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment. We integrate this prior with a Bayesian Optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network. We empirically evaluate our approach in a range of continuous and discrete control environments, and show that it outperforms other competing baselines.

LGNov 3, 2021
Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets

Thommen George Karimpanal, Hung Le, Majid Abdolshah et al.

The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard Q-learning. Such a bias fails to account for the possibility of low returns, particularly in risky scenarios. However, the existence of biases, whether overestimation or underestimation, need not necessarily be undesirable. In this paper, we analytically examine the utility of biased learning, and show that specific types of biases may be preferable, depending on the scenario. Based on this finding, we design a novel reinforcement learning algorithm, Balanced Q-learning, in which the target is modified to be a convex combination of a pessimistic and an optimistic term, whose associated weights are determined online, analytically. We prove the convergence of this algorithm in a tabular setting, and empirically demonstrate its superior learning performance in various environments.
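
A tabular update whose target is a convex combination of an optimistic and a pessimistic term could look roughly like the sketch below. Two assumptions are made here that the abstract does not confirm: the optimistic term is taken to be the maximum next-state value and the pessimistic term the minimum, and the mixing weight `beta` is passed in as a fixed number, whereas the paper determines the weights online and analytically.

```python
def balanced_q_update(q, state, action, reward, next_state, beta,
                      alpha=0.1, gamma=0.99):
    """One tabular update with a blended target.

    q is a dict mapping state -> list of action values.
    beta = 1.0 recovers the standard (optimistic) Q-learning target.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1] for a convex combination")
    next_values = q[next_state]
    optimistic = max(next_values)   # assumed optimistic term
    pessimistic = min(next_values)  # assumed pessimistic term
    target = reward + gamma * (beta * optimistic + (1.0 - beta) * pessimistic)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state table.
q_table = {0: [0.0, 0.0], 1: [1.0, 3.0]}
updated = balanced_q_update(q_table, state=0, action=0, reward=0.0,
                            next_state=1, beta=0.5, alpha=0.5, gamma=1.0)
```

Here the blended next-state value is 0.5 * 3.0 + 0.5 * 1.0 = 2.0, so the entry moves halfway from 0.0 towards 2.0.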

CVOct 26, 2021
Semantic Host-free Trojan Attack

Haripriya Harikumar, Kien Do, Santu Rana et al.

In this paper, we propose a novel host-free Trojan attack with triggers that are fixed in the semantic space but not necessarily in the pixel space. In contrast to existing Trojan attacks which use clean input images as hosts to carry small, meaningless trigger patterns, our attack considers triggers as full-sized images belonging to a semantically meaningful object class. Since in our attack the backdoored classifier is encouraged to memorize the abstract semantics of the trigger images rather than any specific fixed pattern, it can later be triggered by semantically similar but different-looking images. This makes our attack more practical to apply in the real world and harder to defend against. Extensive experimental results demonstrate that with only a small number of Trojan patterns for training, our attack can generalize well to new patterns of the same Trojan class and can bypass state-of-the-art defense methods.

LGAug 20, 2021
Plug and Play, Model-Based Reinforcement Learning

Majid Abdolshah, Hung Le, Thommen Karimpanal George et al.

Sample-efficient generalisation of reinforcement learning approaches has always been a challenge, especially for complex scenes with many components. In this work, we introduce Plug and Play Markov Decision Processes, an object-based representation that allows zero-shot integration of new objects from known object classes. This is achieved by representing the global transition dynamics as a union of local transition functions, each with respect to one active object in the scene. Transition dynamics from an object class can be pre-learnt and thus would be ready to use in a new environment. Each active object is also endowed with its reward function. Since there is no central reward function, addition or removal of objects can be handled efficiently by only updating the reward functions of objects involved. A new transfer learning mechanism is also proposed to adapt the reward function in such cases. Experiments show that our representation can achieve sample-efficiency in a variety of set-ups.

MLJul 24, 2021
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

Hung Tran-The, Sunil Gupta, Thanh Nguyen-Tang et al.

We address policy learning with logged data in contextual bandits. Current offline-policy learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring the logging policy to have full support, i.e. a non-zero probability for any context/action of the evaluation policy. However, many real-world systems do not guarantee such logging policies, especially when the action space is large and many actions have poor or missing rewards. With such support deficiency, the offline learning fails to find optimal policies. We propose a novel approach that uses a hybrid of offline learning with online exploration. The online exploration is used to explore unsupported actions in the logged data whilst offline learning is used to exploit supported actions from the logged data avoiding unnecessary explorations. Our approach determines an optimal policy with theoretical guarantees using the minimal number of online explorations. We demonstrate our algorithms' effectiveness empirically on a diverse collection of datasets.
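
To make the full-support requirement concrete, here is a minimal sketch of the standard IPS value estimate together with a check for unsupported actions. It illustrates generic IPS weighting only, not the hybrid algorithm proposed in the paper; the record layout and function names are hypothetical.

```python
def ips_value(logged, target_policy):
    """Standard IPS estimate of a target policy's value from logged data.

    logged: list of (context, action, reward, logging_prob) records.
    target_policy(context, action): probability the target policy picks action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        total += target_policy(context, action) / logging_prob * reward
    return total / len(logged)


def unsupported_actions(all_actions, logging_probs):
    """Actions the logging policy never takes; IPS cannot evaluate these."""
    return [a for a in all_actions if logging_probs.get(a, 0.0) == 0.0]


# Hypothetical log from a uniform logging policy over actions 0 and 1.
log = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]
always_zero = lambda context, action: 1.0 if action == 0 else 0.0
estimate = ips_value(log, always_zero)
missing = unsupported_actions([0, 1, 2], {0: 0.5, 1: 0.5})
```

Action 2 never appears in the log, so no amount of reweighting can estimate its reward; this is the support deficiency the abstract refers to.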

LGJul 18, 2021
A New Representation of Successor Features for Transfer across Dissimilar Environments

Majid Abdolshah, Hung Le, Thommen Karimpanal George et al.

Transfer in reinforcement learning is usually achieved through generalisation across tasks. Whilst many studies have investigated transferring knowledge when the reward function changes, they have assumed that the dynamics of the environments remain consistent. Many real-world RL problems require transfer among environments with different dynamics. To address this problem, we propose an approach based on successor features in which we model successor feature functions with Gaussian Processes permitting the source successor features to be treated as noisy measurements of the target successor feature function. Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions with Gaussian Processes in environments with both different dynamics and rewards. We demonstrate our method on benchmark datasets and show that it outperforms current baselines.

LGMay 10, 2021
Bayesian Optimistic Optimisation with Exponentially Decaying Regret

Hung Tran-The, Sunil Gupta, Santu Rana et al.

Bayesian optimisation (BO) is a well-known efficient algorithm for finding the global optimum of expensive, black-box functions. The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation which are based on partitioning the search space. We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$ under the assumption that the objective function is sampled from a Gaussian process with a Matérn kernel with smoothness parameter $ν > 4 + \frac{D}{2}$, where $D$ is the number of dimensions. We perform experiments on optimisation of various synthetic functions and machine learning hyperparameter tuning tasks and show that our algorithm outperforms baselines.

LGApr 18, 2021
Intuitive Physics Guided Exploration for Sample Efficient Sim2real Transfer

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana et al.

Physics-based reinforcement learning tasks can benefit from simplified physics simulators as they potentially allow near-optimal policies to be learned in simulation. However, such simulators require the latent factors (e.g. mass, friction coefficient etc.) of the associated objects and other environment-specific factors (e.g. wind speed, air density etc.) to be accurately specified, without which it could take considerable additional learning effort to adapt the learned simulation policy to the real environment. As such a complete specification can be impractical, in this paper we instead focus on learning task-specific estimates of latent factors which allow the approximation of real world trajectories in an ideal simulation environment. Specifically, we propose two new concepts: a) action grouping - the idea that certain types of actions are closely associated with the estimation of certain latent factors, and b) partial grounding - the idea that simulation of task-specific dynamics may not need precise estimation of all the latent factors. We first introduce intuitive action groupings based on human physics knowledge and experience, which are then used to design novel strategies for interacting with the real environment. Next, we describe how prior knowledge of a task in a given environment can be used to extract the relative importance of different latent factors, and how this can be used to inform partial grounding, which enables efficient learning of the task in any arbitrary environment. We demonstrate our approach in a range of physics-based tasks, and show that it achieves superior performance relative to other baselines, using only a limited number of real-world interactions.

LGApr 11, 2021
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms

Huong Ha, Sunil Gupta, Santu Rana et al.

Machine learning models are being used extensively in many important areas, but there is no guarantee a model will always perform well or as its developers intended. Understanding the correctness of a model is crucial to prevent potential failures that may have significant detrimental impact in critical application areas. In this paper, we propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN). We develop a novel data augmentation method that helps train the BNN to achieve high accuracy. We also devise an information-theoretic sampling strategy to sample data points so as to achieve accurate estimations for the metrics of interest. Finally, we conduct an extensive set of experiments to test various machine learning models for different types of metrics. Our experiments show that the metric estimations by our method are significantly better than existing baselines.

MLDec 17, 2020
High Dimensional Level Set Estimation with Bayesian Neural Network

Huong Ha, Sunil Gupta, Santu Rana et al.

Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology, machine operational testing, etc. Existing techniques suffer from a scalability issue, that is, these methods do not work well with high dimensional inputs. This paper proposes novel methods to solve high dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) the explicit LSE problem, where the threshold level is a fixed user-specified value, and (2) the implicit LSE problem, where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding information-theoretic acquisition function to sample the data points so as to maximally increase the level set accuracy. Furthermore, we also analyse the theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.

CVNov 19, 2020
Logically Consistent Loss for Visual Question Answering

Anh-Cat Le-Ngo, Truyen Tran, Santu Rana et al.

Given an image, background knowledge, and a set of questions about an object, human learners answer the questions very consistently regardless of question forms and semantic tasks. Current advances in neural-network based Visual Question Answering (VQA), despite their impressive performance, cannot ensure such consistency due to the independent and identically distributed (i.i.d.) assumption. We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework as well as a data organisation called family-batch and hybrid-batch. To demonstrate the usefulness of this proposal, we train and evaluate MAC-net based VQA machines with and without the proposed logically consistent loss and the proposed data organisation. The experiments confirm that the proposed loss formulae and the introduction of hybrid-batch lead to more consistency as well as better performance. Though the proposed approach is tested with MAC-net, it can be utilised in any other QA method whenever logical consistency between answers exists.

LGSep 20, 2020
Unsupervised Anomaly Detection on Temporal Multiway Data

Duc Nguyen, Phuoc Nguyen, Kien Do et al.

Temporal anomaly detection looks for irregularities over space-time. Unsupervised temporal models employed thus far typically work on sequences of feature vectors, and much less on temporal multiway data. We focus our investigation on two-way data, in which a data matrix is observed at each time step. Leveraging recent advances in matrix-native recurrent neural networks, we investigated strategies for data arrangement and unsupervised training for temporal multiway anomaly detection. These include compressing-decompressing, encoding-predicting, and temporal data differencing. We conducted a comprehensive suite of experiments to evaluate model behaviors under various settings on synthetic data, moving digits, and ECG recordings. We found interesting phenomena not previously reported. These include the capacity of the compact matrix LSTM to compress noisy data near perfectly, making the strategy of compressing-decompressing data ill-suited for anomaly detection under noise. Also, long sequences of vectors can be addressed directly by matrix models that allow very long context and multiple step prediction. Overall, the encoding-predicting strategy works very well for the matrix LSTMs in the conducted experiments, thanks to its compactness and better fit to the data dynamics.

LGSep 8, 2020
Sequential Subspace Search for Functional Bayesian Optimization Incorporating Experimenter Intuition

Alistair Shilton, Sunil Gupta, Santu Rana et al.

We propose an algorithm for Bayesian functional optimisation - that is, finding the function to optimise a process - guided by experimenter beliefs and intuitions regarding the expected characteristics (length-scale, smoothness, cyclicity etc.) of the optimal solution encoded into the covariance function of a Gaussian Process. Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process. Standard Bayesian optimisation is applied on each subspace, and the best solution found used as a starting point (origin) for the next subspace. Using the concept of effective dimensionality, we analyse the convergence of our algorithm and provide a regret bound to show that our algorithm converges in sub-linear time provided a finite effective dimension exists. We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.

MLSep 5, 2020
Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces

Hung Tran-The, Sunil Gupta, Santu Rana et al.

Bayesian optimisation is a popular method for efficient optimisation of expensive black-box functions. Traditionally, BO assumes that the search space is known. However, in many problems, this assumption does not hold. To this end, we propose a novel BO algorithm which expands (and shifts) the search space over iterations based on controlling the expansion rate through a hyperharmonic series. Further, we propose another variant of our algorithm that scales to high dimensions. We show theoretically that for both our algorithms, the cumulative regret grows at sub-linear rates. Our experiments with synthetic and real-world optimisation tasks demonstrate the superiority of our algorithms over the current state-of-the-art methods for Bayesian optimisation in unknown search spaces.

MLJul 15, 2020
From deep to Shallow: Equivalent Forms of Deep Networks in Reproducing Kernel Krein Space and Indefinite Support Vector Machines

Alistair Shilton, Sunil Gupta, Santu Rana et al.

In this paper we explore a connection between deep networks and learning in reproducing kernel Krein space. Our approach is based on the concept of push-forward - that is, taking a fixed non-linear transform on a linear projection and converting it to a linear projection on the output of a fixed non-linear transform, pushing the weights forward through the non-linearity. Applying this repeatedly from the input to the output of a deep network, the weights can be progressively "pushed" to the output layer, resulting in a flat network that has the form of a fixed non-linear map (whose form is determined by the structure of the deep network) followed by a linear projection determined by the weight matrices - that is, we take a deep network and convert it to an equivalent (indefinite) kernel machine. We then investigate the implications of this transformation for capacity control and uniform convergence, and provide a Rademacher complexity bound on the deep network in terms of Rademacher complexity in reproducing kernel Krein space. Finally, we analyse the sparsity properties of the flat representation, showing that the flat weights are (effectively) Lp-"norm" regularised with 0<p<1 (bridge regression).

LGJun 19, 2020
Bayesian Optimization with Missing Inputs

Phuc Luong, Dang Nguyen, Sunil Gupta et al.

Bayesian optimization (BO) is an efficient method for optimizing expensive black-box functions. In real-world applications, BO often faces a major problem of missing values in inputs. The missing inputs can happen in two cases. First, the historical data for training BO often contain missing values. Second, when performing the function evaluation (e.g. computing alloy strength in a heat treatment process), errors may occur (e.g. a thermostat stops working), leading to an erroneous situation where the function is computed at a random unknown value instead of the suggested value. To deal with this problem, a common approach simply skips data points where missing values occur. Clearly, this naive method cannot utilize data efficiently and often leads to poor performance. In this paper, we propose a novel BO method to handle missing inputs. We first find a probability distribution of each missing value so that we can impute the missing value by drawing a sample from its distribution. We then develop a new acquisition function based on the well-known Upper Confidence Bound (UCB) acquisition function, which considers the uncertainty of imputed values when suggesting the next point for function evaluation. We conduct comprehensive experiments on both synthetic and real-world applications to show the usefulness of our method.

CVJun 10, 2020
Scalable Backdoor Detection in Neural Networks

Haripriya Harikumar, Vuong Le, Santu Rana et al.

Recently, it has been shown that deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.

LGJun 8, 2020
Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation

Julian Berk, Sunil Gupta, Santu Rana et al.

In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
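
The core idea of sampling the exploration-exploitation trade-off parameter can be sketched over a finite candidate set, given posterior means and standard deviations from some Gaussian process model. This is a sketch under assumptions: drawing the parameter from a Gamma distribution with the shape and scale values shown is our illustrative choice, not something the abstract specifies, and the function name and inputs are hypothetical.

```python
import math
import random


def randomised_ucb_choice(means, stds, rng, shape=2.0, scale=1.0):
    """Pick the candidate maximising mean + sqrt(beta) * std,
    where beta is drawn afresh from Gamma(shape, scale) on each call."""
    beta = rng.gammavariate(shape, scale)  # assumed sampling distribution
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


rng = random.Random(0)
# Hypothetical GP posterior over three candidate points.
posterior_means = [0.1, 0.9, 0.5]
posterior_stds = [0.8, 0.1, 0.4]
choice, sampled_beta = randomised_ucb_choice(posterior_means, posterior_stds, rng)
```

A large draw of beta favours the uncertain first candidate, while a small draw favours the high-mean second candidate, so repeated calls mix exploration and exploitation.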

LGJun 2, 2020
DeepCoDA: personalized interpretability for compositional health data

Thomas P. Quinn, Dang Nguyen, Santu Rana et al.

Interpretability allows the domain-expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing. We define personalized interpretability as a measure of sample-specific feature attribution, and view it as a minimum requirement for a precision health model to justify its conclusions. Some health data, especially those generated by high-throughput sequencing experiments, have nuances that compromise precision health models and their interpretation. These data are compositional, meaning that each feature is conditionally dependent on all other features. We propose the Deep Compositional Data Analysis (DeepCoDA) framework to extend precision health modelling to high-dimensional compositional data, and to provide personalized interpretability through patient-specific weights. Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data.

MLMay 18, 2020
Variational Hyper-Encoding Networks

Phuoc Nguyen, Truyen Tran, Sunil Gupta et al.

We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters θ are drawn from a distribution p(θ) which is modeled by a hyper-level VAE. We propose a variational inference using Gaussian mixture models to implicitly encode the parameters θ into a low dimensional Gaussian distribution. Given a target distribution, we predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution q(θ). HyperVAE can encode the parameters θ in full, in contrast to common hyper-network practices, which generate only the scale and bias vectors as target-network parameters. Thus HyperVAE preserves much more information about the model for each task in the latent space. We discuss HyperVAE using the minimum description length (MDL) principle and show that it helps HyperVAE to generalize. We evaluate HyperVAE in density estimation tasks, outlier detection and discovery of novel design classes, demonstrating its efficacy.

LGMar 27, 2020
Incorporating Expert Prior in Bayesian Optimisation via Space Warping

Anil Ramachandran, Sunil Gupta, Santu Rana et al.

Bayesian optimisation is a well-known sample-efficient method for the optimisation of expensive black-box functions. However, when dealing with big search spaces, the algorithm goes through several low function value regions before reaching the optimum of the function. Since the function evaluations are expensive in terms of both money and time, it may be desirable to alleviate this problem. One approach to shorten this cold-start phase is to use prior knowledge that can accelerate the optimisation. In its standard form, Bayesian optimisation assumes that every point in the search space is equally likely to be the optimum. Therefore, any prior knowledge that can provide information about the optimum of the function would improve the optimisation performance. In this paper, we represent the prior knowledge about the function optimum through a prior distribution. The prior distribution is then used to warp the search space in such a way that the space expands around the high-probability region of the function optimum and shrinks around the low-probability region. We incorporate this prior directly in the function model (Gaussian process) by redefining the kernel matrix, which allows this method to work with any acquisition function, i.e. it is an acquisition-agnostic approach. We show the superiority of our method over the standard Bayesian optimisation method through optimisation of several benchmark functions and hyperparameter tuning of two algorithms: Support Vector Machine (SVM) and Random Forest.