RO · Jul 7, 2023
Intelligent Robotic Sonographer: Mutual Information-based Disentangled Reward Learning from Few Demonstrations
Zhongliang Jiang, Yuan Bi, Mingchuan Zhou et al.
Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs due to the advantages of being real-time and radiation-free. However, due to inter-operator variations, resulting images highly depend on the experience of sonographers. This work proposes an intelligent robotic sonographer to autonomously "explore" target anatomies and navigate a US probe to a relevant 2D plane by learning from the expert. The underlying high-level physiological knowledge from experts is inferred by a neural reward function, using a ranked pairwise image comparisons approach in a self-supervised fashion. This process can be referred to as understanding the "language of sonography". Considering the generalization capability to overcome inter-patient variations, mutual information is estimated by a network to explicitly disentangle the task-related and domain features in latent space. The robotic localization is carried out in coarse-to-fine mode based on the predicted reward associated with B-mode images. To validate the effectiveness of the proposed reward inference network, representative experiments were performed on vascular phantoms ("line" target), two types of ex-vivo animal organs (chicken heart and lamb kidney) phantoms ("point" target) and in-vivo human carotids, respectively. To further validate the performance of the autonomous acquisition framework, physical robotic acquisitions were performed on three phantoms (vascular, chicken heart, and lamb kidney). The results demonstrated that the proposed advanced framework can robustly work on a variety of seen and unseen phantoms as well as in-vivo human carotid data.
RO · Sep 12, 2022
Experimental Study on The Effect of Multi-step Deep Reinforcement Learning in POMDPs
Lingheng Meng, Rob Gorbet, Michael Burke et al.
Deep Reinforcement Learning (DRL) has made tremendous advances in both simulated and real-world robot control tasks in recent years. This is particularly the case for tasks that can be carefully engineered with a full state representation, and which can then be formulated as a Markov Decision Process (MDP). However, applying DRL strategies designed for MDPs to novel robot control tasks can be challenging, because the available observations may be a partial representation of the state, resulting in a Partially Observable Markov Decision Process (POMDP). This paper considers three popular DRL algorithms, namely Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), invented for MDPs, and studies their performance in POMDP scenarios. While prior work has found that SAC and TD3 typically outperform PPO across a broad range of tasks that can be represented as MDPs, we show that this is not always the case, using three representative POMDP environments. Empirical studies show that this is related to multi-step bootstrapping, where multi-step immediate rewards, instead of one-step immediate reward, are used to calculate the target value estimation of an observation and action pair. We identify this by observing that the inclusion of multi-step bootstrapping in TD3 (MTD3) and SAC (MSAC) results in improved robustness in POMDP settings.
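The multi-step bootstrapping idea mentioned in this abstract can be illustrated with a minimal sketch of an n-step target. The function name `n_step_target` and its arguments are hypothetical and are not taken from the paper's code; the sketch only shows how several immediate rewards are accumulated before a single bootstrapped value estimate is added.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    `rewards` holds the n immediate rewards observed after an observation-action
    pair; `bootstrap_value` is the critic's estimate n steps later (assumed given).
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first (Horner's scheme).
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# One-step target (standard TD) versus a three-step target on the same data.
one_step = n_step_target([1.0], bootstrap_value=10.0, gamma=0.5)
three_step = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
print(one_step, three_step)
```

With n = 1 this reduces to the usual one-step target; larger n relies more on observed rewards and less on the bootstrapped estimate.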
AI · Dec 31, 2025
Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
Emmanuel Fashae, Michael Burke, Leimin Tian et al.
Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insight into the interpretable factors that guide human decisions. We introduce an explicit formulation of object arrangement preferences along four interpretable constructs: spatial practicality (putting items where they naturally fit best in the space), habitual convenience (making frequently used items easy to reach), semantic coherence (placing items together if they are used for the same task or are contextually related), and commonsense appropriateness (putting things where people would usually expect to find them). To capture these constructs, we designed and validated a self-report questionnaire through a 63-participant online study. Results confirm the psychological distinctiveness of these constructs and their explanatory power across two scenarios (kitchen and living room). We demonstrate the utility of these constructs by integrating them into a Monte Carlo Tree Search (MCTS) planner and show that when guided by participant-derived preferences, our planner can generate reasonable arrangements that closely align with those generated by participants. This work contributes a compact, interpretable formulation of object arrangement preferences and a demonstration of how it can be operationalized for robot planning.
LG · Oct 11, 2024 · Code
Carefully Structured Compression: Efficiently Managing StarCraft II Data
Bryce Ferenczi, Rhys Newbury, Michael Burke et al.
Creation and storage of datasets are often overlooked input costs in machine learning, as many datasets are simple image-label pairs or plain text. However, datasets with more complex structures, such as those from the real-time strategy game StarCraft II, require more deliberate thought and strategy to reduce cost of ownership. We introduce a serialization framework for StarCraft II that reduces the cost of dataset creation and storage, as well as improving usage ergonomics. We benchmark against the most comparable existing dataset from AlphaStar-Unplugged and highlight the benefit of our framework in terms of both the cost of creation and storage. We use our dataset to train deep learning models that exceed the performance of comparable models trained on other datasets. The dataset conversion and usage framework introduced is open source and can be used as a framework for datasets with similar characteristics such as digital twin simulations. Pre-converted StarCraft II tournament data is also available online.
CR · Apr 22, 2015 · Code
Finding Tizen security bugs through whole-system static analysis
Daniel Song, Jisheng Zhao, Michael Burke et al.
Tizen is a new Linux-based open source platform for consumer devices including smartphones, televisions, vehicles, and wearables. While Tizen provides kernel-level mandatory policy enforcement, it has a large collection of libraries, implemented in a mix of C and C++, which make their own security checks. In this research, we describe the design and engineering of a static analysis engine which drives a full information flow analysis for apps and a control flow analysis for the full library stack. We implemented these static analyses as extensions to LLVM, requiring us to improve LLVM's native analysis features to get greater precision and scalability, including knotty issues like the coexistence of C++ inheritance with C function pointer use. With our tools, we found several unexpected behaviors in the Tizen system, including paths through the system libraries that did not have inline security checks. We show how our tools can help the Tizen app store to verify important app properties as well as helping the Tizen development process avoid the accidental introduction of subtle vulnerabilities.
LG · Aug 3, 2025
Why Heuristic Weighting Works: A Theoretical Analysis of Denoising Score Matching
Juyan Zhang, Rhys Newbury, Xinyang Zhang et al.
Score matching enables the estimation of the gradient of a data distribution, a key component in denoising diffusion models used to recover clean data from corrupted inputs. In prior work, a heuristic weighting function has been used for the denoising score matching loss without formal justification. In this work, we demonstrate that heteroskedasticity is an inherent property of the denoising score matching objective. This insight leads to a principled derivation of optimal weighting functions for generalized, arbitrary-order denoising score matching losses, without requiring assumptions about the noise distribution. Among these, the first-order formulation is especially relevant to diffusion models. We show that the widely used heuristic weighting function arises as a first-order Taylor approximation to the trace of the expected optimal weighting. We further provide theoretical and empirical comparisons, revealing that the heuristic weighting, despite its simplicity, can achieve lower variance than the optimal weighting with respect to parameter gradients, which can facilitate more stable and efficient training.
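For orientation, the kind of weighted objective this abstract refers to can be written out as follows. This is a sketch under stated assumptions, not the paper's own notation: it assumes Gaussian corruption with noise level \sigma and takes the commonly used heuristic weighting to be \lambda(\sigma) = \sigma^2, as in the noise-conditional score matching literature. The symbols s_\theta, \tilde{x} and \lambda are introduced here for illustration only.

```latex
% Denoising score matching at noise level \sigma, assuming
% \tilde{x} = x + \sigma\epsilon with \epsilon \sim \mathcal{N}(0, I):
\ell(\theta;\sigma)
  = \tfrac{1}{2}\,
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\,
    \mathbb{E}_{\tilde{x} \sim \mathcal{N}(x,\,\sigma^{2} I)}
    \left\| s_\theta(\tilde{x},\sigma) + \frac{\tilde{x}-x}{\sigma^{2}} \right\|_2^{2}

% Weighted objective over noise levels, with the heuristic choice
% \lambda(\sigma) = \sigma^{2} (an assumption about which heuristic is meant):
\mathcal{L}(\theta)
  = \mathbb{E}_{\sigma}\!\left[\lambda(\sigma)\,\ell(\theta;\sigma)\right],
  \qquad \lambda(\sigma) = \sigma^{2}
```

Under the Gaussian assumption the regression target -(\tilde{x}-x)/\sigma^{2} has a scale that depends on \sigma, which is one way to see why some \sigma-dependent weighting is needed at all.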
AI · Jul 20, 2025
Feedback-Induced Performance Decline in LLM-Based Decision-Making
Xiao Yang, Juxi Leitner, Michael Burke
The ability of Large Language Models (LLMs) to extract context from natural language problem descriptions naturally raises questions about their suitability in autonomous decision-making settings. This paper studies the behaviour of these models within Markov Decision Processes (MDPs). While traditional reinforcement learning (RL) strategies commonly employed in this setting rely on iterative exploration, LLMs, pre-trained on diverse datasets, offer the capability to leverage prior knowledge for faster adaptation. We investigate online structured prompting strategies in sequential decision-making tasks, comparing the zero-shot performance of LLM-based approaches to that of classical RL methods. Our findings reveal that although LLMs demonstrate improved initial performance in simpler environments, they struggle with planning and reasoning in complex scenarios without fine-tuning or additional guidance. Our results show that feedback mechanisms, intended to improve decision-making, often introduce confusion, leading to diminished performance in intricate environments. These insights underscore the need for further exploration into hybrid strategies, fine-tuning, and advanced memory integration to enhance LLM-based decision-making capabilities.
LG · Jun 27, 2025
Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review
Amara Zuffer, Michael Burke, Mehrtash Harandi
The diversity of tasks and dynamic nature of reinforcement learning (RL) require RL agents to be able to learn sequentially and continuously, a learning paradigm known as continual reinforcement learning. This survey reviews how continual learning transforms RL agents into dynamic continual learners, enabling them to acquire and retain useful and reusable knowledge seamlessly. The paper delves into fundamental aspects of continual reinforcement learning, exploring key concepts, significant challenges, and novel methodologies. Special emphasis is placed on recent advancements in continual reinforcement learning within robotics, along with a succinct overview of evaluation environments utilized in prominent research, facilitating accessibility for newcomers to the field. The review concludes with a discussion on limitations and promising future directions, providing valuable insights for researchers and practitioners alike.
CV · Nov 18, 2024
Learning a Neural Association Network for Self-supervised Multi-Object Tracking
Shuai Li, Michael Burke, Subramanian Ramamoorthy et al.
This paper introduces a novel framework to learn data association for multi-object tracking in a self-supervised manner. Fully-supervised learning methods are known to achieve excellent tracking performance, but acquiring identity-level annotations is tedious and time-consuming. Motivated by the fact that in real-world scenarios object motion can usually be represented by a Markov process, we present a novel expectation maximization (EM) algorithm that trains a neural network to associate detections for tracking, without requiring prior knowledge of their temporal correspondences. At the core of our method lies a neural Kalman filter, with an observation model conditioned on associations of detections parameterized by a neural network. Given a batch of frames as input, data associations between detections from adjacent frames are predicted by a neural network followed by a Sinkhorn normalization that determines the assignment probabilities of detections to states. Kalman smoothing is then used to obtain the marginal probability of observations given the inferred states, producing a training objective to maximize this marginal probability using gradient descent. The proposed framework is fully differentiable, allowing the underlying neural model to be trained end-to-end. We evaluate our approach on the challenging MOT17, MOT20, and BDD100K datasets and achieve state-of-the-art results in comparison to self-supervised trackers using public detections.
LG · Oct 11, 2024
Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations
Bryce Ferenczi, Michael Burke, Tom Drummond
Various works have aimed at combining the inference efficiency of recurrent models and training parallelism of multi-head attention for sequence modeling. However, most of these works focus on tasks with fixed-dimension observation spaces, such as individual tokens in language modeling or pixels in image completion. To handle an observation space of varying size, we propose a novel algorithm that alternates between cross-attention between a 2D latent state and observation, and a discounted cumulative sum over the sequence dimension to efficiently accumulate historical information. We find this resampling cycle is critical for performance. To evaluate efficient sequence modeling in this domain, we introduce two multi-agent intention tasks: simulated agents chasing bouncing particles and micromanagement analysis in professional StarCraft II games. Our algorithm achieves comparable accuracy with a lower parameter count, faster training and inference compared to existing methods.
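The discounted cumulative sum named in this abstract can be sketched on a plain list of scalars. This is an illustrative sequential version only; the function name `discounted_cumsum` is hypothetical, and the paper's algorithm presumably operates on latent tensors with a parallel scan rather than a Python loop.

```python
from typing import List, Sequence


def discounted_cumsum(values: Sequence[float], discount: float) -> List[float]:
    """Return y where y[t] = values[t] + discount * y[t-1], with y[-1] taken as 0.

    Each output accumulates all earlier inputs, down-weighted geometrically
    by how far in the past they occurred.
    """
    outputs: List[float] = []
    running = 0.0
    for value in values:
        running = discount * running + value
        outputs.append(running)
    return outputs


history = discounted_cumsum([1.0, 2.0, 4.0], discount=0.5)
print(history)  # [1.0, 2.5, 5.25]
```

A discount of 1.0 recovers an ordinary cumulative sum, while a discount of 0.0 keeps only the current input.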
CV · Sep 13, 2021
Vision-based system identification and 3D keypoint discovery using dynamics constraints
Miguel Jaques, Martin Asenov, Michael Burke et al.
This paper introduces V-SysId, a novel method that enables simultaneous keypoint discovery, 3D system identification, and extrinsic camera calibration from an unlabeled video taken from a static camera, using only the family of equations of motion of the object of interest as weak supervision. V-SysId takes keypoint trajectory proposals and alternates between maximum likelihood parameter estimation and extrinsic camera calibration, before applying a suitable selection criterion to identify the track of interest. This is then used to train a keypoint tracking model using supervised learning. Results on a range of settings (robotics, physics, physiology) highlight the utility of this approach.
CV · May 2, 2021
Learning data association without data association: An EM approach to neural assignment prediction
Michael Burke, Subramanian Ramamoorthy
Data association is a fundamental component of effective multi-object tracking. Current approaches to data-association tend to frame this as an assignment problem relying on gating and distance-based cost matrices, or offset the challenge of data association to a problem of tracking by detection. The latter is typically formulated as a supervised learning problem, and requires labelling information about tracked object identities to train a model for object recognition. This paper introduces an expectation maximisation approach to train neural models for data association, which does not require labelling information. Here, a Sinkhorn network is trained to predict assignment matrices that maximise the marginal likelihood of trajectory observations. Importantly, networks trained using the proposed approach can be re-used in downstream tracking applications.
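The Sinkhorn step that turns raw association scores into a soft assignment matrix can be sketched as alternating row and column normalisation. The version below is a minimal pure-Python illustration with a hypothetical function name and a fixed iteration count; it is not the authors' network or training code, and it omits the EM objective entirely.

```python
import math
from typing import List


def sinkhorn(scores: List[List[float]], n_iters: int = 50) -> List[List[float]]:
    """Approximately project exp(scores) onto doubly stochastic matrices.

    `scores` is a square matrix of unnormalised association scores
    (for example, the output of a neural network, assumed given here).
    """
    # Exponentiate so that every entry is strictly positive.
    p = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Normalise each row to sum to one.
        p = [[v / sum(row) for v in row] for row in p]
        # Normalise each column to sum to one.
        col_sums = [sum(col) for col in zip(*p)]
        p = [[v / c for v, c in zip(row, col_sums)] for row in p]
    return p


assignment = sinkhorn([[2.0, 0.0], [0.0, 1.0]])
for row in assignment:
    print([round(v, 3) for v in row])
```

Each entry of the result can be read as a soft probability that a detection (row) is assigned to a track (column); higher input scores yield higher assignment probabilities.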
LG · Nov 30, 2020
IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates
Tatiana Lopez-Guevara, Michael Burke, Nicholas K. Taylor et al.
Model-free reinforcement learning (RL) is a powerful tool to learn a broad range of robot skills and policies. However, a lack of policy interpretability can inhibit their successful deployment in downstream applications, particularly when differences in environmental conditions may result in unpredictable behaviour or generalisation failures. As a result, there has been a growing emphasis in machine learning around the inclusion of stronger inductive biases in models to improve generalisation. This paper proposes an alternative strategy, inverse value estimation for interpretable policy certificates (IV-Posterior), which seeks to identify the inductive biases or idealised conditions of operation already held by pre-trained policies, and then use this information to guide their deployment. IV-Posterior uses Masked Autoregressive Flows to fit distributions over the set of conditions or environmental parameters in which a policy is likely to be effective. This distribution can then be used as a policy certificate in downstream applications. We illustrate the use of IV-Posterior across two environments, and show that substantial performance gains can be obtained when policy selection incorporates knowledge of the inductive biases that these policies hold.
RO · Aug 18, 2020
Residual Learning from Demonstration: Adapting DMPs for Contact-rich Manipulation
Todor Davchev, Kevin Sebastian Luck, Michael Burke et al.
Manipulation skills involving contact and friction are inherent to many robotics tasks. Using the class of motor primitives for peg-in-hole-like insertions, we study how robots can learn such skills. Dynamic Movement Primitives (DMPs) are a popular way of extracting such policies through behaviour cloning (BC) but can struggle in the context of insertion. Policy adaptation strategies such as residual learning can help improve the overall performance of policies in the context of contact-rich manipulation. However, it is not clear how to best do this with DMPs. As a result, we consider several possible ways for adapting a DMP formulation and propose "residual Learning from Demonstration" (rLfD), a framework that combines DMPs with Reinforcement Learning (RL) to learn a residual correction policy. Our evaluations suggest that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of DMPs. We show that rLfD offers a solution that is gentle on the joints, improves the task success and generalisation of DMPs, and enables transfer to different geometries and frictions through few-shot task adaptation. The proposed framework is evaluated on a set of tasks in which a simulated robot and a physical robot have to successfully insert pegs, gears and plugs into their respective sockets. Other material and videos accompanying this paper are provided at https://sites.google.com/view/rlfd/.
RO · Aug 3, 2020
Action sequencing using visual permutations
Michael Burke, Kartic Subr, Subramanian Ramamoorthy
Humans can easily reason about the sequence of high level actions needed to complete tasks, but it is particularly difficult to instil this ability in robots trained from relatively few examples. This work considers the task of neural action sequencing conditioned on a single reference visual state. This task is extremely challenging as it is not only subject to the significant combinatorial complexity that arises from large action sets, but also requires a model that can perform some form of symbol grounding, mapping high dimensional input data to actions, while reasoning about action relationships. This paper takes a permutation perspective and argues that action sequencing benefits from the ability to reason about both permutations and ordering concepts. Empirical analysis shows that neural models trained with latent permutations outperform standard neural architectures in constrained action sequencing tasks. Results also show that action sequencing using visual permutations is an effective mechanism to initialise and speed up traditional planning techniques and successfully scales to far greater action set sizes than models considered previously.
LG · Jun 2, 2020
NewtonianVAE: Proportional Control and Goal Identification from Pixels via Physical Latent Spaces
Miguel Jaques, Michael Burke, Timothy Hospedales
Learning low-dimensional latent state space dynamics models has been a powerful paradigm for enabling vision-based planning and learning for control. We introduce a latent dynamics learning framework that is uniquely designed to induce proportional controllability in the latent space, thus enabling the use of much simpler controllers than prior work. We show that our learned dynamics model enables proportional control from pixels, dramatically simplifies and accelerates behavioural cloning of vision-based controllers, and provides interpretable goal discovery when applied to imitation learning of switching controllers from demonstration.
RO · Feb 4, 2020
Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking
Michael Burke, Katie Lu, Daniel Angelov et al.
Informative path-planning is a well-established approach to visual-servoing and active viewpoint selection in robotics, but typically assumes that a suitable cost function or goal state is known. This work considers the inverse problem, where the goal of the task is unknown, and a reward function needs to be inferred from exploratory example demonstrations provided by a demonstrator, for use in a downstream informative path-planning policy. Unfortunately, many existing reward inference strategies are unsuited to this class of problems, due to the exploratory nature of the demonstrations. In this paper, we propose an alternative approach to cope with the class of problems where these sub-optimal, exploratory demonstrations occur. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward, and use this hypothesis to generate time-based binary comparison outcomes and infer reward functions that support these ranks, under a probabilistic generative model. We formalise this probabilistic temporal ranking approach and show that it improves upon existing approaches to perform reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging, while also being of value across a broad range of goal-oriented learning from demonstration tasks. Keywords: visual servoing, reward inference, probabilistic temporal ranking.
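The temporal ranking hypothesis in this abstract (later states in a demonstration are more likely to carry higher reward) can be sketched as a way of generating pairwise comparisons from time indices alone. The helper names below are hypothetical, and the logistic link in `preference_probability` is an assumption substituted for illustration; the paper specifies its own probabilistic generative model, which this sketch does not reproduce.

```python
import math
from itertools import combinations
from typing import List, Tuple


def temporal_pairs(num_states: int) -> List[Tuple[int, int]]:
    """Return all (earlier, later) index pairs from one demonstration.

    Under the temporal ranking hypothesis, the later state in each pair
    is treated as the preferred one.
    """
    return list(combinations(range(num_states), 2))


def preference_probability(reward_earlier: float, reward_later: float) -> float:
    """Probability that the later state outranks the earlier one.

    Uses a logistic function of the reward difference (an illustrative
    choice, not necessarily the paper's likelihood).
    """
    return 1.0 / (1.0 + math.exp(-(reward_later - reward_earlier)))


pairs = temporal_pairs(3)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(preference_probability(0.0, 0.0))  # 0.5
```

A reward model could then be fitted by maximising the likelihood of these comparisons, so that states seen later in demonstrations tend to receive higher predicted reward.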
LG · Jan 30, 2020
Black-Box Saliency Map Generation Using Bayesian Optimisation
Mamuku Mokuwe, Michael Burke, Anna Sergeevna Bosman
Saliency maps are often used in computer vision to provide intuitive interpretations of what input regions a model has used to produce a specific prediction. A number of approaches to saliency map generation are available, but most require access to model parameters. This work proposes an approach for saliency map generation for black-box models, where no access to model parameters is available, using a Bayesian optimisation sampling method. The approach aims to find the global salient image region responsible for a particular (black-box) model's prediction. This is achieved by a sampling-based approach to model perturbations that seeks to localise salient regions of an image to the black-box model. Results show that the proposed approach to saliency map generation outperforms grid-based perturbation approaches, and performs similarly to gradient-based approaches which require access to model parameters.
CV · Dec 10, 2019
Bias Remediation in Driver Drowsiness Detection systems using Generative Adversarial Networks
Mkhuseli Ngxande, Jules-Raymond Tapamo, Michael Burke
Datasets are crucial when training a deep neural network. When datasets are unrepresentative, trained models are prone to bias because they are unable to generalise to real-world settings. This is particularly problematic for models trained in specific cultural contexts, which may not represent a wide range of races, and thus fail to generalise. This is a particular challenge for driver drowsiness detection, where many publicly available datasets are unrepresentative as they cover only certain ethnicity groups. Traditional augmentation methods are unable to improve a model's performance when tested on other groups with different facial attributes, and it is often challenging to build new, more representative datasets. In this paper, we introduce a novel framework that boosts the performance of drowsiness detection for different ethnicity groups. Our framework improves a Convolutional Neural Network (CNN) trained for prediction by using Generative Adversarial Networks (GANs) for targeted data augmentation based on a population bias visualisation strategy that groups faces with similar facial attributes and highlights where the model is failing. A sampling method selects faces where the model is not performing well, which are used to fine-tune the CNN. Experiments show the efficacy of our approach in improving driver drowsiness detection for under-represented ethnicity groups. Here, models trained on publicly available datasets are compared with a model trained using the proposed data augmentation strategy. Although developed in the context of driver drowsiness detection, the proposed framework is not limited to this task and can be applied to other applications.
LG · Nov 29, 2019
Learning Structured Representations of Spatial and Interactive Dynamics for Trajectory Prediction in Crowded Scenes
Todor Davchev, Michael Burke, Subramanian Ramamoorthy
Context plays a significant role in the generation of motion for dynamic agents in interactive environments. This work proposes a modular method that utilises a learned model of the environment for motion prediction. This modularity explicitly allows for unsupervised adaptation of trajectory prediction models to unseen environments and new tasks by relying on unlabelled image data only. We model both the spatial and dynamic aspects of a given environment alongside the per-agent motions. This results in more informed motion prediction and allows for performance comparable to the state-of-the-art. We highlight the model's prediction capability using a benchmark pedestrian prediction problem and a robot manipulation task and show that we can transfer the predictor across these tasks in a completely unsupervised way. The proposed approach allows for robust and label-efficient forward modelling, and relaxes the need for full model re-training in new environments.
RO · Sep 16, 2019
Surfing on an uncertain edge: Precision cutting of soft tissue using torque-based medium classification
Artūras Straižys, Michael Burke, Subramanian Ramamoorthy
Precision cutting of soft tissue remains a challenging problem in robotics, due to the complex and unpredictable mechanical behaviour of tissue under manipulation. Here, we consider the challenge of cutting along the boundary between two soft mediums, a problem that is made extremely difficult due to visibility constraints, which mean that the precise location of the cutting trajectory is typically unknown. This paper introduces a novel strategy to address this task, using a binary medium classifier trained using joint torque measurements, and a closed-loop control law that relies on an error signal compactly encoded in the decision boundary of the classifier. We illustrate this on a grapefruit cutting task, successfully modulating a nominal trajectory fit using dynamic movement primitives to follow the boundary between grapefruit pulp and peel using torque-based medium classification. Results show that this control strategy is successful in 72% of attempts, in contrast to control using a nominal trajectory, which only succeeds in 50% of attempts.
RO · Jul 31, 2019
Disentangled Relational Representations for Explaining and Learning from Demonstration
Yordan Hristov, Daniel Angelov, Michael Burke et al.
Learning from demonstration is an effective method for human users to instruct desired robot behaviour. However, for most non-trivial tasks of practical interest, efficient learning from demonstration depends crucially on inductive bias in the chosen structure for rewards/costs and policies. We address the case where this inductive bias comes from an exchange with a human user. We propose a method in which a learning agent utilizes the information bottleneck layer of a high-parameter variational neural model, with auxiliary loss terms, in order to ground abstract concepts such as spatial relations. The concepts are referred to in natural language instructions and are manifested in the high-dimensional sensory input stream the agent receives from the world. We evaluate the properties of the latent space of the learned model in a photorealistic synthetic environment and particularly focus on examining its usability for downstream tasks. Additionally, through a series of controlled table-top manipulation experiments, we demonstrate that the learned manifold can be used to ground demonstrations as symbolic plans, which can then be executed on a PR2 robot.
RO · Jul 18, 2019
Composing Diverse Policies for Temporally Extended Tasks
Daniel Angelov, Yordan Hristov, Michael Burke et al.
Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are often exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, it becomes combinatorially challenging to implement such a pipeline for complex temporally extended tasks, especially when the sub-controllers work on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework using an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then with a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently.
RO · Jul 15, 2019
Vid2Param: Modelling of Dynamics Parameters from Video
Martin Asenov, Michael Burke, Daniel Angelov et al.
Videos provide a rich source of information, but it is generally hard to extract dynamical parameters of interest. Inferring those parameters from a video stream would be beneficial for physical reasoning. Robots performing tasks in dynamic environments would benefit greatly from understanding the underlying environment motion, in order to make future predictions and to synthesize effective control policies that use this inductive bias. Online physical reasoning is therefore a fundamental requirement for robust autonomous agents. When the dynamics involves multiple modes (due to contacts or interactions between objects) and sensing must proceed directly from a rich sensory stream such as video, then traditional methods for system identification may not be well suited. We propose an approach wherein fast parameter estimation can be achieved directly from video. We integrate a physically based dynamics model with a recurrent variational autoencoder, by introducing an additional loss to enforce desired constraints. The model, which we call Vid2Param, can be trained entirely in simulation, in an end-to-end manner with domain randomization, to perform online system identification, and make probabilistic forward predictions of parameters of interest. This enables the resulting model to encode parameters such as position, velocity, restitution, air drag and other physical properties of the system. We illustrate the utility of this in physical experiments wherein a PR2 robot with a velocity constrained arm must intercept an unknown bouncing ball with partly occluded vision, by estimating the physical parameters of this ball directly from the video trace after the ball is released.
RO, Jul 9, 2019
Hybrid system identification using switching density networks
Michael Burke, Yordan Hristov, Subramanian Ramamoorthy
Behaviour cloning is a commonly used strategy for imitation learning and can be extremely effective in constrained domains. However, in cases where the dynamics of an environment may be state dependent and varying, behaviour cloning places a burden on model capacity and the number of demonstrations required. This paper introduces switching density networks, which rely on a categorical reparametrisation for hybrid system identification. This results in a network comprising a classification layer that is followed by a regression layer. We use switching density networks to predict the parameters of hybrid control laws, which are toggled by a switching layer to produce different controller outputs, when conditioned on an input state. This work shows how switching density networks can be used for hybrid system identification in a variety of tasks, successfully identifying the key joint angle goals that make up manipulation tasks, while simultaneously learning image-based goal classifiers and regression networks that predict joint angles from images. We also show that they can cluster the phase space of an inverted pendulum, identifying the balance, spin and pump controllers required to solve this task. Switching density networks can be difficult to train, but we introduce a cross entropy regularisation loss that stabilises training.
CV, May 27, 2019
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video
Miguel Jaques, Michael Burke, Timothy Hospedales
We propose a model that is able to perform unsupervised physical parameter estimation of systems from video, where the differential equations governing the scene dynamics are known, but labeled states or objects are not available. Existing physical scene understanding methods either require object state supervision or do not integrate with differentiable physics to learn interpretable system parameters and states. We address this problem through a physics-as-inverse-graphics approach that brings together vision-as-inverse-graphics and differentiable physics engines, enabling objects and explicit state and velocity representations to be discovered. This framework allows us to perform long-term extrapolative video prediction, as well as vision-based model-predictive control. Our approach significantly outperforms related unsupervised methods in long-term future frame prediction of systems with interacting objects (such as ball-spring or 3-body gravitational systems), due to its ability to build dynamics into the model as an inductive bias. We further show the value of this tight vision-physics integration by demonstrating data-efficient learning of vision-actuated model-based control for a pendulum system. We also show that the controller's interpretability provides unique capabilities in goal-driven control and physical reasoning for zero-data adaptation.
CV, Apr 23, 2019
Detecting inter-sectional accuracy differences in driver drowsiness detection algorithms
Mkhuseli Ngxande, Jules-Raymond Tapamo, Michael Burke
Convolutional Neural Networks (CNNs) have been used successfully across a broad range of areas including data mining, object detection, and in business. The dominance of CNNs follows a breakthrough by Alex Krizhevsky, which dramatically reduced the error rate obtained in a general image classification task from 26.2% to 15.4%. In road safety, CNNs have been applied widely to the detection of traffic signs, obstacle detection, and lane departure checking. In addition, CNNs have been used in data mining systems that monitor driving patterns and recommend rest breaks when appropriate. This paper presents a driver drowsiness detection system and shows that there are potential social challenges regarding the application of these techniques, by highlighting problems in detecting dark-skinned drivers' faces. This is a particularly important challenge in African contexts, where there are more dark-skinned drivers. Unfortunately, publicly available datasets are often captured in different cultural contexts, and therefore do not cover all ethnicities, which can lead to false detections or racially biased models. This work evaluates the performance obtained when training convolutional neural network models on commonly used driver drowsiness detection datasets and testing on datasets specifically chosen for broader representation. Results show that models trained using publicly available datasets suffer extensively from over-fitting, and can exhibit racial bias, as shown by testing on a more representative dataset. We propose a novel visualisation technique that can assist in identifying groups of people where there might be potential for discrimination, using Principal Component Analysis (PCA) to produce a grid of faces sorted by similarity, and combining these with a model accuracy overlay.
CV, Mar 6, 2019
DepthwiseGANs: Fast Training Generative Adversarial Networks for Realistic Image Synthesis
Mkhuseli Ngxande, Jules-Raymond Tapamo, Michael Burke
Recent work has shown significant progress in the direction of synthetic data generation using Generative Adversarial Networks (GANs). GANs have been applied in many fields of computer vision including text-to-image conversion, domain transfer, super-resolution, and image-to-video applications. In computer vision, traditional GANs are based on deep convolutional neural networks. However, deep convolutional neural networks can require extensive computational resources because they are based on multiple operations performed by convolutional layers, which can consist of millions of trainable parameters. Training a GAN model can be difficult, and it takes a significant amount of time to reach an equilibrium point. In this paper, we investigate the use of depthwise separable convolutions to reduce training time while maintaining data generation performance. Our results show that a DepthwiseGAN architecture can generate realistic images in shorter training periods when compared to a StarGAN architecture, but that model capacity still plays a significant role in generative modelling. In addition, we show that depthwise separable convolutions perform best when only applied to the generator. For quality evaluation of generated images, we use the Fréchet Inception Distance (FID), which compares the similarity between the generated image distribution and that of the training dataset.
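To make the parameter saving behind depthwise separable convolutions concrete, the following minimal sketch counts weights for a single layer. The channel counts and kernel size are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, k = 128, 256, 3
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For this assumed layer the separable form uses roughly a ninth of the weights, which is the kind of reduction that can shorten training, although the abstract notes that model capacity still matters.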
RO, Feb 27, 2019
From explanation to synthesis: Compositional program induction for learning from demonstration
Michael Burke, Svetlin Penkov, Subramanian Ramamoorthy
Hybrid systems are a compact and natural mechanism with which to address problems in robotics. This work introduces an approach to learning hybrid systems from demonstrations, with an emphasis on extracting models that are explicitly verifiable and easily interpreted by robot operators. We fit a sequence of controllers using sequential importance sampling under a generative switching proportional controller task model. Here, we parameterise controllers using a proportional gain and a visually verifiable joint angle goal. Inference under this model is challenging, but we address this by introducing an attribution prior extracted from a neural end-to-end visuomotor control model. Given the sequence of controllers comprising a task, we simplify the trace using grammar parsing strategies, taking advantage of the sequence compositionality, before grounding the controllers by training perception networks to predict goals given images. Using this approach, we are successfully able to induce a program for a visuomotor reaching task involving loops and conditionals from a single demonstration and a neural end-to-end model. In addition, we are able to discover the program used for a tower building task. We argue that computer program-like control systems are more interpretable than alternative end-to-end learning approaches, and that hybrid systems inherently allow for better generalisation across task configurations.
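The task model described above is a sequence of proportional controllers, each defined by a gain and a joint angle goal. A minimal sketch of executing such a sequence for a single scalar joint is given below. The switching rule (move to the next controller once the goal is within a tolerance), the function names, and all numeric values are assumptions made for illustration; the paper infers the sequence with sequential importance sampling, which is not shown here.

```python
def p_control(q, goal, gain):
    """Proportional control law: command proportional to the goal error."""
    return gain * (goal - q)


def run_sequence(q0, controllers, dt=0.1, tol=1e-2, max_steps=1000):
    """Apply (gain, goal) controllers in order to a scalar joint angle.

    Switching on a tolerance is an assumed rule for this sketch only.
    Returns the joint angle reached at the end of each controller.
    """
    q = q0
    reached = []
    for gain, goal in controllers:
        for _ in range(max_steps):
            if abs(goal - q) < tol:
                break
            q += dt * p_control(q, goal, gain)  # simple Euler integration
        reached.append(q)
    return reached


# Hypothetical two-step program: reach 1.0 rad, then -0.5 rad.
reached = run_sequence(0.0, [(2.0, 1.0), (2.0, -0.5)])
print(reached)
```

Because each controller is just a gain and a goal, the resulting trace can be inspected step by step, which is the interpretability argument the abstract makes.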
CV, Jun 19, 2017
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
Michael Burke, Siyabonga Mbonambi, Purity Molala et al.
A great deal of work aims to discover large general purpose models of image interest or memorability for visual search and information retrieval. This paper argues that image interest is often domain and user specific, and that efficient mechanisms for learning about this domain-specific image interest as quickly as possible, while limiting the amount of data-labelling required, are often more useful to end-users. This work uses pairwise image comparisons to reduce the labelling burden on these users, and introduces an image interest estimation approach that performs similarly to recent data hungry deep learning approaches trained using pairwise ranking losses. Here, we use a Gaussian process model to interpolate image interest inferred using a Bayesian ranking approach over image features extracted using a pre-trained convolutional neural network. Results show that fitting a Gaussian process in high-dimensional image feature space is not only computationally feasible, but also effective across a broad range of domains. The proposed probabilistic interest estimation approach produces image interests paired with uncertainties that can be used to identify images for which additional labelling is required and measure inference convergence, allowing for sample efficient active model training. Importantly, the probabilistic formulation allows for effective visual search and information retrieval when limited labelling data is available.
CV, May 20, 2014
Single camera pose estimation using Bayesian filtering and Kinect motion priors
Michael Burke, Joan Lasenby
Traditional approaches to upper body pose estimation using monocular vision rely on complex body models and a large variety of geometric constraints. We argue that this is not ideal and somewhat inelegant as it results in large processing burdens, and instead attempt to incorporate these constraints through priors obtained directly from training data. A prior distribution covering the probability of a human pose occurring is used to incorporate likely human poses. This distribution is obtained offline, by fitting a Gaussian mixture model to a large dataset of recorded human body poses, tracked using a Kinect sensor. We combine this prior information with a random walk transition model to obtain an upper body model, suitable for use within a recursive Bayesian filtering framework. Our model can be viewed as a mixture of discrete Ornstein-Uhlenbeck processes, in that states behave as random walks, but drift towards a set of typically observed poses. This model is combined with measurements of the human head and hand positions, using recursive Bayesian estimation to incorporate temporal information. Measurements are obtained using face detection and a simple skin colour hand detector, trained using the detected face. The suggested model is designed with analytical tractability in mind and we show that the pose tracking can be Rao-Blackwellised using the mixture Kalman filter, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. In addition, the use of the proposed upper body model allows reliable three-dimensional pose estimates to be obtained indirectly for a number of joints that are often difficult to detect using traditional object recognition strategies. Comparisons with Kinect sensor results and the state of the art in 2D pose estimation highlight the efficacy of the proposed approach.
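The "random walk that drifts towards typically observed poses" behaviour can be illustrated with one discrete Ornstein-Uhlenbeck component. This is a scalar sketch under stated assumptions: a single attractor `mu` stands in for one mixture component, and `theta`, `sigma` and the step count are hypothetical values, not parameters from the paper.

```python
import random


def ou_step(x, mu, theta, sigma, rng):
    """One discrete OU step: random-walk noise plus drift towards mu."""
    return x + theta * (mu - x) + sigma * rng.gauss(0.0, 1.0)


def simulate(x0, mu, theta, sigma, steps, seed=0):
    """Simulate a scalar discrete OU process and return the whole path."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        xs.append(ou_step(xs[-1], mu, theta, sigma, rng))
    return xs


# With sigma = 0 the state decays geometrically towards mu;
# with sigma > 0 it wanders around mu instead of diffusing freely.
noise_free = simulate(x0=0.0, mu=1.0, theta=0.1, sigma=0.0, steps=50)
noisy = simulate(x0=0.0, mu=1.0, theta=0.1, sigma=0.05, steps=50)
print(noise_free[-1], noisy[-1])
```

In the mixture setting described in the abstract, each Gaussian component of the pose prior would supply its own attractor. Since every component is linear-Gaussian given the component identity, a mixture Kalman filter is a natural fit.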