67.5SDJun 2Code
The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing AidsAlejandro Ballesta Rosen, Jason Mikiel-Hunter, Julian Maclaren et al.
Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the ``cocktail party'' problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with that of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP) outputs and stabilized auditory images (SAIs), the latter providing a 2D representation that captures phase-insensitive temporal structure in the auditory nerve output. Through gradient descent, the SEANet model learns to both denoise the input and compensate for the hearing loss modelled by the impaired CARFAC model. Across neural-representation and signal-fidelity metrics, the DAL-optimized SEANet model outperformed the tested master hearing aid (MHA) baselines. The DAL framework provides a practical path toward model-based, machine-learning-driven personalization of hearing aid signal processing. Next steps include hardware deployment to enable real-world clinical testing.
ROMar 6, 2023
Leveraging Scene Embeddings for Gradient-Based Motion Planning in Latent SpaceJun Yamada, Chia-Man Hung, Jack Collins et al. · oxford
Motion planning framed as optimisation in structured latent spaces has recently emerged as competitive with traditional methods in terms of planning success while significantly outperforming them in terms of computational speed. However, the real-world applicability of recent work in this domain remains limited by the need to express obstacle information directly in state-space, involving simple geometric primitives. In this work we address this challenge by leveraging learned scene embeddings together with a generative model of the robot manipulator to drive the optimisation process. In addition, we introduce an approach for efficient collision checking which directly regularises the optimisation undertaken for planning. Using simulated as well as real-world experiments, we demonstrate that our approach, AMP-LS, is able to successfully plan in novel, complex scenes while outperforming traditional planning baselines in terms of computation speed by an order of magnitude. We show that the resulting system is fast enough to enable closed-loop planning in real-world dynamic scenes.
ROMar 6, 2023
Efficient Skill Acquisition for Complex Manipulation Tasks in Obstructed EnvironmentsJun Yamada, Jack Collins, Ingmar Posner
Data efficiency in robotic skill acquisition is crucial for operating robots in varied small-batch assembly settings. To operate in such environments, robots must have robust obstacle avoidance and versatile goal conditioning acquired from only a few simple demonstrations. Existing approaches, however, fall short of these requirements. Deep reinforcement learning (RL) enables a robot to learn complex manipulation tasks but is often limited to small task spaces in the real world due to sample inefficiency and safety concerns. Motion planning (MP) can generate collision-free paths in obstructed environments, but cannot solve complex manipulation tasks and requires goal states often specified by a user or object-specific pose estimator. In this work, we propose a system for efficient skill acquisition that leverages an object-centric generative model (OCGM) for versatile goal identification to specify a goal for MP combined with RL to solve complex manipulation tasks in obstructed environments. Specifically, OCGM enables one-shot target object identification and re-identification in new scenes, allowing MP to guide the robot to the target object while avoiding obstacles. This is combined with a skill transition network, which bridges the gap between terminal states of MP and feasible start states of a sample-efficient RL policy. The experiments demonstrate that our OCGM-based one-shot goal identification provides competitive accuracy to other baseline approaches and that our modular framework outperforms competitive baselines, including a state-of-the-art RL algorithm, by a significant margin for complex manipulation tasks in obstructed environments.
RONov 7, 2023
TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real TransferJun Yamada, Marc Rigter, Jack Collins et al.
Model-based RL is a promising approach for real-world robotics due to its improved sample efficiency and generalization capabilities compared to model-free RL. However, effective model-based RL solutions for vision-based real-world applications require bridging the sim-to-real gap for any world model learnt. Due to its significant computational cost, standard domain randomisation does not provide an effective solution to this problem. This paper proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer) to achieve efficient sim-to-real transfer of vision-based model-based RL using distillation. Specifically, TWIST leverages state observations as readily accessible, privileged information commonly garnered from a simulator to significantly accelerate sim-to-real transfer. Specifically, a teacher world model is trained efficiently on state information. At the same time, a matching dataset is collected of domain-randomised image observations. The teacher world model then supervises a student world model that takes the domain-randomised image observations as input. By distilling the learned latent dynamics model from the teacher to the student model, TWIST achieves efficient and effective sim-to-real transfer for vision-based model-based RL tasks. Experiments in simulated and real robotics tasks demonstrate that our approach outperforms naive domain randomisation and model-free methods in terms of sample efficiency and task performance of sim-to-real transfer.
SDMay 29, 2025Code
Nosey: Open-source hardware for acoustic nasalanceMaya Dewhurst, Jack Collins, Justin J. H. Lo et al.
We introduce Nosey (Nasalance Open Source Estimation sYstem), a low-cost, customizable, 3D-printed system for recording acoustic nasalance data that we have made available as open-source hardware (http://github.com/phoneticslab/nosey). We first outline the motivations and design principles behind our hardware nasalance system, and then present a comparison between Nosey and a commercial nasalance device. Nosey shows consistently higher nasalance scores than the commercial device, but the magnitude of contrast between phonological environments is comparable between systems. We also review ways of customizing the hardware to facilitate testing, such as comparison of microphones and different construction materials. We conclude that Nosey is a flexible and cost-effective alternative to commercial nasometry devices and propose some methodological considerations for its use in data collection.
CLMay 22, 2023Code
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World PharmacovigilanceKarel D'Oosterlinck, François Remy, Johannes Deleu et al.
Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical literature is paramount for public safety, but involves slow and costly manual labor. We set out to improve drug safety monitoring (pharmacovigilance, PV) through the use of Natural Language Processing (NLP). We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical output of drug safety reporting in the U.S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts. The core features of these reports include the reported weight, age, and biological sex of a patient, a set of drugs taken by the patient, the drug dosages, the reactions experienced, and whether the reaction was life threatening. In this work, we consider the task of predicting the core information of the report given its originating paper. We estimate human performance to be 72.0% F1, whereas our best model achieves 62.3% F1, indicating significant headroom on this task. We also begin to explore ways in which these models could help professional PV reviewers. Our code and data are available: https://github.com/KarelDO/BioDEX.
CVOct 12, 2024
Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR ProceedingsMojtaba Yousefi, Jack Collins
This study examines the alignment of \emph{Conference on Computer Vision and Pattern Recognition} (CVPR) research with the principles of the "bitter lesson" proposed by Rich Sutton. We analyze two decades of CVPR abstracts and titles using large language models (LLMs) to assess the field's embracement of these principles. Our methodology leverages state-of-the-art natural language processing techniques to systematically evaluate the evolution of research approaches in computer vision. The results reveal significant trends in the adoption of general-purpose learning algorithms and the utilization of increased computational resources. We discuss the implications of these findings for the future direction of computer vision research and its potential impact on broader artificial intelligence development. This work contributes to the ongoing dialogue about the most effective strategies for advancing machine learning and computer vision, offering insights that may guide future research priorities and methodologies in the field.
ROFeb 12, 2025
COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded GraspingJun Yamada, Alexander L. Mitchell, Jack Collins et al.
This paper addresses the challenge of occluded robot grasping, i.e. grasping in situations where the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions. Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans in these circumstances. State-of-the-art reinforcement learning (RL) methods are unsuitable due to the inherent complexity of the task. In contrast, learning from demonstration requires collecting a significant number of expert demonstrations, which is often infeasible. Instead, inspired by human bimanual manipulation strategies, where two hands coordinate to stabilise and reorient objects, we focus on a bimanual robotic setup to tackle this challenge. In particular, we introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies: a constraint policy trained using self-supervised datasets to generate stabilising poses and a grasping policy trained using RL that reorients and grasps the target object. A key contribution lies in value function-guided policy coordination. Specifically, during RL training for the grasping policy, the constraint policy's output is refined through gradients from a jointly trained value function, improving bimanual coordination and task performance. Lastly, COMBO-Grasp employs teacher-student policy distillation to effectively deploy point cloud-based policies in real-world environments. Empirical evaluations demonstrate that COMBO-Grasp significantly improves task success rates compared to competitive baseline approaches, with successful generalisation to unseen objects in both simulated and real-world environments.
ROMar 19, 2024
D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable ManipulationJun Yamada, Shaohong Zhong, Jack Collins et al.
Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks due to the large search space and the limited task information available from a cost function. In this work, we propose D-Cubed, a novel trajectory optimisation method using a latent diffusion model (LDM) trained from a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed learns a skill-latent space that encodes short-horizon actions in the play dataset using a VAE and trains a LDM to compose the skill latents into a skill trajectory, representing a long-horizon action trajectory in the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that employs the Cross-Entropy method within the reverse diffusion process. In particular, D-Cubed samples a small number of noisy skill trajectories using the LDM for exploration and evaluates the trajectories in simulation. Then, D-Cubed selects the trajectory with the lowest cost for the subsequent reverse process. This effectively explores promising solution areas and optimises the sampled trajectories towards a target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed readily transfer to a real-world LEAP hand on a folding task.
ROSep 10, 2021
Follow the Gradient: Crossing the Reality Gap using Differentiable Physics (RealityGrad)Jack Collins, Ross Brown, Jürgen Leitner et al.
We propose a novel iterative approach for crossing the reality gap that utilises live robot rollouts and differentiable physics. Our method, RealityGrad, demonstrates for the first time, an efficient sim2real transfer in combination with a real2sim model optimisation for closing the reality gap. Differentiable physics has become an alluring alternative to classical rigid-body simulation due to the current culmination of automatic differentiation libraries, compute and non-linear optimisation libraries. Our method builds on this progress and employs differentiable physics for efficient trajectory optimisation. We demonstrate RealitGrad on a dynamic control task for a serial link robot manipulator and present results that show its efficiency and ability to quickly improve not just the robot's performance in real world tasks but also enhance the simulation model for future tasks. One iteration of RealityGrad takes less than 22 minutes on a desktop computer while reducing the error by 2/3, making it efficient compared to other sim2real methods in both compute and time. Our methodology and application of differentiable physics establishes a promising approach for crossing the reality gap and has great potential for scaling to complex environments.
ROMar 3, 2020
Traversing the Reality Gap via Simulator TuningJack Collins, Ross Brown, Jurgen Leitner et al.
The large demand for simulated data has made the reality gap a problem on the forefront of robotics. We propose a method to traverse the gap by tuning available simulation parameters. Through the optimisation of physics engine parameters, we show that we are able to narrow the gap between simulated solutions and a real world dataset, and thus allow more ready transfer of leaned behaviours between the two. We subsequently gain understanding as to the importance of specific simulator parameters, which is of broad interest to the robotic machine learning community. We find that even optimised for different tasks that different physics engine perform better in certain scenarios and that friction and maximum actuator velocity are tightly bounded parameters that greatly impact the transference of simulated solutions.
RONov 5, 2019
Benchmarking Simulated Robotic Manipulation through a Real World DatasetJack Collins, Jessie McVicar, David Wedlock et al.
We present a benchmark to facilitate simulated manipulation; an attempt to overcome the obstacles of physical benchmarks through the distribution of a real world, ground truth dataset. Users are given various simulated manipulation tasks with assigned protocols having the objective of replicating the real world results of a recorded dataset. The benchmark comprises of a range of metrics used to characterise the successes of submitted environments whilst providing insight into their deficiencies. We apply our benchmark to two simulation environments, PyBullet and V-Rep, and publish the results. All materials required to benchmark an environment, including protocols and the dataset, can be found at the benchmarks' website https://research.csiro.au/robotics/manipulation-benchmark/.
ROJan 21, 2019
Comparing Direct and Indirect Representations for Environment-Specific Robot Component DesignJack Collins, Ben Cottier, David Howard
We compare two representations used to define the morphology of legs for a hexapod robot, which are subsequently 3D printed. A leg morphology occupies a set of voxels in a voxel grid. One method, a direct representation, uses a collection of Bezier splines. The second, an indirect method, utilises CPPN-NEAT. In our first experiment, we investigate two strategies to post-process the CPPN output and ensure leg length constraints are met. The first uses an adaptive threshold on the output neuron, the second, previously reported in the literature, scales the largest generated artefact to our desired length. In our second experiment, we build on our past work that evolves the tibia of a hexapod to provide environment-specific performance benefits. We compare the performance of our direct and indirect legs across three distinct environments, represented in a high-fidelity simulator. Results are significant and support our hypothesis that the indirect representation allows for further exploration of the design space leading to improved fitness.
RONov 5, 2018
Quantifying the Reality Gap in Robotic Manipulation TasksJack Collins, David Howard, Jürgen Leitner
We quantify the accuracy of various simulators compared to a real world robotic reaching and interaction task. Simulators are used in robotics to design solutions for real world hardware without the need for physical access. The `reality gap' prevents solutions developed or learnt in simulation from performing well, or at at all, when transferred to real-world hardware. Making use of a Kinova robotic manipulator and a motion capture system, we record a ground truth enabling comparisons with various simulators, and present quantitative data for various manipulation-oriented robotic tasks. We show the relative strengths and weaknesses of numerous contemporary simulators, highlighting areas of significant discrepancy, and assisting researchers in the field in their selection of appropriate simulators for their use cases. All code and parameter listings are publicly available from: https://bitbucket.csiro.au/scm/~col549/quantifying-the-reality-gap-in-robotic-manipulation-tasks.git .
ROOct 11, 2018
Towards the Targeted Environment-Specific Evolution of Robot ComponentsJack Collins, Wade Geles, David Howard et al.
This research considers the task of evolving the physical structure of a robot to enhance its performance in various environments, which is a significant problem in the field of Evolutionary Robotics. Inspired by the fields of evolutionary art and sculpture, we evolve only targeted parts of a robot, which simplifies the optimisation problem compared to traditional approaches that must simultaneously evolve both (actuated) body and brain. Exploration fidelity is emphasised in areas of the robot most likely to benefit from shape optimisation, whilst exploiting existing robot structure and control. Our approach uses a Genetic Algorithm to optimise collections of Bezier splines that together define the shape of a legged robot's tibia, and leg performance is evaluated in parallel in a high-fidelity simulator. The leg is represented in the simulator as 3D-printable file, and as such can be readily instantiated in reality. Provisional experiments in three distinct environments show the evolution of environment-specific leg structures that are both high-performing and notably different to those evolved in the other environments. This proof-of-concept represents an important step towards the environment-dependent optimisation of performance-critical components for a range of ubiquitous, standard, and already-capable robots that can carry out a wide variety of tasks.