CL Nov 4, 2021
Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel
Kevin Eloff, Okko Räsänen, Herman A. Engelbrecht et al.
Multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, yet little focus has been given to continuous acoustic communication. This would be more akin to human language acquisition; human infants acquire language in large part through continuous signalling with their caregivers. We therefore ask: Are we able to observe emergent language between agents with a continuous communication channel? Our goal is to provide a platform to begin bridging the gap between human and agent communication, allowing us to analyse continuous signals, how they emerge, their characteristics, and how they relate to human language acquisition. We propose a messaging environment where a Speaker agent needs to convey a set of attributes to a Listener over a noisy acoustic channel. Using DQN to train our agents, we show that: (1) unlike the discrete case, the acoustic Speaker learns redundancy to improve Listener coherency, (2) the acoustic Speaker develops more compositional communication protocols that implicitly compensate for transmission errors over a noisy channel, and (3) DQN has significant performance gains and increased compositionality when compared to previous methods optimised using REINFORCE.
LG Jul 20, 2021
Toward Collaborative Reinforcement Learning Agents that Communicate Through Text-Based Natural Language
Kevin Eloff, Herman A. Engelbrecht
Communication between agents in collaborative multi-agent settings is in general implicit or a direct data stream. This paper considers text-based natural language as a novel form of communication between multiple agents trained with reinforcement learning. This could be considered a first step toward truly autonomous communication without the need to define a limited set of instructions, and toward natural collaboration between humans and robots. Inspired by the game of Blind Leads, we propose an environment where one agent uses natural language instructions to guide another through a maze. We test the ability of reinforcement learning agents to effectively communicate through discrete word-level symbols and show that the agents are able to sufficiently communicate through natural language with a limited vocabulary. Although the communication is not always perfect English, the agents are still able to navigate the maze. We achieve a BLEU score of 0.85, which is an improvement of 0.61 over randomly generated sequences while maintaining a 100% maze completion rate. This is 3.5 times the performance of the random baseline using our reference set.
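The BLEU figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the "improvement of 0.61" is an absolute difference in BLEU; the implied random-baseline score of about 0.24 is derived here and is not quoted in the abstract.

```python
# Figures quoted in the abstract.
bleu_agents = 0.85
improvement = 0.61

# Implied BLEU of the randomly generated sequences.
# Derived under the assumption that the improvement is an absolute difference.
bleu_random = bleu_agents - improvement

# Ratio of the agents' BLEU to the implied random baseline.
ratio = bleu_agents / bleu_random

print(f"implied random baseline: {bleu_random:.2f}")
print(f"ratio to baseline: {ratio:.2f}")
```

Under that assumption the ratio comes out at roughly 3.54, which agrees with the stated factor of 3.5.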
RO Mar 18, 2021
Reward Signal Design for Autonomous Racing
Benjamin Evans, Herman A. Engelbrecht, Hendrik W. Jordaan
Reinforcement learning (RL) has been shown to be a valuable tool in training neural networks for autonomous motion planning. The application of RL to a specific problem is dependent on a reward signal to quantify how good or bad a certain action is. This paper addresses the problem of reward signal design for robotic control in the context of local planning for autonomous racing. We aim to design reward signals that are able to perform well in multiple, competing, continuous metrics. Three different methodologies of position-based, velocity-based, and action-based rewards are considered and evaluated in the context of F1/10th racing. A novel method of rewarding the agent on its state relative to an optimal trajectory is presented. Agents are trained and tested in simulation and the behaviors generated by the reward signals are compared to each other on the basis of average lap time and completion rate. The results indicate that a reward based on the distance and velocity relative to a minimum curvature trajectory produces the fastest lap times.
RO Feb 22, 2021
Learning the Subsystem of Local Planning for Autonomous Racing
Benjamin Evans, Hendrik W. Jordaan, Herman A. Engelbrecht
The problem of autonomous racing is to navigate through a race course as quickly as possible while not colliding with any obstacles. We approach the autonomous racing problem with the added constraint of not maintaining an updated obstacle map of the environment. Several current approaches to this problem use end-to-end learning systems where an agent replaces the entire navigation pipeline. This paper presents a hierarchical planning architecture that combines a high-level planner and path following system with a reinforcement learning agent that learns the subsystem of obstacle avoidance. The novel "modification planner" uses the path follower to track the global plan and the deep reinforcement learning agent to modify the references generated by the path follower to avoid obstacles. Importantly, our architecture does not require an updated obstacle map and needs only 10 laser range finders to avoid obstacles. The modification planner is evaluated in the context of F1/10th autonomous racing and compared to an end-to-end learning baseline, the Follow the Gap Method, and an optimisation-based planner. The results show that the modification planner can achieve faster average times than the baseline end-to-end planner, with a 94% success rate that is similar to the baseline.
CL Mar 28, 2020
Unsupervised feature learning for speech using correspondence and Siamese networks
Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper
In zero-resource settings where transcribed speech audio is unavailable, unsupervised feature learning is essential for downstream speech processing tasks. Here we compare two recent methods for frame-level acoustic feature learning. For both methods, unsupervised term discovery is used to find pairs of word examples of the same unknown type. Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models. For the correspondence autoencoder (CAE), matching frames are presented as input-output pairs. The Triamese network uses a contrastive loss to reduce the distance between frames of the same predicted word type while increasing the distance between negative examples. For the first time, these feature extractors are compared on the same discrimination tasks using the same weak supervision pairs. We find that, on the two datasets considered here, the CAE outperforms the Triamese network. However, we show that a new hybrid correspondence-Triamese approach (CTriamese) consistently outperforms both the CAE and Triamese models in terms of average precision and ABX error rates on both English and Xitsonga evaluation data.
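The contrastive objective described above can be illustrated with a margin-based hinge loss over an anchor frame, a matching frame, and a negative frame. This is only a sketch: the abstract does not give the exact loss, so the three-input formulation, the cosine distance, the `margin` value, and the function names are all assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_hinge_loss(anchor, positive, negative, margin=0.15):
    """Hypothetical contrastive loss: zero once the negative frame is at least
    `margin` further from the anchor than the matching (positive) frame."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

# Well-separated triplet: the positive coincides with the anchor.
good = triplet_hinge_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Violating triplet: the "positive" is further away than the negative.
bad = triplet_hinge_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(good, bad)
```

Minimising such a loss pulls frames of the same predicted word type together and pushes negative examples apart, which is the behaviour the abstract attributes to the Triamese network.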
IV Dec 11, 2019
Deep motion estimation for parallel inter-frame prediction in video compression
André Nortje, Herman A. Engelbrecht, Herman Kamper
Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames. We propose to learn binary motion codes that are encoded based on an input video sequence. These codes are not limited to 2D translations, but can capture complex motion (warping, rotation and occlusion). Our motion codes are learned as part of a single neural network which also learns to compress and decode them. This approach supports parallel video frame decoding instead of the sequential motion estimation and compensation of flow-based methods. We also introduce 3D dynamic bit assignment to adapt to object displacements caused by motion, yielding additional bit savings. By replacing the optical flow-based block-motion algorithms found in an existing video codec with our learned inter-frame prediction model, our approach outperforms the standard H.264 and H.265 video codecs at low bitrates.
IV Dec 11, 2019
BINet: a binary inpainting network for deep patch-based image compression
André Nortje, Willie Brink, Herman A. Engelbrecht et al.
Recent deep learning models outperform standard lossy image compression codecs. However, applying these models on a patch-by-patch basis requires that each image patch be encoded and decoded independently. The influence from adjacent patches is therefore lost, leading to block artefacts at low bitrates. We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images. When decoding a patch, BINet additionally uses the binarised encodings from surrounding patches to guide its reconstruction. In contrast to sequential inpainting methods where patches are decoded based on previous reconstructions, BINet operates directly on the binary codes of surrounding patches without access to the original or reconstructed image data. Encoding and decoding can therefore be performed in parallel. We demonstrate that BINet improves the compression quality of a competitive deep image codec across a range of compression levels.
CL Nov 9, 2018
Multimodal One-Shot Learning of Speech and Images
Ryan Eloff, Herman A. Engelbrecht, Herman Kamper
Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching.
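The nearest neighbour baseline mentioned above compares speech using dynamic time warping (DTW). A minimal sketch of the standard DTW recurrence is given below; it is a generic textbook version, not the authors' implementation, and the scalar "frames", the `frame_dist` argument, and the function name are assumptions chosen to keep the example short.

```python
def dtw_distance(seq_a, seq_b, frame_dist):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of seq_a[:i] against seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def abs_diff(a, b):
    """Toy frame distance for scalar frames; real features would be vectors."""
    return abs(a - b)

# A time-stretched copy of a sequence aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3], abs_diff))
```

In a one-shot setting, a spoken query would be matched to the support example with the smallest DTW cost, which is what makes this a natural baseline for variable-length speech.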