CV · Jul 20, 2023 · Code
Learned Thresholds Token Merging and Pruning for Vision Transformers
Maxim Bonnaerens, Joni Dambre
Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks in recent years. However, their high computational cost remains a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore, techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results show that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .
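The sketch below shows the two token-reduction operations in a minimal form. It assumes hard, fixed thresholds applied at inference time. The function names, the greedy cosine-similarity merging, and the example scores are hypothetical stand-ins. LTMP learns its thresholds during fine-tuning, and its actual masking and matching procedure is not reproduced here. The `attention_cost` helper only illustrates why reducing the token count pays off quadratically.

```python
import math

# Hypothetical sketch: hard-threshold token pruning and merging at inference
# time. LTMP learns its thresholds; here they are fixed, illustrative inputs.


def cosine(a, b):
    """Cosine similarity between two token vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prune_tokens(tokens, scores, prune_threshold):
    """Drop tokens whose importance score falls below the threshold."""
    return [t for t, s in zip(tokens, scores) if s >= prune_threshold]


def merge_tokens(tokens, merge_threshold):
    """Greedily average each token into the first group it is similar enough to."""
    groups = []  # each group is [sum_vector, count]
    for t in tokens:
        for g in groups:
            centroid = [x / g[1] for x in g[0]]
            if cosine(centroid, t) >= merge_threshold:
                g[0] = [x + y for x, y in zip(g[0], t)]
                g[1] += 1
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([list(t), 1])
    return [[x / n for x in s] for s, n in groups]


def attention_cost(n_tokens, dim):
    """Rough multiply count of Q @ K^T, quadratic in the number of tokens."""
    return n_tokens * n_tokens * dim


tokens = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.8, 0.7, 0.1]  # hypothetical importance scores
kept = prune_tokens(tokens, scores, prune_threshold=0.5)  # last token dropped
merged = merge_tokens(kept, merge_threshold=0.95)  # first two tokens merged
print(len(tokens), "->", len(merged), "tokens")
```

In this example, going from 4 tokens to 2 cuts the rough attention-score cost by a factor of 4, because the cost depends on the square of the token count.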
CV · Jun 30, 2023
Towards the extraction of robust sign embeddings for low resource sign language recognition
Mathieu De Coster, Ellen Rushe, Ruth Holmes et al.
Isolated Sign Language Recognition (SLR) has mostly been applied to datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we face challenging visual conditions, coarticulated signing, small datasets, and the need for signer-independent models. To tackle this difficult problem, we require a robust feature extractor to process the sign language videos. One could expect human pose estimators to be ideal candidates. However, due to a domain mismatch with their training sets and the challenging poses in sign language, they lack robustness on sign language data, and image-based models often still outperform keypoint-based models. Furthermore, whereas the common practice of transfer learning with image-based models yields even higher accuracy, keypoint-based models are typically trained from scratch on every SLR dataset. These factors limit their usefulness for SLR. From the existing literature, it is also not clear which, if any, pose estimator performs best for SLR. We compare the three most popular pose estimators for SLR: OpenPose, MMPose and MediaPipe. We show that through keypoint normalization, missing keypoint imputation, and learning a pose embedding, we can obtain significantly better results and enable transfer learning. We show that keypoint-based embeddings contain cross-lingual features: they can transfer between sign languages and achieve competitive performance even when fine-tuning only the classifier layer of an SLR model on a target sign language. We furthermore achieve better performance using fine-tuned transferred embeddings than with models trained only on the target sign language. The embeddings can also be learned in a multilingual fashion. The application of these embeddings could prove particularly useful for low-resource sign languages in the future.
CV · Aug 26, 2022
Hardware-aware mobile building block evaluation for computer vision
Maxim Bonnaerens, Matthias Freiberger, Marian Verhelst et al.
In this work, we propose a methodology to accurately evaluate and compare the performance of efficient neural network building blocks for computer vision in a hardware-aware manner. Our comparison uses Pareto fronts based on randomly sampled networks from a design space to capture the underlying accuracy/complexity trade-offs. We show that our approach allows us to match the information obtained by previous comparison paradigms, while providing more insight into the relationship between hardware cost and accuracy. We use our methodology to analyze different building blocks and evaluate their performance on a range of embedded hardware platforms. This highlights the importance of benchmarking building blocks as a preselection step in the design process of a neural network. We show that choosing the right building block can speed up inference by up to 2x on specific hardware ML accelerators.
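The Pareto-front comparison can be sketched as follows. The sketch assumes that each randomly sampled network has already been measured as a (hardware cost, accuracy) pair. The `pareto_front` helper and the numbers are hypothetical. The paper's sampling and measurement pipeline is not shown.

```python
def pareto_front(points):
    """Return the (cost, accuracy) points that no other point dominates.

    A point is dominated if another point has lower-or-equal cost and
    higher-or-equal accuracy and is strictly better in at least one of them.
    """
    front = []
    best_acc = float("-inf")
    # Sweep by increasing cost; among equal costs, visit the most accurate first.
    for cost, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:  # strictly more accurate than every cheaper point
            front.append((cost, acc))
            best_acc = acc
    return front


# Hypothetical (latency in ms, top-1 accuracy in %) pairs for sampled networks.
sampled = [(10, 70.0), (12, 69.0), (15, 74.5), (20, 74.0), (25, 76.2), (15, 72.0)]
front = pareto_front(sampled)
print(front)  # [(10, 70.0), (15, 74.5), (25, 76.2)]
```

Each building block then gets its own front, and comparing whole fronts reflects the full accuracy/cost trade-off rather than a single hand-picked design point.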
CV · Apr 1, 2021 · Code
Anchor Pruning for Object Detection
Maxim Bonnaerens, Matthias Freiberger, Joni Dambre
This paper proposes anchor pruning for object detection in one-stage anchor-based detectors. While pruning techniques are widely used to reduce the computational cost of convolutional neural networks, they tend to focus on optimizing the backbone networks, where most of the computation often takes place. In this work we demonstrate an additional pruning technique specifically for object detection: anchor pruning. With more efficient backbone networks and a growing trend of deploying object detectors on embedded systems, where post-processing steps such as non-maximum suppression can be a bottleneck, the impact of the anchors used in the detection head is becoming increasingly important. In this work, we show that many anchors in the object detection head can be removed without any loss in accuracy. With additional retraining, anchor pruning can even lead to improved accuracy. Extensive experiments on SSD and MS COCO show that the detection head can be made up to 44% more efficient while simultaneously increasing accuracy. Further experiments on RetinaNet and PASCAL VOC show the general effectiveness of our approach. We also introduce 'overanchorized' models that can be used together with anchor pruning to eliminate hyperparameters related to the initial shape of anchors. Code and models are available at https://github.com/Mxbonn/anchor_pruning.
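Every anchor at every spatial location yields one box that the head classifies and regresses. As a result, the head's output size and the number of boxes passed to non-maximum suppression grow linearly with the anchor count. The arithmetic below assumes the commonly used SSD300 layout. The pruned configuration is purely illustrative and is not the one found in the paper.

```python
def num_predictions(feature_map_sizes, anchors_per_location):
    """Boxes scored by the detection head: sum of H * W * A over feature maps."""
    return sum(s * s * a for s, a in zip(feature_map_sizes, anchors_per_location))


# SSD300-style head: square feature maps and anchors per spatial location.
ssd_sizes = [38, 19, 10, 5, 3, 1]
ssd_anchors = [4, 6, 6, 6, 4, 4]
full = num_predictions(ssd_sizes, ssd_anchors)

# Hypothetical pruned configuration (NOT the paper's result): fewer anchors
# on the three highest-resolution feature maps.
pruned_anchors = [3, 4, 4, 6, 4, 4]
pruned = num_predictions(ssd_sizes, pruned_anchors)
print(full, pruned, f"{1 - pruned / full:.1%} fewer boxes")
```

The example also shows that removing anchors from the high-resolution maps has the largest effect, because those maps contribute the most locations.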
LG · Jan 31, 2021 · Code
PyTorch-Hebbian: facilitating local learning in a deep learning framework
Jules Talloen, Joni Dambre, Alexander Vandesompele
Recently, unsupervised local learning, based on Hebb's idea that the change in synaptic efficacy depends only on the activity of the pre- and postsynaptic neurons, has shown potential as an alternative training mechanism to backpropagation. Unfortunately, Hebbian learning remains experimental and rarely makes its way into standard deep learning frameworks. In this work, we investigate the potential of Hebbian learning in the context of standard deep learning workflows. To this end, we propose a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines. Using this framework, we illustrate the potential of Hebbian-learned feature extractors for image classification. In particular, the framework is used to extend the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy compared to end-to-end backpropagation. The source code is available at https://github.com/Joxis/pytorch-hebbian.
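Hebb's principle means that each weight update uses only the activities on either side of the synapse. The minimal sketch below uses Oja's rule, a normalized Hebbian rule, as a stand-in. It is neither the Krotov-Hopfield rule nor the pytorch-hebbian API. The data, the learning rate, and the function names are assumptions for illustration.

```python
import math
import random


def oja_update(w, x, lr):
    """One local update: uses only the input x and the neuron's own output y."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # post-synaptic activity
    # Hebbian term lr*y*x, minus lr*y*y*w to keep the weight norm bounded.
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


def train(samples, w, lr=0.01):
    for x in samples:
        w = oja_update(w, x, lr)
    return w


rng = random.Random(0)
# Hypothetical inputs spread mostly along the (1, 1) direction, plus noise.
samples = []
for _ in range(5000):
    t = rng.gauss(0.0, 1.0)
    samples.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)])

w = train(samples, [0.5, -0.3])
norm = math.sqrt(sum(wi * wi for wi in w))
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)  # |cos| with (1, 1)
print(w, norm, alignment)
```

The plain Hebbian update `lr * y * x` alone would let the weights grow without bound. Oja's `- y * y * w` correction keeps the norm near 1 while the weight vector turns toward the dominant input direction.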
CL · Aug 28, 2018 · Code
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Fréderic Godin, Kris Demuynck, Joni Dambre et al.
Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively, while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and whether those patterns coincide with manually defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al. 2018) to convolutional neural networks, which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models for the task of morphological tagging on three morphologically different languages and show that these models implicitly discover understandable linguistic rules. Our implementation can be found at https://github.com/FredericGodin/ContextualDecomposition-NLP .
CL · Feb 27, 2025
Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies
Toon Vandendriessche, Mathieu De Coster, Annelies Lejon et al.
Isolated Sign Language Recognition (ISLR) is crucial for scalable sign language technology, yet language-specific approaches limit current models. To address this, we propose a one-shot learning approach that generalises across languages and evolving vocabularies. Our method involves pretraining a model to embed signs based on essential features and using a dense vector search for rapid, accurate recognition of unseen signs. We achieve state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from a different language than the training set. Our approach is robust across languages and support sets, offering a scalable, adaptable solution for ISLR. Co-created with the Deaf and Hard of Hearing (DHH) community, this method aligns with real-world needs, and advances scalable sign language recognition.
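One-shot recognition by dense vector search can be illustrated with a brute-force nearest-neighbour ranking and the mean reciprocal rank (MRR) metric quoted above. The toy embeddings, labels, and helper names are hypothetical. A real system would rank the pretrained sign encoder's embeddings, and at dictionary scale it might use an approximate nearest-neighbour index rather than a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_of_true(query, support, true_label):
    """1-based rank of the correct sign after sorting the support set by similarity."""
    ranked = sorted(support, key=lambda label: cosine(query, support[label]), reverse=True)
    return ranked.index(true_label) + 1


def mean_reciprocal_rank(queries, support):
    """queries: list of (embedding, label); support: label -> one reference embedding."""
    return sum(1.0 / rank_of_true(q, support, y) for q, y in queries) / len(queries)


# Hypothetical one-shot support set: one embedding per dictionary sign.
support = {"HOUSE": [1.0, 0.0, 0.0], "TREE": [0.0, 1.0, 0.0], "CAR": [0.0, 0.0, 1.0]}
queries = [
    ([0.9, 0.1, 0.0], "HOUSE"),  # ranked 1st
    ([0.6, 0.5, 0.0], "TREE"),   # ranked 2nd, behind HOUSE
    ([0.1, 0.2, 0.9], "CAR"),    # ranked 1st
]
mrr = mean_reciprocal_rank(queries, support)
print(round(mrr, 4))  # (1 + 1/2 + 1) / 3
```

New signs can be added by inserting one more embedding into the support set, with no retraining. This is what makes the approach suitable for evolving vocabularies.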
HC · Feb 12, 2025
Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs
Tanguy Cazalets, Joni Dambre
This paper introduces the Word Synchronization Challenge, a novel benchmark to evaluate large language models (LLMs) in Human-Computer Interaction (HCI). This benchmark uses a dynamic, game-like framework to test LLMs' ability to mimic human cognitive processes through word associations. By simulating complex human interactions, it assesses how LLMs interpret and align with human thought patterns during conversational exchanges, which are essential for effective social partnerships in HCI. Initial findings highlight the influence of model sophistication on performance, offering insights into the models' capabilities to engage in meaningful social interactions and adapt their behavior in human-like ways. This research advances the understanding of LLMs' potential to replicate or diverge from human cognitive functions, paving the way for more nuanced and empathetic human-machine collaborations.
CL · Feb 7, 2022
Machine Translation from Signed to Spoken Languages: State of the Art and Challenges
Mathieu De Coster, Dimitar Shterionov, Mieke Van Herreweghe et al.
Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying at the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular (the majority of scientific papers on the topic of sign language translation have been published in the past three years), we provide an overview of the state of the art as well as some required background in the different related disciplines. We give a high-level introduction to sign language linguistics and machine translation to illustrate the requirements of automatic sign language translation. We present a systematic literature review to illustrate the state of the art in the domain and then, harking back to the requirements, lay out several challenges for future research. We find that significant advances have been made on the shoulders of spoken language machine translation research. However, current approaches are often not linguistically motivated or are not adapted to the different input modality of sign languages. We explore challenges related to the representation of sign language data, the collection of datasets, the need for interdisciplinary research and requirements for moving beyond research, towards applications. Based on our findings, we advocate for interdisciplinary research and for basing future research on linguistic analysis of sign languages. Furthermore, the inclusion of deaf and hearing end users of sign language translation applications in use case identification, data collection and evaluation is of the utmost importance in the creation of useful sign language translation models. We recommend iterative, human-in-the-loop design and development of sign language translation models.
NE · Apr 9, 2020
Populations of Spiking Neurons for Reservoir Computing: Closed Loop Control of a Compliant Quadruped
Alexander Vandesompele, Gabriel Urbain, Francis wyffels et al.
Compliant robots can be more versatile than traditional robots, but their control is more complex. The dynamics of compliant bodies can, however, be turned into an advantage using the physical reservoir computing framework. By feeding sensor signals to the reservoir and extracting motor signals from the reservoir, closed loop robot control is possible. Here, we present a novel framework for implementing central pattern generators with spiking neural networks to obtain closed loop robot control. Using the FORCE learning paradigm, we train a reservoir of spiking neuron populations to act as a central pattern generator. We demonstrate the learning of predefined gait patterns, speed control and gait transition on a simulated model of a compliant quadrupedal robot.
RO · Mar 20, 2020
Stance Control Inspired by Cerebellum Stabilizes Reflex-Based Locomotion on HyQ Robot
Gabriel Urbain, Victor Barasuol, Claudio Semini et al.
Advances in legged robotics are strongly rooted in animal observations. A clear illustration of this claim is the generalization of Central Pattern Generators (CPGs), first identified in the cat spinal cord, to generate cyclic motion in robotic locomotion. Despite broad endorsement of this model, physiological and functional experiments in mammals have also indicated the presence of descending signals from the cerebellum, and reflex feedback from the lower limb sensory cells, that closely interact with CPGs. To this day, these interactions are not fully understood. Some studies have demonstrated that pure reflex-based locomotion in the absence of oscillatory signals could be achieved in realistic musculoskeletal simulation models or small compliant quadruped robots. At the same time, biological evidence has attested to the functional role of the cerebellum in the predictive control of balance and stance in mammals. In this paper, we promote both approaches and successfully apply reflex-based dynamic locomotion, coupled with a balance and gravity compensation mechanism, on the state-of-the-art HyQ robot. We discuss the importance of this stability module to ensure a correct foot lift-off and maintain a reliable gait. The robotic platform is further used to test two different architectural hypotheses inspired by the cerebellum. An analysis of experimental results demonstrates that the most biologically plausible alternative also leads to better results for robust locomotion.
LG · Oct 25, 2019
Towards Deep Physical Reservoir Computing Through Automatic Task Decomposition And Mapping
Matthias Freiberger, Peter Bienstman, Joni Dambre
Photonic reservoir computing is a promising candidate for low-energy computing at high bandwidths. Despite recent successes, there are limits to what one can achieve simply by making photonic reservoirs larger. Therefore, a switch from single-reservoir computing to multi-reservoir and even deep physical reservoir computing is desirable. Given that backpropagation cannot be used directly to train multi-reservoir systems in our targeted setting, we propose an alternative approach that still uses its power to derive intermediate targets. In this work, we report our findings from an experiment conducted to evaluate the general feasibility of our approach by training a network of 3 Echo State Networks to perform the well-known NARMA-10 task using targets derived through backpropagation. Our results indicate that our proposed method is well suited to training multi-reservoir systems efficiently.
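For reference, the NARMA-10 benchmark mentioned above is usually defined by the recursion below, driven by inputs drawn uniformly from [0, 0.5]. A reservoir system, here a network of echo state networks, is trained to predict y from u. This generator is a sketch of the standard task definition, not the authors' code, and the seed and sequence length are arbitrary.

```python
import random


def narma10(length, seed=0):
    """Generate an input/target pair (u, y) for the NARMA-10 task.

    y[t+1] = 0.3*y[t] + 0.05*y[t]*sum(y[t-9..t]) + 1.5*u[t-9]*u[t] + 0.1,
    with u[t] drawn uniformly from [0, 0.5] and y initialized to zero.
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(length)]
    y = [0.0] * length
    for t in range(9, length - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * sum(y[t - 9:t + 1])  # ten most recent outputs
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    # The recursion can occasionally blow up for unlucky inputs;
    # drawing a new input sequence is a simple remedy.
    return u, y


u, y = narma10(1000)
print(min(y[10:]), max(y[10:]))
```

The target depends on inputs up to ten steps back through a product term. Solving the task therefore requires both memory and nonlinearity, which is why it is a common reservoir computing benchmark.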
ET · Jun 6, 2019
Addressing Limited Weight Resolution in a Fully Optical Neuromorphic Reservoir Computing Readout
Chonghuai Ma, Floris Laporte, Joni Dambre et al.
Using optical hardware for neuromorphic computing has become more and more popular recently due to its efficient high-speed data processing capabilities and low power consumption. However, there are still some remaining obstacles to realizing the vision of a completely optical neuromorphic computer. One of them is that, depending on the technology used, optical weighting elements may not offer the same resolution as their electrical counterparts. Noise and drift are important considerations as well. In this article, we investigate a new method for improving the performance of optical weighting, even in the presence of noise and in the case of very low resolution. Even with only 8 to 32 levels of resolution, the method can outperform the naive traditional low-resolution weighting by several orders of magnitude in terms of bit error rate and can deliver performance very close to full-resolution weighting elements, also in noisy environments.
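For context, the "naive traditional low-resolution weighting" that the method is compared against can be pictured as rounding each trained weight to the nearest of a small number of evenly spaced levels. The sketch below assumes a symmetric weight range [-w_max, w_max] with uniform levels. It illustrates only this baseline, not the paper's improved method, and the weights are hypothetical.

```python
def quantize(weights, levels, w_max):
    """Round each weight to the nearest of `levels` evenly spaced values in [-w_max, w_max]."""
    step = 2.0 * w_max / (levels - 1)
    out = []
    for w in weights:
        w = max(-w_max, min(w_max, w))   # clip to the representable range
        k = round((w + w_max) / step)    # index of the nearest level
        out.append(-w_max + k * step)
    return out


weights = [0.13, -0.72, 0.5, 0.99, -1.3]   # hypothetical trained readout weights
q8 = quantize(weights, levels=8, w_max=1.0)
step = 2.0 / 7
errors = [abs(w - q) for w, q in zip(weights, q8) if abs(w) <= 1.0]
print(q8, max(errors))  # in-range rounding error is at most step / 2
```

With only 8 levels, each weight can be off by up to a seventh of the full range. These per-weight errors add up at the readout output, which is the loss that the method in the paper aims to reduce.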
NE · Oct 8, 2018
Training Passive Photonic Reservoirs with Integrated Optical Readout
Matthias Freiberger, Andrew Katumba, Peter Bienstman et al.
As Moore's law comes to an end, neuromorphic approaches to computing are on the rise. One of these, passive photonic reservoir computing, is a strong candidate for computing at high bitrates (> 10 Gbps) and with low energy consumption. Currently, though, both benefits are limited by the necessity to perform training and readout operations in the electrical domain. Thus, efforts are currently underway in the photonic community to design an integrated optical readout, which would allow all operations to be performed in the optical domain. In addition to the technological challenge of designing such a readout, new algorithms have to be designed in order to train it. Most importantly, suitable algorithms need to be able to deal with the fact that the actual on-chip reservoir states are not directly observable. In this work, we investigate several options for such a training algorithm and propose a solution in which the complex states of the reservoir can be observed by appropriately setting the readout weights, while iterating over a predefined input sequence. We perform numerical simulations in order to compare our method with an ideal baseline requiring full observability as well as with an established black-box optimization approach (CMA-ES).
ML · Feb 21, 2018
BRUNO: A Deep Recurrent Model for Exchangeable Data
Iryna Korshunova, Jonas Degrave, Ferenc Huszár et al.
We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks that require generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, and anomaly detection.
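Exchangeability and linear-cost conditioning can both be seen in a much simpler model than BRUNO. The sketch below uses a conjugate Normal model with unknown mean as a stand-in; it is not BRUNO's architecture. The joint log-density is built from one-step predictive densities via the chain rule, each step needs only a running count and sum, and the result is the same for every ordering of the data. The prior and noise parameters are arbitrary assumptions.

```python
import itertools
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint(xs, mu0=0.0, tau0_sq=1.0, sigma_sq=0.5):
    """log p(x_1..x_n) as a sum of one-step predictive log-densities.

    Model: x_i ~ Normal(mu, sigma_sq), with prior mu ~ Normal(mu0, tau0_sq).
    Only a running count and sum are carried, so conditioning on the
    previous points costs constant work per new point.
    """
    total = 0.0
    n, s = 0, 0.0  # count and sum of the points seen so far
    for x in xs:
        precision = 1.0 / tau0_sq + n / sigma_sq  # posterior precision of mu
        post_mean = (mu0 / tau0_sq + s / sigma_sq) / precision
        total += log_normal_pdf(x, post_mean, 1.0 / precision + sigma_sq)
        n, s = n + 1, s + x
    return total


data = [0.3, -1.2, 2.0, 0.7]
values = [log_joint(list(p)) for p in itertools.permutations(data)]
spread = max(values) - min(values)
print(len(values), spread)  # 24 orderings, spread at floating-point noise level
```

Each conditional in the chain depends on the order of the data, but their product does not. This is the permutation invariance of the joint distribution that the abstract refers to.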
CL · Jul 25, 2017
Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks
Fréderic Godin, Jonas Degrave, Joni Dambre et al.
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, are robust to noise, and induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve on the current state of the art in character-level language modeling over shallow architectures based on LSTMs.
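The scalar sketch below shows where a DReLU sits in a QRNN. It assumes the definition DReLU(a, b) = max(0, a) - max(0, b), with a and b two separate pre-activations. It also assumes the QRNN f-pooling recurrence c_t = f_t * c_{t-1} + (1 - f_t) * z_t from Bradbury et al. (2017). The pre-activation values are hypothetical; a real QRNN computes them with convolutions over the input sequence.

```python
import math


def drelu(a, b):
    """Dual ReLU: difference of two ReLUs, unbounded in both directions."""
    return max(0.0, a) - max(0.0, b)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def f_pooling(candidates, forget_preacts, c0=0.0):
    """Scalar QRNN-style f-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t."""
    c, states = c0, []
    for z, f_pre in zip(candidates, forget_preacts):
        f = sigmoid(f_pre)  # forget gate in (0, 1)
        c = f * c + (1.0 - f) * z
        states.append(c)
    return states


# Hypothetical (a_t, b_t) pre-activations for the candidate at three time steps.
pairs = [(2.0, 0.5), (-1.0, 3.0), (0.4, -0.2)]
z = [drelu(a, b) for a, b in pairs]    # DReLU used where tanh would be
states = f_pooling(z, [0.0, 0.0, 0.0])  # forget gates all at 0.5
print(z, states)
```

Unlike tanh, DReLU does not squash the candidate values into (-1, 1), so a value such as -3.0 passes through unchanged. The output is still exactly zero whenever both pre-activations are non-positive, which gives the sparsity mentioned above.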
CL · Jul 19, 2017
Improving Language Modeling using Densely Connected Recurrent Neural Networks
Fréderic Godin, Joni Dambre, Wesley De Neve
In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In contrast with the current usage of skip connections, we show that densely connecting only a few stacked layers with skip connections already yields significant perplexity reductions.
CV · Nov 29, 2016
Fast Face-swap Using Convolutional Neural Networks
Iryna Korshunova, Wenzhe Shi, Joni Dambre et al.
We consider the problem of face swapping in images, where an input identity is transformed into a target identity while preserving pose, facial expression, and lighting. To perform this mapping, we use convolutional neural networks trained to capture the appearance of the target identity from an unstructured collection of his/her photographs. This approach is enabled by framing the face swapping problem in terms of style transfer, where the goal is to render an image in the style of another one. Building on recent advances in this area, we devise a new loss function that enables the network to produce highly photorealistic results. By combining neural networks with simple pre- and post-processing steps, we aim at making face swap work in real-time with no input from the user.
NE · Nov 5, 2016
A Differentiable Physics Engine for Deep Learning in Robotics
Jonas Degrave, Michiel Hermans, Joni Dambre et al.
An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian. This method quickly grows expensive with increasing numbers of parameters, such as those found in deep learning. We propose the implementation of a modern physics engine that can compute derivatives with respect to control parameters. This engine is implemented for both CPU and GPU. Firstly, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software.
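The benefit of a differentiable simulator over finite differences can be sketched with forward-mode automatic differentiation through a toy simulation. The `Dual` class, the point-mass dynamics, and all constants are hypothetical illustrations, not the paper's engine. A single simulation carrying a dual number yields the exact derivative of the final position with respect to the control force. Central finite differences, by contrast, need two extra simulations per parameter and in general depend on the choice of step size.

```python
class Dual:
    """Number val + der*eps with eps**2 = 0; `der` carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def simulate(force, steps=100, dt=0.01):
    """Explicit-Euler unit point mass pushed by a constant control force; returns final position."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        x = x + dt * v
        v = v + dt * force
    return x


# One pass with a dual number: exact d(final position)/d(force).
grad = simulate(Dual(2.0, 1.0)).der
# Central finite differences: two extra simulations for this single parameter.
eps = 1e-6
fd = (simulate(2.0 + eps) - simulate(2.0 - eps)) / (2 * eps)
print(grad, fd)  # analytically dt**2 * steps * (steps - 1) / 2 = 0.495
```

Forward mode still needs one pass per parameter. For controllers with many parameters, reverse-mode differentiation (backpropagation through the simulation) obtains all gradients with one forward and one backward pass. This makes gradient-based optimization of deep-learning-sized controllers affordable.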
CV · Jun 5, 2015
Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
Lionel Pigou, Aäron van den Oord, Sander Dieleman et al.
Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, numerous open research questions remain. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative than in general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold: first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results.
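The argument that temporal pooling is insufficient can be made concrete. Averaging per-frame features discards frame order, so a motion and its time reversal (for example, two opposite swipes) become indistinguishable. Even a trivial recurrence keeps them apart. The features and the linear recurrence below are hypothetical toys. They are not the paper's architecture, which combines temporal convolutions with bidirectional recurrence.

```python
def mean_pool(frames):
    """Temporal mean pooling of per-frame feature vectors (order is discarded)."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]


def simple_recurrence(frames, decay=0.5):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t (order-sensitive)."""
    h = [0.0] * len(frames[0])
    for f in frames:
        h = [decay * hi + xi for hi, xi in zip(h, f)]
    return h


# Hypothetical 1-D per-frame features for a motion and its time reversal.
gesture = [[0.0], [1.0], [2.0], [3.0]]
reversed_gesture = gesture[::-1]

print(mean_pool(gesture), mean_pool(reversed_gesture))                  # identical
print(simple_recurrence(gesture), simple_recurrence(reversed_gesture))  # different
```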
IM · Mar 24, 2015
Rotation-invariant convolutional neural networks for galaxy morphology prediction
Sander Dieleman, Kyle W. Willett, Joni Dambre
Measuring the morphological parameters of galaxies is a key requirement for studying their formation and evolution. Surveys such as the Sloan Digital Sky Survey (SDSS) have resulted in the availability of very large collections of images, which have permitted population-wide analyses of galaxy morphology. Morphological analysis has traditionally been carried out mostly via visual inspection by trained experts, which is time-consuming and does not scale to large ($\gtrsim10^4$) numbers of images. Although attempts have been made to build automated classification systems, these have not been able to achieve the desired level of accuracy. The Galaxy Zoo project successfully applied a crowdsourcing strategy, inviting online users to classify images by answering a series of questions. Unfortunately, even this approach does not scale well enough to keep up with the increasing availability of galaxy images. We present a deep neural network model for galaxy morphology classification which exploits translational and rotational symmetry. It was developed in the context of the Galaxy Challenge, an international competition to build the best model for morphology classification based on annotated images from the Galaxy Zoo project. For images with high agreement among the Galaxy Zoo participants, our model is able to reproduce their consensus with near-perfect accuracy ($> 99\%$) for most questions. Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation. This approach greatly reduces the experts' workload without affecting accuracy. The application of these algorithms to larger sets of training data will be critical for analysing results from future surveys such as the LSST.
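One simple way to exploit rotational symmetry is to average a model's output over the eight rotated and flipped versions of an image. Any rotated input produces the same set of views, so the averaged prediction is exactly rotation invariant. This test-time averaging is a simplified stand-in, and the `toy_model` scorer is hypothetical. The paper's network, which is built around the same symmetry, is not reproduced here.

```python
def rot90(img):
    """Rotate a square 2-D list by 90 degrees counter-clockwise."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]


def fliplr(img):
    """Mirror an image horizontally."""
    return [row[::-1] for row in img]


def dihedral_views(img):
    """The 4 rotations of the image and the 4 rotations of its mirror image."""
    views = []
    for start in (img, fliplr(img)):
        view = start
        for _ in range(4):
            views.append(view)
            view = rot90(view)
    return views


def toy_model(img):
    """Hypothetical stand-in for a CNN: not rotation invariant on its own."""
    return 1.0 * img[0][0] + 2.0 * img[0][1]


def invariant_model(img):
    """Average the toy model's output over all 8 views."""
    views = dihedral_views(img)
    return sum(toy_model(v) for v in views) / len(views)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(toy_model(img), toy_model(rot90(img)))              # differ
print(invariant_model(img), invariant_model(rot90(img)))  # equal
```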
NE · Jan 12, 2015
Photonic Delay Systems as Machine Learning Implementations
Michiel Hermans, Miguel Soriano, Joni Dambre et al.
Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer great degrees of parallelism and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can be greatly extended if we use gradient descent with backpropagation through time on a model of the system to optimize the input encoding of such systems. We perform physical experiments that demonstrate that the obtained input encodings work well in reality, and we show that optimized systems perform significantly better than the common Reservoir Computing approach. The results presented here demonstrate that common gradient descent techniques from machine learning may well be applicable on physical neuro-inspired analog computers.
NE · Jul 24, 2014
Trainable and Dynamic Computing: Error Backpropagation through Physical Media
Michiel Hermans, Michaël Burm, Joni Dambre et al.
Machine learning algorithms, and neural networks in particular, are arguably experiencing a revolution in terms of performance. Currently, the best systems we have for speech recognition, computer vision and similar problems are based on neural networks, trained using the half-century-old backpropagation algorithm. Despite the fact that neural networks are a form of analog computer, they are still implemented digitally for reasons of convenience and availability. In this paper we demonstrate how we can design physical linear dynamic systems with non-linear feedback as a generic platform for dynamic, neuro-inspired analog computing. We show that a crucial advantage of this setup is that the error backpropagation can be performed physically as well, which greatly speeds up the optimisation process. As we show in this paper, using one experimentally validated and one conceptual example, such systems may be the key to providing a relatively straightforward mechanism for constructing highly scalable, fully dynamic analog computers.
LG · Jun 9, 2014
Memristor models for machine learning
Juan Pablo Carbajal, Joni Dambre, Michiel Hermans et al.
In the quest for alternatives to traditional CMOS, it is being suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is being used today. In particular, large gains in area- and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In this work, we explore the use of memristor networks for analog approximate computation, based on a machine learning framework called reservoir computing. Most experimental investigations on the dynamics of memristors focus on their nonvolatile behavior. Hence, the volatility that is present in the developed technologies is usually unwanted and it is not included in simulation models. In contrast, in reservoir computing, volatility is not only desirable but necessary. Therefore, in this work, we propose two different ways to incorporate it into memristor simulation models. The first is an extension of Strukov's model and the second is an equivalent Wiener model approximation. We analyze and compare the dynamical properties of these models and discuss their implications for the memory and the nonlinear processing capacity of memristor networks. Our results indicate that device variability, increasingly causing problems in traditional computer design, is an asset in the context of reservoir computing. We conclude that, although both models could lead to useful memristor based reservoir computing systems, their computational performance will differ. Therefore, experimental modeling research is required for the development of accurate volatile memristor models.
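A minimal simulation makes the volatility point concrete. It assumes Strukov et al.'s linear-drift memristor, with memristance M(x) = R_on * x + R_off * (1 - x) for a normalized state x in [0, 1] and a state drift proportional to current. It then adds a hypothetical relaxation term -(x - x_rest) / tau as one possible form of volatility. This decay term is an illustrative assumption and is not either of the two models proposed in the paper. All constants are arbitrary.

```python
def simulate_memristor(voltages, dt=1e-3, r_on=100.0, r_off=16e3, k=5e4,
                       tau=None, x_rest=0.0, x0=0.0):
    """Euler simulation of a linear-drift memristor driven by a voltage sequence.

    x is the normalized state (doped-region width / device thickness).
    Memristance: M(x) = r_on * x + r_off * (1 - x).
    Drift: dx/dt = k * i(t), with k a lumped, illustrative mobility constant.
    If tau is given, a hypothetical volatility term -(x - x_rest) / tau is added.
    """
    x, states = x0, []
    for v in voltages:
        m = r_on * x + r_off * (1.0 - x)     # current memristance
        i = v / m                            # Ohm's law
        dx = k * i
        if tau is not None:
            dx -= (x - x_rest) / tau         # relaxation toward the rest state
        x = min(1.0, max(0.0, x + dt * dx))  # state stays within the device
        states.append(x)
    return states


# 0.1 s write pulse at 1 V, followed by 0.5 s with no drive.
pulse = [1.0] * 100 + [0.0] * 500
nonvolatile = simulate_memristor(pulse)
volatile = simulate_memristor(pulse, tau=0.05)
print(nonvolatile[99], nonvolatile[-1])  # state is retained after the pulse
print(volatile[99], volatile[-1])        # state fades back toward x_rest
```

With the decay term, the device state mostly reflects recent inputs. This fading memory is the property a reservoir relies on. Without the decay term, the device simply keeps the last written state.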