CVDec 21, 2022
Deep set conditioned latent representations for action recognitionAkash Singh, Tom De Schepper, Kevin Mets et al.
In recent years multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANN) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite action of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two stream relational network that employs latent representation of state and visual representation for reasoning over events and actions. They learn to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition, over a I3D-NL baseline, on the CATER dataset.
LGNov 16, 2023
Safety Aware Autonomous Path Planning Using Model Predictive Reinforcement Learning for Inland WaterwaysAstrid Vanneste, Simon Vanneste, Olivier Vasseur et al.
In recent years, interest in autonomous shipping in urban waterways has increased significantly due to the trend of keeping cars and trucks out of city centers. Classical approaches such as Frenet frame based planning and potential field navigation often require tuning of many configuration parameters and sometimes even require a different configuration depending on the situation. In this paper, we propose a novel path planning approach based on reinforcement learning called Model Predictive Reinforcement Learning (MPRL). MPRL calculates a series of waypoints for the vessel to follow. The environment is represented as an occupancy grid map, allowing us to deal with any shape of waterway and any number and shape of obstacles. We demonstrate our approach on two scenarios and compare the resulting path with path planning using a Frenet frame and path planning based on a proximal policy optimization (PPO) agent. Our results show that MPRL outperforms both baselines in both test scenarios. The PPO based approach was not able to reach the goal in either scenario while the Frenet frame approach failed in the scenario consisting of a corner with obstacles. MPRL was able to safely (collision free) navigate to the goal in both of the test scenarios.
LGApr 12, 2022
An Analysis of Discretization Methods for Communication Learning with Multi-Agent Reinforcement LearningAstrid Vanneste, Simon Vanneste, Kevin Mets et al.
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as two methods that have not been used for communication learning before. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. Our results show that none of the methods is best in all environments. The best choice in discretization method greatly depends on the environment. However, the discretize regularize unit (DRU), straight through DRU and the straight through gumbel softmax show the most consistent results across all the tested environments. Therefore, these methods prove to be the best choice for general use while the straight through estimator and the gumbel softmax may provide better results in specific environments but fail completely in others.
LGAug 9, 2023
An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement LearningAstrid Vanneste, Simon Vanneste, Kevin Mets et al.
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.
LGAug 9, 2023
Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement LearningAstrid Vanneste, Thomas Somers, Simon Vanneste et al.
Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.
CVAug 14, 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch ApproachMuhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer et al.
Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method, which is less susceptible to the impact of missing modalities. It consists of a single-branch network sharing weights across multiple modalities to learn inter-modality representations to maximize performance as well as robustness to missing modalities. Extensive experiments are performed on four challenging datasets including textual-visual (UPMC Food-101, Hateful Memes, Ferramenta) and audio-visual modalities (VoxCeleb1). Our proposed method achieves superior performance when all modalities are present as well as in the case of missing modalities during training or testing compared to the existing state-of-the-art methods.
AINov 15, 2022
Structured Exploration Through Instruction Enhancement for Object NavigationMatthias Hutsebaut-Buysse, Kevin Mets, Tom De Schepper et al.
Finding an object of a specific class in an unseen environment remains an unsolved navigation problem. Hence, we propose a hierarchical learning-based method for object navigation. The top-level is capable of high-level planning, and building a memory on a floorplan-level (e.g., which room makes the most sense for the agent to visit next, where has the agent already been?). While the lower-level is tasked with efficiently navigating between rooms and looking for objects in them. Instructions can be provided to the agent using a simple synthetic language. The top-level intelligently enhances the instructions in order to make the overall task more tractable. Language grounding, mapping instructions to visual observations, is performed by utilizing an additional separate supervised trained goal assessment module. We demonstrate the effectiveness of our method on a dynamic configurable domestic environment.
CVJul 23, 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing ModalitiesMuhammad Irzam Liaqat, Shah Nawaz, Muhammad Zaigham Zaheer et al.
Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit deteriorated performances if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific streams making the models reliant on the availability of a complete set of modalities. In this work, we propose a robust textual-visual multimodal learning method, Chameleon, that completely deviates from the conventional multi-branch design. To enable this, we present the unification of input modalities into one format by encoding textual modality into visual representations. As a result, our approach does not require modality-specific branches to learn modality-independent multimodal representations making it robust to missing modalities. Extensive experiments are performed on four popular challenging datasets including Hateful Memes, UPMC Food-101, MM-IMDb, and Ferramenta. Chameleon not only achieves superior performance when all modalities are present at train/test time but also demonstrates notable resilience in the case of missing modalities.
LGDec 19, 2023
Inferring the relationship between soil temperature and the normalized difference vegetation index with machine learningSteven Mortier, Amir Hamedpour, Bart Bussmann et al.
Changes in climate can greatly affect the phenology of plants, which can have important feedback effects, such as altering the carbon cycle. These phenological feedback effects are often induced by a shift in the start or end dates of the growing season of plants. The normalized difference vegetation index (NDVI) serves as a straightforward indicator for assessing the presence of green vegetation and can also provide an estimation of the plants' growing season. In this study, we investigated the effect of soil temperature on the timing of the start of the season (SOS), timing of the peak of the season (POS), and the maximum annual NDVI value (PEAK) in subarctic grassland ecosystems between 2014 and 2019. We also explored the impact of other meteorological variables, including air temperature, precipitation, and irradiance, on the inter-annual variation in vegetation phenology. Using machine learning (ML) techniques and SHapley Additive exPlanations (SHAP) values, we analyzed the relative importance and contribution of each variable to the phenological predictions. Our results reveal a significant relationship between soil temperature and SOS and POS, indicating that higher soil temperatures lead to an earlier start and peak of the growing season. However, the Peak NDVI values showed just a slight increase with higher soil temperatures. The analysis of other meteorological variables demonstrated their impacts on the inter-annual variation of the vegetation phenology. Ultimately, this study contributes to our knowledge of the relationships between soil temperature, meteorological variables, and vegetation phenology, providing valuable insights for predicting vegetation phenology characteristics and managing subarctic grasslands in the face of climate change. Additionally, this work provides a solid foundation for future ML-based vegetation phenology studies.
CVJan 31, 2024
Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action RecognitionWei Wei, Tom De Schepper, Kevin Mets
Continual learning (CL) is the research field that aims to build machine learning models that can accumulate knowledge continuously over different tasks without retraining from scratch. Previous studies have shown that pre-training graph neural networks (GNN) may lead to negative transfer (Hu et al., 2020) after fine-tuning, a setting which is closely related to CL. Thus, we focus on studying GNN in the continual graph learning (CGL) setting. We propose the first continual graph learning benchmark for spatio-temporal graphs and use it to benchmark well-known CGL methods in this novel setting. The benchmark is based on the N-UCLA and NTU-RGB+D datasets for skeleton-based action recognition. Beyond benchmarking for standard performance metrics, we study the class and task-order sensitivity of CGL methods, i.e., the impact of learning order on each class/task's performance, and the architectural sensitivity of CGL methods with backbone GNN at various widths and depths. We reveal that task-order robust methods can still be class-order sensitive and observe results that contradict previous empirical observations on architectural sensitivity in CL.
CVSep 29, 2025
On-the-Fly Data Augmentation for Brain Tumor SegmentationIshika Jain, Siri Willems, Steven Latre et al.
Robust segmentation across both pre-treatment and post-treatment glioma scans can be helpful for consistent tumor monitoring and treatment planning. BraTS 2025 Task 1 addresses this by challenging models to generalize across varying tumor appearances throughout the treatment timeline. However, training such generalized models requires access to diverse, high-quality annotated data, which is often limited. While data augmentation can alleviate this, storing large volumes of augmented 3D data is computationally expensive. To address these challenges, we propose an on-the-fly augmentation strategy that dynamically inserts synthetic tumors using pretrained generative adversarial networks (GliGANs) during training. We evaluate three nnU-Net-based models and their ensembles: (1) a baseline without external augmentation, (2) a regular on-the-fly augmented model, and (3) a model with customized on-the-fly augmentation. Built upon the nnU-Net framework, our pipeline leverages pretrained GliGAN weights and tumor insertion methods from prior challenge-winning solutions. An ensemble of the three models achieves lesion-wise Dice scores of 0.79 (ET), 0.749 (NETC), 0.872 (RC), 0.825 (SNFH), 0.79 (TC), and 0.88 (WT) on the online BraTS 2025 validation platform. This work ranked first in the BraTS Lighthouse Challenge 2025 Task 1- Adult Glioma Segmentation.
CVSep 27, 2025
Increasing the Diversity in RGB-to-Thermal Image Translation for Automotive ApplicationsKaili Wang, Leonardo Ravaglia, Roberto Longo et al.
Thermal imaging in Advanced Driver Assistance Systems (ADAS) improves road safety with superior perception in low-light and harsh weather conditions compared to traditional RGB cameras. However, research in this area faces challenges due to limited dataset availability and poor representation in driving simulators. RGB-to-thermal image translation offers a potential solution, but existing methods focus on one-to-one mappings. We propose a one-to-many mapping using a multi-modal translation framework enhanced with our Component-aware Adaptive Instance Normalization (CoAdaIN). Unlike the original AdaIN, which applies styles globally, CoAdaIN adapts styles to different image components individually. The result, as we show, is more realistic and diverse thermal image translations. This is the accepted author manuscript of the paper published in IEEE Sensors Conference 2024. The final published version is available at 10.1109/SENSORS60989.2024.10785056.
LGJun 14, 2024
Dataset Condensation with Latent Quantile MatchingWei Wei, Tom De Schepper, Kevin Mets
Dataset condensation (DC) methods aim to learn a smaller synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However two distributions with the same mean can still be vastly different. In this work we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions i.e. the weak matching power and lack of outlier regularization. To alleviate these shortcomings we propose our new method: Latent Quantile Matching (LQM) which matches the quantiles of the latent embeddings to minimize the goodness of fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms previous state of the art in distribution matching based DC. Moreover we show that LQM improves the performance in continual graph learning (CGL) setting where memory efficiency and privacy can be important. Our work sheds light on the application of DM based DC for CGL.
LGJun 12, 2020
Learning to Communicate Using Counterfactual ReasoningSimon Vanneste, Astrid Vanneste, Kevin Mets et al.
Learning to communicate in order to share state information is an active problem in the area of multi-agent reinforcement learning (MARL). The credit assignment problem, the non-stationarity of the communication environment and the creation of influenceable agents are major challenges within this research field which need to be overcome in order to learn a valid communication protocol. This paper introduces the novel multi-agent counterfactual communication learning (MACC) method which adapts counterfactual reasoning in order to overcome the credit assignment problem for communicating agents. Secondly, the non-stationarity of the communication environment while learning the communication Q-function is overcome by creating the communication Q-function using the action policy of the other agents and the Q-function of the action environment. Additionally, a social loss function is introduced in order to create influenceable agents which is required to learn a valid communication protocol. Our experiments show that MACC is able to outperform the state-of-the-art baselines in four different scenarios in the Particle environment.