Justin Dauwels

LG
Semantic Scholar Profile
h-index58
33papers
257citations
Novelty47%
AI Score56

33 Papers

ROFeb 23, 2023Code
PEM: Perception Error Model for Virtual Testing of Autonomous Vehicles

Andrea Piazzoni, Jim Cherian, Justin Dauwels et al.

Even though virtual testing of Autonomous Vehicles (AVs) has been well recognized as essential for safety assessment, AV simulators are still undergoing active development. One particularly challenging question is to effectively include the Sensing and Perception (S&P) subsystem into the simulation loop. In this article, we define Perception Error Models (PEM), a virtual simulation component that can enable the analysis of the impact of perception errors on AV safety, without the need to model the sensors themselves. We propose a generalized data-driven procedure towards parametric modeling and evaluate it using Apollo, an open-source driving software, and nuScenes, a public AV dataset. Additionally, we implement PEMs in SVL, an open-source vehicle simulator. Furthermore, we demonstrate the usefulness of PEM-based virtual tests, by evaluating camera, LiDAR, and camera-LiDAR setups. Our virtual tests highlight limitations in the current evaluation metrics, and the proposed approach can help study the impact of perception errors on AV safety.

LGSep 13, 2022
Learning to Solve Multiple-TSP with Time Window and Rejections via Deep Reinforcement Learning

Rongkai Zhang, Cong Zhang, Zhiguang Cao et al.

We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of Travelling Salesman Problem (TSP), \ie~multiple-vehicle TSP with time window and rejections (mTSPTWR), where customers who cannot be served before the deadline are subject to rejections. Particularly, in the proposed framework, a manager agent learns to divide mTSPTWR into sub-routing tasks by assigning customers to each vehicle via a Graph Isomorphism Network (GIN) based policy network. A worker agent learns to solve sub-routing tasks by minimizing the cost in terms of both tour length and rejection rate for each vehicle, the maximum of which is then fed back to the manager agent to learn better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in terms of higher solution quality and shorter computation time. More importantly, the trained agents also achieve competitive performance for solving unseen larger instances.

LGApr 8, 2023
TC-VAE: Uncovering Out-of-Distribution Data Generative Factors

Cristian Meo, Anirudh Goyal, Justin Dauwels · mila

Uncovering data generative factors is the ultimate goal of disentanglement learning. Although many works proposed disentangling generative models able to uncover the underlying generative factors of a dataset, so far no one was able to uncover OOD generative factors (i.e., factors of variations that are not explicitly shown on the dataset). Moreover, the datasets used to validate these models are synthetically generated using a balanced mixture of some predefined generative factors, implicitly assuming that generative factors are uniformly distributed across the datasets. However, real datasets do not present this property. In this work we analyse the effect of using datasets with unbalanced generative factors, providing qualitative and quantitative results for widely used generative models. Moreover, we propose TC-VAE, a generative model optimized using a lower bound of the joint total correlation between the learned latent representations and the input data. We show that the proposed model is able to uncover OOD generative factors on different datasets and outperforms on average the related baselines in terms of downstream disentanglement metrics.

CVJun 12, 2023
Slot-VAE: Object-Centric Scene Generation with Slot Attention

Yanbo Wang, Letao Liu, Justin Dauwels

Slot attention has shown remarkable object-centric representation learning performance in computer vision tasks without requiring any supervision. Despite its object-centric binding ability brought by compositional modelling, as a deterministic module, slot attention lacks the ability to generate novel scenes. In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. For each image, the model simultaneously infers a global scene representation to capture high-level scene structure and object-centric slot representations to embed individual object components. During generation, slot representations are generated from the global scene representation to ensure coherent scene structures. Our extensive evaluation of the scene generation ability indicates that Slot-VAE outperforms slot representation-based generative baselines in terms of sample quality and scene structure accuracy.

ASMar 7, 2022
Enhance Language Identification using Dual-mode Model with Knowledge Distillation

Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong et al.

In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances. The dual-mode XSA-LID model is trained by jointly optimizing both the full and short modes with their respective inputs being the full-length speech and its short clip extracted by a specific Boolean mask, and KD is applied to further boost the performance on short utterances. In addition, we investigate the impact of clip-wise linguistic variability and lexical integrity for LID by analyzing the variation of LID performance in terms of the lengths and positions of the mimicked speech clips. We evaluated our approach on the MLS14 data from the NIST 2017 LRE. With the 3~s random-location Boolean mask, our proposed method achieved 19.23%, 21.52% and 8.37% relative improvement in average cost compared with the XSA-LID model on 3s, 10s, and 30s speech, respectively.

RONov 21, 2022
CoPEM: Cooperative Perception Error Models for Autonomous Driving

Andrea Piazzoni, Jim Cherian, Roshan Vijay et al.

In this paper, we introduce the notion of Cooperative Perception Error Models (coPEMs) towards achieving an effective and efficient integration of V2X solutions within a virtual test environment. We focus our analysis on the occlusion problem in the (onboard) perception of Autonomous Vehicles (AV), which can manifest as misdetection errors on the occluded objects. Cooperative perception (CP) solutions based on Vehicle-to-Everything (V2X) communications aim to avoid such issues by cooperatively leveraging additional points of view for the world around the AV. This approach usually requires many sensors, mainly cameras and LiDARs, to be deployed simultaneously in the environment either as part of the road infrastructure or on other traffic vehicles. However, implementing a large number of sensor models in a virtual simulation pipeline is often prohibitively computationally expensive. Therefore, in this paper, we rely on extending Perception Error Models (PEMs) to efficiently implement such cooperative perception solutions along with the errors and uncertainties associated with them. We demonstrate the approach by comparing the safety achievable by an AV challenged with a traffic scenario where occlusion is the primary cause of a potential collision.

SPMay 19
Classification of IED-free EEG Responses for Assisted Epilepsy Diagnosis

Giacomo Zanardini, Ryan Moesman, Paul van der Kleij et al.

Diagnosing epilepsy is challenging when routine EEGs lack interictal epileptiform discharges (IEDs). Intermittent photic stimulation (IPS) and hyperventilation (HV) can increase diagnostic yield, but their interpretation is subjective. We propose a reproducible pipeline that classifies EEG recordings acquired during stimulation procedures, using machine-learning features spanning temporal, spectral, wavelet, and connectivity domains, and a stacked ensemble to combine complementary feature sets. Performance is evaluated with leave-one-subject-out (LOSO) cross-validation on the TUH Epilepsy Corpus and a clinical Erasmus MC (EMC) cohort, including IED-free analyses on TUH. On TUH, ensembles achieve up to 97.8\% AUC / 93.1\% BAC on IED-free resting-state EEG and 94.1\% AUC / 86.8\% BAC on IED-free IPS. On EMC, IPS provides the strongest discrimination (79.4\% AUC / 73.9\% BAC), while HV performance benefits from stratifying subjects by responsiveness. These results indicate that stimulation-evoked activity, particularly IPS, contains meaningful discriminative information for IED-free epilepsy classification and that multi-domain ensembling improves robustness.

LGMar 6, 2024Code
Extreme Precipitation Nowcasting using Transformer-based Generative Models

Cristian Meo, Ankush Roy, Mircea Lică et al.

This paper presents an innovative approach to extreme precipitation nowcasting by employing Transformer-based generative models, namely NowcastingGPT with Extreme Value Loss (EVL) regularization. Leveraging a comprehensive dataset from the Royal Netherlands Meteorological Institute (KNMI), our study focuses on predicting short-term precipitation with high accuracy. We introduce a novel method for computing EVL without assuming fixed extreme representations, addressing the limitations of current models in capturing extreme weather events. We present both qualitative and quantitative analyses, demonstrating the superior performance of the proposed NowcastingGPT-EVL in generating accurate precipitation forecasts, especially when dealing with extreme precipitation events. The code is available at \url{https://github.com/Cmeo97/NowcastingGPT}.

LGMay 15
EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control

Thomas Evers, Cristian Meo, Wendelin Bohmer et al.

We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from a higher update-to-data (UTD) ratio, further improving sample efficiency. To the best of our knowledge, in the low data regime of each benchmark, EfficientTDMPC achieves state-of-the-art (SOTA) in terms of sample efficiency on HumanoidBench-Hard and DMC hard, while matching SOTA on DMC easy.

CVJan 22
Assessing Situational and Spatial Awareness of VLMs with Synthetically Generated Video

Pascal Benschop, Justin Dauwels, Jan van Gemert

Spatial reasoning in vision language models (VLMs) remains fragile when semantics hinge on subtle temporal or geometric cues. We introduce a synthetic benchmark that probes two complementary skills: situational awareness (recognizing whether an interaction is harmful or benign) and spatial awareness (tracking who does what to whom, and reasoning about relative positions and motion). Through minimal video pairs, we test three challenges: distinguishing violence from benign activity, binding assailant roles across viewpoints, and judging fine-grained trajectory alignment. While we evaluate recent VLMs in a training-free setting, the benchmark is applicable to any video classification model. Results show performance only slightly above chance across tasks. A simple aid, stable color cues, partly reduces assailant role confusions but does not resolve the underlying weakness. By releasing data and code, we aim to provide reproducible diagnostics and seed exploration of lightweight spatial priors to complement large-scale pretraining.

CVOct 27, 2025Code
Evaluation of Vision-LLMs in Surveillance Video

Pascal Benschop, Cristian Meo, Justin Dauwels et al.

The widespread use of cameras in our society has created an overwhelming amount of video data, far exceeding the capacity for human monitoring. This presents a critical challenge for public safety and security, as the timely detection of anomalous or criminal events is crucial for effective response and prevention. The ability for an embodied agent to recognize unexpected events is fundamentally tied to its capacity for spatial reasoning. This paper investigates the spatial reasoning of vision-language models (VLMs) by framing anomalous action recognition as a zero-shot, language-grounded task, addressing the embodied perception challenge of interpreting dynamic 3D scenes from sparse 2D video. Specifically, we investigate whether small, pre-trained vision--LLMs can act as spatially-grounded, zero-shot anomaly detectors by converting video into text descriptions and scoring labels via textual entailment. We evaluate four open models on UCF-Crime and RWF-2000 under prompting and privacy-preserving conditions. Few-shot exemplars can improve accuracy for some models, but may increase false positives, and privacy filters -- especially full-body GAN transforms -- introduce inconsistencies that degrade accuracy. These results chart where current vision--LLMs succeed (simple, spatially salient events) and where they falter (noisy spatial cues, identity obfuscation). Looking forward, we outline concrete paths to strengthen spatial grounding without task-specific training: structure-aware prompts, lightweight spatial memory across clips, scene-graph or 3D-pose priors during description, and privacy methods that preserve action-relevant geometry. This positions zero-shot, language-grounded pipelines as adaptable building blocks for embodied, real-world video understanding. Our implementation for evaluating VLMs is publicly available at: https://github.com/pascalbenschopTU/VLLM_AnomalyRecognition

LGOct 15, 2025Code
Assessing the Geographic Generalization and Physical Consistency of Generative Models for Climate Downscaling

Carlo Saccardi, Maximilian Pierzyna, Haitz Sáez de Ocáriz Borde et al.

Kilometer-scale weather data is crucial for real-world applications but remains computationally intensive to produce using traditional weather simulations. An emerging solution is to use deep learning models, which offer a faster alternative for climate downscaling. However, their reliability is still in question, as they are often evaluated using standard machine learning metrics rather than insights from atmospheric and weather physics. This paper benchmarks recent state-of-the-art deep learning models and introduces physics-inspired diagnostics to evaluate their performance and reliability, with a particular focus on geographic generalization and physical consistency. Our experiments show that, despite the seemingly strong performance of models such as CorrDiff, when trained on a limited set of European geographies (e.g., central Europe), they struggle to generalize to other regions such as Iberia, Morocco in the south, or Scandinavia in the north. They also fail to accurately capture second-order variables such as divergence and vorticity derived from predicted velocity fields. These deficiencies appear even in in-distribution geographies, indicating challenges in producing physically consistent predictions. We propose a simple initial solution: introducing a power spectral density loss function that empirically improves geographic generalization by encouraging the reconstruction of small-scale physical structures. The code for reproducing the experimental results can be found at https://github.com/CarloSaccardi/PSD-Downscaling

LGFeb 17
Accelerated Predictive Coding Networks via Direct Kolen-Pollack Feedback Alignment

Davide Casnici, Martin Lefebvre, Justin Dauwels et al.

Predictive coding (PC) is a biologically inspired algorithm for training neural networks that relies only on local updates, allowing parallel learning across layers. However, practical implementations face two key limitations: error signals must still propagate from the output to early layers through multiple inference-phase steps, and feedback decays exponentially during this process, leading to vanishing updates in early layers. We propose direct Kolen-Pollack predictive coding (DKP-PC), which simultaneously addresses both feedback delay and exponential decay, yielding a more efficient and scalable variant of PC while preserving update locality. Leveraging direct feedback alignment and direct Kolen-Pollack algorithms, DKP-PC introduces learnable feedback connections from the output layer to all hidden layers, establishing a direct pathway for error transmission. This yields an algorithm that reduces the theoretical error propagation time complexity from O(L), with L being the network depth, to O(1), removing depth-dependent delay in error signals. Moreover, empirical results demonstrate that DKP-PC achieves performance at least comparable to, and often exceeding, that of standard PC, while offering improved latency and computational performance, supporting its potential for custom hardware-efficient implementations.

LGNov 1, 2024
$α$-TCVAE: On the relationship between Disentanglement and Diversity

Cristian Meo, Louis Mahon, Anirudh Goyal et al.

While disentangled representations have shown promise in generative modeling and representation learning, their downstream usefulness remains debated. Recent studies re-defined disentanglement through a formal connection to symmetries, emphasizing the ability to reduce latent domains and consequently enhance generative capabilities. However, from an information theory viewpoint, assigning a complex attribute to a specific latent variable may be infeasible, limiting the applicability of disentangled representations to simple datasets. In this work, we introduce $α$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound that maximizes disentanglement and latent variables informativeness. The proposed TC bound is grounded in information theory constructs, generalizes the $β$-VAE lower bound, and can be reduced to a convex combination of the known variational information bottleneck (VIB) and conditional entropy bottleneck (CEB) terms. Moreover, we present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity. Additionally, we perform downstream task experiments from both representation and RL domains to assess our questions from a broader ML perspective. Our results demonstrate that $α$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations without sacrificing visual fidelity. Notably, $α$-TCVAE exhibits marked improvements on MPI3D-Real, the most realistic disentangled dataset in our study, confirming its ability to represent complex datasets when maximizing the informativeness of individual variables. Finally, testing the proposed model off-the-shelf on a state-of-the-art model-based RL agent, Director, significantly shows $α$-TCVAE downstream usefulness on the loconav Ant Maze task.

CVOct 21, 2024
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Cristian Meo, Akihiro Nakano, Mircea Lică et al.

Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that pretrained Vision Transformers (ViTs) can be useful to learn object-centric representations on real-world video datasets. However, while these approaches succeed at extracting objects from the scenes, the slot-based representations fail to maintain temporal consistency across consecutive frames in a video, i.e. the mapping of objects to slots changes across the video. To address this, we introduce Conditional Autoregressive Slot Attention (CA-SA), a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks. Leveraging an autoregressive prior network to condition representations on previous timesteps and a novel consistency loss function, CA-SA predicts future slot representations and imposes consistency across frames. We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks, such as video prediction and visual question-answering tasks.

LGOct 7, 2025
BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression

Cristian Meo, Varun Sarathchandran, Avijit Majhi et al.

Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient for real-time applications. Current methods, such as token-based autoregressive models, often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer using batched tokenization (Block) method that predicts full two-dimensional fields (frames) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, viz. KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves superior accuracy, event localization as measured by categorical metrics, and inference speeds up to 31x faster than comparable baselines.

MLAug 12, 2025
Bio-Inspired Artificial Neural Networks based on Predictive Coding

Davide Casnici, Charlotte Frenkel, Justin Dauwels

Backpropagation (BP) of errors is the backbone training algorithm for artificial neural networks (ANNs). It updates network weights through gradient descent to minimize a loss function representing the mismatch between predictions and desired outputs. BP uses the chain rule to propagate the loss gradient backward through the network hierarchy, allowing efficient weight updates. However, this process requires weight updates at every layer to rely on a global error signal generated at the network's output. In contrast, the Hebbian model of synaptic plasticity states that weight updates are local, depending only on the activity of pre- and post-synaptic neurons. This suggests biological brains likely do not implement BP directly. Recently, Predictive Coding (PC) has gained interest as a biologically plausible alternative that updates weights using only local information. Originating from 1950s work on signal compression, PC was later proposed as a model of the visual cortex and formalized under the free energy principle, linking it to Bayesian inference and dynamical systems. PC weight updates rely solely on local information and provide theoretical advantages such as automatic scaling of gradients based on uncertainty. This lecture notes column offers a novel, tutorial-style introduction to PC, focusing on its formulation, derivation, and connections to well-known optimization and signal processing algorithms such as BP and the Kalman Filter (KF). It aims to support existing literature by guiding readers from the mathematical foundations of PC to practical implementation, including Python examples using PyTorch.

CVMay 27, 2025
Compositional Scene Understanding through Inverse Generative Modeling

Yanbo Wang, Justin Dauwels, Yilun Du

Generative models have demonstrated remarkable abilities in generating high-fidelity visual content. In this work, we explore how generative models can further be used not only to synthesize visual content but also to understand the properties of a scene given a natural image. We formulate scene understanding as an inverse generative modeling problem, where we seek to find conditional parameters of a visual generative model to best fit a given natural image. To enable this procedure to infer scene structure from images substantially different than those seen during training, we further propose to build this visual generative model compositionally from smaller models over pieces of a scene. We illustrate how this procedure enables us to infer the set of objects in a scene, enabling robust generalization to new test scenes with an increased number of objects of new shapes. We further illustrate how this enables us to infer global scene factors, likewise enabling robust generalization to new scenes. Finally, we illustrate how this approach can be directly applied to existing pretrained text-to-image generative models for zero-shot multi-object perception. Code and visualizations are at https://energy-based-model.github.io/compositional-inference.

AIJun 18, 2024
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Cristian Meo, Ksenia Sycheva, Anirudh Goyal et al.

It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.

LGJun 14, 2024
Precipitation Nowcasting Using Physics Informed Discriminator Generative Models

Junzhe Yin, Cristian Meo, Ankush Roy et al.

Nowcasting leverages real-time atmospheric conditions to forecast weather over short periods. State-of-the-art models, including PySTEPS, encounter difficulties in accurately forecasting extreme weather events because of their unpredictable distribution patterns. In this study, we design a physics-informed neural network to perform precipitation nowcasting using the precipitation and meteorological data from the Royal Netherlands Meteorological Institute (KNMI). This model draws inspiration from the novel Physics-Informed Discriminator GAN (PID-GAN) formulation, directly integrating physics-based supervision within the adversarial learning framework. The proposed model adopts a GAN structure, featuring a Vector Quantization Generative Adversarial Network (VQ-GAN) and a Transformer as the generator, with a temporal discriminator serving as the discriminator. Our findings demonstrate that the PID-GAN model outperforms numerical and SOTA deep generative models in terms of precipitation nowcasting downstream metrics.

ASMay 30, 2023
Investigating model performance in language identification: beyond simple error statistics

Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon et al.

Language development experts need tools that can automatically identify languages from fluent, conversational speech, and provide reliable estimates of usage rates at the level of an individual recording. However, language identification systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may therefore mask systematic errors in model performance for some subsets of the data, and consequently, have worse performance on data derived from some subsets of human speakers, creating a kind of algorithmic bias. In the current paper, we investigate how well a number of language identification systems perform on individual recordings and speech units with different linguistic properties in the MERLIon CCS Challenge. The Challenge dataset features accented English-Mandarin code-switched child-directed speech.

IVJul 12, 2021
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery

Rongkai Zhang, Jiang Zhu, Zhiyuan Zha et al.

State-of-the-art image denoisers exploit various types of deep neural networks via deterministic training. Alternatively, very recent works utilize deep reinforcement learning for restoring images with diverse or unknown corruptions. Though deep reinforcement learning can generate effective policy networks for operator selection or architecture search in image restoration, how it is connected to the classic deterministic training in solving inverse problems remains unclear. In this work, we propose a novel image denoising scheme via Residual Recovery using Reinforcement Learning, dubbed R3L. We show that R3L is equivalent to a deep recurrent neural network that is trained using a stochastic reward, in contrast to many popular denoisers using supervised learning with deterministic losses. To benchmark the effectiveness of reinforcement learning in R3L, we train a recurrent neural network with the same architecture for residual recovery using the deterministic loss, thus to analyze how the two different training strategies affect the denoising performance. With such a unified benchmarking system, we demonstrate that the proposed R3L has better generalizability and robustness in image denoising when the estimated noise level varies, comparing to its counterparts using deterministic training, as well as various state-of-the-art image denoising algorithms.

MLSep 16, 2020
Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes

Hang Yu, Songwei Wu, Justin Dauwels

Estimating time-varying graphical models are of paramount importance in various social, financial, biological, and engineering systems, since the evolution of such networks can be utilized for example to spot trends, detect anomalies, predict vulnerability, and evaluate the impact of interventions. Existing methods require extensive tuning of parameters that control the graph sparsity and temporal smoothness. Furthermore, these methods are computationally burdensome with time complexity $O(NP^3)$ for $P$ variables and $N$ time points. As a remedy, we propose a low-complexity tuning-free Bayesian approach, named BASS. Specifically, we impose temporally-dependent spike-and-slab priors on the graphs such that they are sparse and varying smoothly across time. A variational inference algorithm is then derived to learn the graph structures from the data automatically. Owning to the pseudo-likelihood and the mean-field approximation, the time complexity of BASS is only $O(NP^2)$. Additionally, by identifying the frequency-domain resemblance to the time-varying graphical models, we show that BASS can be extended to learning frequency-varying inverse spectral density matrices, and yields graphical models for multivariate stationary time series. Numerical results on both synthetic and real data show that that BASS can better recover the underlying true graphs, while being more efficient than the existing methods, especially for high-dimensional cases.

MLAug 31, 2020
On the Quality Requirements of Demand Prediction for Dynamic Public Transport

Inon Peled, Kelvin Lee, Yu Jiang et al.

As Public Transport (PT) becomes more dynamic and demand-responsive, it increasingly depends on predictions of transport demand. But how accurate need such predictions be for effective PT operation? We address this question through an experimental case study of PT trips in Metropolitan Copenhagen, Denmark, which we conduct independently of any specific prediction models. First, we simulate errors in demand prediction through unbiased noise distributions that vary considerably in shape. Using the noisy predictions, we then simulate and optimize demand-responsive PT fleets via a linear programming formulation and measure their performance. Our results suggest that the optimized performance is mainly affected by the skew of the noise distribution and the presence of infrequently large prediction errors. In particular, the optimized performance can improve under non-Gaussian vs. Gaussian noise. We also find that dynamic routing could reduce trip time by at least 23% vs. static routing. This reduction is estimated at 809,000 EUR/year in terms of Value of Travel Time Savings for the case study.

AIJan 31, 2020
Modeling Perception Errors towards Robust Decision Making in Autonomous Vehicles

Andrea Piazzoni, Jim Cherian, Martin Slavik et al.

Sensing and Perception (S&P) is a crucial component of an autonomous system (such as a robot), especially when deployed in highly dynamic environments where it is required to react to unexpected situations. This is particularly true in case of Autonomous Vehicles (AVs) driving on public roads. However, the current evaluation metrics for perception algorithms are typically designed to measure their accuracy per se and do not account for their impact on the decision making subsystem(s). This limitation does not help developers and third party evaluators to answer a critical question: is the performance of a perception subsystem sufficient for the decision making subsystem to make robust, safe decisions? In this paper, we propose a simulation-based methodology towards answering this question. At the same time, we show how to analyze the impact of different kinds of sensing and perception errors on the behavior of the autonomous system.

LGNov 9, 2019
Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling

Satyajit Neogi, Justin Dauwels

Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models on two single-label (one nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets - UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model in all our experiments. In addition, LSTM based models display inconsistent performance across validation and test data, and pose diffculty to select models on validation data during our experiments. FLDCRF offers easier model selection, consistency across validation and test performance and lucid model intuition. FLDCRF is also much faster to train compared to LSTM, even without a GPU. FLDCRF outshines the best LSTM model by ~4% on a single-label task on UCI gesture phase data and outperforms LSTM performance by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models.

CVJul 27, 2019
Context Model for Pedestrian Intention Prediction using Factored Latent-Dynamic Conditional Random Fields

Satyajit Neogi, Michael Hoy, Kang Dang et al.

Smooth handling of pedestrian interactions is a key requirement for Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS). Such systems call for early and accurate prediction of a pedestrian's crossing/not-crossing behaviour in front of the vehicle. Existing approaches to pedestrian behaviour prediction make use of pedestrian motion, his/her location in a scene and static context variables such as traffic lights, zebra crossings etc. We stress on the necessity of early prediction for smooth operation of such systems. We introduce the influence of vehicle interactions on pedestrian intention for this purpose. In this paper, we show a discernible advance in prediction time aided by the inclusion of such vehicle interaction context. We apply our methods to two different datasets, one in-house collected - NTU dataset and another public real-life benchmark - JAAD dataset. We also propose a generic graphical model Factored Latent-Dynamic Conditional Random Fields (FLDCRF) for single and multi-label sequence prediction as well as joint interaction modeling tasks. FLDCRF outperforms Long Short-Term Memory (LSTM) networks across the datasets ($\sim$100 sequences per dataset) over identical time-series features. While the existing best system predicts pedestrian stopping behaviour with 70\% accuracy 0.38 seconds before the actual events, our system achieves such accuracy at least 0.9 seconds on an average before the actual events across datasets.

CVJul 6, 2019
Affine Disentangled GAN for Interpretable and Robust AV Perception

Letao Liu, Martin Saerbeck, Justin Dauwels

Autonomous vehicles (AV) have progressed rapidly with the advancements in computer vision algorithms. The deep convolutional neural network as the main contributor to this advancement has boosted the classification accuracy dramatically. However, the discovery of adversarial examples reveals the generalization gap between dataset and the real world. Furthermore, affine transformations may also confuse computer vision based object detectors. The degradation of the perception system is undesirable for safety critical systems such as autonomous vehicles. In this paper, a deep learning system is proposed: Affine Disentangled GAN (ADIS-GAN), which is robust against affine transformations and adversarial attacks. It is demonstrated that conventional data augmentation for affine transformation and adversarial attacks are orthogonal, while ADIS-GAN can handle both attacks at the same time. Useful information such as image rotation angle and scaling factor are also generated in ADIS-GAN. On MNIST dataset, ADIS-GAN can achieve over 98 percent classification accuracy within 30 degrees rotation, and over 90 percent classification accuracy against FGSM and PGD adversarial attack.

MLFeb 26, 2019
Online Predictive Optimization Framework for Stochastic Demand-Responsive Transit Services

Inon Peled, Kelvin Lee, Yu Jiang et al.

This study develops an online predictive optimization framework for dynamically operating a transit service in an area of crowd movements. The proposed framework integrates demand prediction and supply optimization to periodically redesign the service routes based on recently observed demand. To predict demand for the service, we use Quantile Regression to estimate the marginal distribution of movement counts between each pair of serviced locations. The framework then combines these marginals into a joint demand distribution by constructing a Gaussian copula, which captures the structure of correlation between the marginals. For supply optimization, we devise a linear programming model, which simultaneously determines the route structure and the service frequency according to the predicted demand. Importantly, our framework both preserves the uncertainty structure of future demand and leverages this for robust route optimization, while keeping both components decoupled. We evaluate our framework using a real-world case study of autonomous mobility in a university campus in Denmark. The results show that our framework often obtains the ground truth optimal solution, and can outperform conventional methods for route optimization, which do not leverage full predictive distributions.

CVJul 23, 2018
Actor-Action Semantic Segmentation with Region Masks

Kang Dang, Chunluan Zhou, Zhigang Tu et al.

In this paper, we study the actor-action semantic segmentation problem, which requires joint labeling of both actor and action categories in video frames. One major challenge for this task is that when an actor performs an action, different body parts of the actor provide different types of cues for the action category and may receive inconsistent action labeling when they are labeled independently. To address this issue, we propose an end-to-end region-based actor-action segmentation approach which relies on region masks from an instance segmentation algorithm. Our main novelty is to avoid labeling pixels in a region mask independently - instead we assign a single action label to these pixels to achieve consistent action labeling. When a pixel belongs to multiple region masks, max pooling is applied to resolve labeling conflicts. Our approach uses a two-stream network as the front-end (which learns features capturing both appearance and motion information), and uses two region-based segmentation networks as the back-end (which takes the fused features from the two-stream network as the input and predicts actor-action labeling). Experiments on the A2D dataset demonstrate that both the region-based segmentation strategy and the fused features from the two-stream network contribute to the performance improvements. The proposed approach outperforms the state-of-the-art results by more than 8% in mean class accuracy, and more than 5% in mean class IOU, which validates its effectiveness.

AIJun 29, 2018
Multi-atomic Annealing Heuristic for Static Dial-a-ride Problem

Song Guang Ho, Ramesh Ramasamy Pandi, Sarat Chandra Nagavarapu et al.

Dial-a-ride problem (DARP) deals with the transportation of users between pickup and drop-off locations associated with specified time windows. This paper proposes a novel algorithm called multi-atomic annealing (MATA) to solve static dial-a-ride problem. Two new local search operators (burn and reform), a new construction heuristic and two request sequencing mechanisms (Sorted List and Random List) are developed. Computational experiments conducted on various standard DARP test instances prove that MATA is an expeditious meta-heuristic in contrast to other existing methods. In all experiments, MATA demonstrates the capability to obtain high quality solutions, faster convergence, and quicker attainment of a first feasible solution. It is observed that MATA attains a first feasible solution 29.8 to 65.1% faster, and obtains a final solution that is 3.9 to 5.2% better, when compared to other algorithms within 60 sec.

AIJan 25, 2018
An Improved Tabu Search Heuristic for Static Dial-A-Ride Problem

Songguang Ho, Sarat Chandra Nagavarapu, Ramesh Ramasamy Pandi et al.

Multi-vehicle routing has become increasingly important with the rapid development of autonomous vehicle technology. Dial-a-ride problem, a variant of vehicle routing problem (VRP), deals with the allocation of customer requests to vehicles, scheduling the pick-up and drop-off times and the sequence of serving those requests by ensuring high customer satisfaction with minimized travel cost. In this paper, we propose an improved tabu search (ITS) heuristic for static dial-a-ride problem (DARP) with the objective of obtaining high-quality solutions in short time. Two new techniques, initialization heuristic, and time window adjustment are proposed to achieve faster convergence to the global optimum. Various numerical experiments are conducted for the proposed solution methodology using DARP test instances from the literature and the convergence speed up is validated.

MLNov 16, 2017
Neurology-as-a-Service for the Developing World

Tejas Dharamsi, Payel Das, Tejaswini Pedapati et al.

Electroencephalography (EEG) is an extensively-used and well-studied technique in the field of medical diagnostics and treatment for brain disorders, including epilepsy, migraines, and tumors. The analysis and interpretation of EEGs require physicians to have specialized training, which is not common even among most doctors in the developed world, let alone the developing world where physician shortages plague society. This problem can be addressed by teleEEG that uses remote EEG analysis by experts or by local computer processing of EEGs. However, both of these options are prohibitively expensive and the second option requires abundant computing resources and infrastructure, which is another concern in developing countries where there are resource constraints on capital and computing infrastructure. In this work, we present a cloud-based deep neural network approach to provide decision support for non-specialist physicians in EEG analysis and interpretation. Named `neurology-as-a-service,' the approach requires almost no manual intervention in feature engineering and in the selection of an optimal architecture and hyperparameters of the neural network. In this study, we deploy a pipeline that includes moving EEG data to the cloud and getting optimal models for various classification tasks. Our initial prototype has been tested only in developed world environments to-date, but our intention is to test it in developing world environments in future work. We demonstrate the performance of our proposed approach using the BCI2000 EEG MMI dataset, on which our service attains 63.4% accuracy for the task of classifying real vs. imaginary activity performed by the subject, which is significantly higher than what is obtained with a shallow approach such as support vector machines.