LGMay 7
Measuring Learning Progress via Gradient-Momentum CouplingSamuel Blad, Martin Längkvist, Amy Loutfi
Measuring learning progress is essential for curiosity-driven exploration in reinforcement learning, but widely used signals such as prediction error often fail to distinguish meaningful, learnable patterns from random noise. This paper proposes Gradient-Momentum Coupling (GMC), a signal derived from optimization dynamics that quantifies how useful each sample's gradient is for ongoing learning by measuring its per-parameter normalized absolute product with the momentum from previous gradients. By leveraging momentum's natural filtering of noise and oscillations, GMC identifies samples that contribute to ongoing parameter updates. Controlled experiments demonstrate noise robustness and emergent curriculum learning, with the signal prioritizing tasks by learning speed rather than difficulty. Experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.
LGNov 12, 2025
Efficient Hyperdimensional Computing with Modular Composite RepresentationsMarco Angioli, Christopher J. Kymn, Antonello Rosato et al.
The modular composite representation (MCR) is a computing model that represents information with high-dimensional integer vectors using modular arithmetic. Originally proposed as a generalization of the binary spatter code model, it aims to provide higher representational power while remaining a lighter alternative to models requiring high-precision components. Despite this potential, MCR has received limited attention. Systematic analyses of its trade-offs and comparisons with other models are lacking, sustaining the perception that its added complexity outweighs the improved expressivity. In this work, we revisit MCR by presenting its first extensive evaluation, demonstrating that it achieves a unique balance of capacity, accuracy, and hardware efficiency. Experiments measuring capacity demonstrate that MCR outperforms binary and integer vectors while approaching complex-valued representations at a fraction of their memory footprint. Evaluation on 123 datasets confirms consistent accuracy gains and shows that MCR can match the performance of binary spatter codes using up to 4x less memory. We investigate the hardware realization of MCR by showing that it maps naturally to digital logic and by designing the first dedicated accelerator. Evaluations on basic operations and 7 selected datasets demonstrate a speedup of up to 3 orders of magnitude and significant energy reductions compared to software implementation. When matched for accuracy against binary spatter codes, MCR achieves on average 3.08x faster execution and 2.68x lower energy consumption. These findings demonstrate that, although MCR requires more sophisticated operations than binary spatter codes, its modular arithmetic and higher per-component precision enable lower dimensionality. When realized with dedicated hardware, this results in a faster, more energy-efficient, and high-precision alternative to existing models.
AIMar 30
COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum GameAlkis Sygkounas, Rishi Hazra, Andreas Persson et al.
A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (LLMs) to generate both environments and agent policies, expressed as executable Python code. We model the interaction between environment and policy designers as a two-player zero-sum game, ensuring adversarial co-evolution in which environments expose policy weaknesses and policies adapt in response. This process induces an automated curriculum in which environments and policies co-evolve toward increasing complexity. To guarantee robustness and prevent forgetting as the curriculum progresses, we compute the mixed-strategy Nash equilibrium (MSNE) of the zero-sum game, thereby yielding a meta-policy. This MSNE meta-policy ensures that the agent does not forget to solve previously seen environments while learning to solve previously unseen ones. Experiments in urban driving, symbolic maze-solving, and geometric navigation showcase that COvolve produces progressively more complex environments. Our results demonstrate the potential of LLM-driven co-evolution to achieve open-ended learning without predefined task distributions or manual intervention.
LGMay 13
Contextual Bandits for Resource-Constrained Devices using Probabilistic LearningMarco Angioli, Kevin Johansson, Antonello Rosato et al.
Contextual bandits (CB) are online sequential decision-making problems under partial feedback that underpin many adaptive services. There is a growing demand to deploy CB agents directly on-device, under strict constraints on memory, compute, and energy. However, standard linear CB algorithms are often impractical for resource-constrained devices with their unfavorable scaling in computational and memory costs. Recently, HD-CB, a CB approach based on hyperdimensional computing principles, has been proposed to model and solve CB problems by moving into high-dimensional spaces. HD-CB offers faster convergence, favorable scalability, and improves memory efficiency compared to linear CB algorithms. However, its learning rule is accumulation-based: the values of action vectors grow over time, requiring high precision. While periodic binarization can prevent overflow in low-precision components, it may discard important information about magnitudes and degrade decision quality. This paper introduces probabilistic HD-CB, a low-precision variant that replaces deterministic accumulation with a probabilistic update rule. At each step, only a random subset of vector components is updated, with a time-decaying update probability, and component values are constrained to a predefined range [-k,+k]. This approach enables low-precision components, prevents overflow without periodic binarization, and reduces the expected update cost in proportion to the fraction of updated components. Off-policy evaluation on standardized synthetic CB benchmarks using the Open Bandit Pipeline shows that probabilistic HD-CB consistently outperforms binarized HD-CB at equal precision, while approaching the performance of HD-CB with as few as 3 bits per component.
LGMar 30
Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language ModelsAlkis Sygkounas, Amy Loutfi, Andreas Persson
Reinforcement learning algorithms are defined by their learning update rules, which are typically hand-designed and fixed. We present an evolutionary framework for discovering reinforcement learning algorithms by searching directly over executable update rules that implement complete training procedures. The approach builds on REvolve, an evolutionary system that uses large language models as generative variation operators, and extends it from reward-function discovery to algorithm discovery. To promote the emergence of nonstandard learning rules, the search excludes canonical mechanisms such as actor--critic structures, temporal-difference losses, and value bootstrapping. Because reinforcement learning algorithms are highly sensitive to internal scalar parameters, we introduce a post-evolution refinement stage in which a large language model proposes feasible hyperparameter ranges for each evolved update rule. Evaluated end-to-end by full training runs on multiple Gymnasium benchmarks, the discovered algorithms achieve competitive performance relative to established baselines, including SAC, PPO, DQN, and A2C.
CVMar 10
LAP: A Language-Aware Planning Model For Procedure Planning In Instructional VideosLei Shi, Victor Aregbede, Andreas Persson et al.
Procedure planning requires a model to predict a sequence of actions that transform a start visual observation into a goal in instructional videos. While most existing methods rely primarily on visual observations as input, they often struggle with the inherent ambiguity where different actions can appear visually similar. In this work, we argue that language descriptions offer a more distinctive representation in the latent space for procedure planning. We introduce Language-Aware Planning (LAP), a novel method that leverages the expressiveness of language to bridge visual observation and planning. LAP uses a finetuned Vision Language Model (VLM) to translate visual observations into text descriptions and to predict actions and extract text embeddings. These text embeddings are more distinctive than visual embeddings and are used in a diffusion model for planning action sequences. We evaluate LAP on three procedure planning benchmarks: CrossTask, Coin, and NIV. LAP achieves new state-of-the-art performance across multiple metrics and time horizons by large margin, demonstrating the significant advantage of language-aware planning.
ROFeb 2, 2021Code
An Open-Source Modular Robotic System for Telepresence and Remote DisinfectionAndre Potenza, Andrey Kiselev, Alessandro Saffiotti et al.
In a pandemic contact between humans needs to be avoided wherever possible. Robots can take over an increasing number of tasks to protect people from being exposed to others. One such task is the disinfection of environments in which infection spread is particularly likely or bears increased risks. It has been shown that UVC light is effective in neutralizing a variety of pathogens, among others the virus causing COVID-19, SARS-CoV-2. Another function which can reduce the need for physical proximity between humans is interaction via telepresence, i.e., the remote embodiment of a person controlling the robot. This work presents a modular mobile robot for telepresence and disinfection with UVC lamps. Both operation modes are supported by adaptable autonomy navigation features for facilitating efficient task execution. The platform's primary contributions are its hardware and software design, which combine consumer-grade components and 3D-printed mounting with open-source software frameworks.
LGJun 25, 2025
Affective Priming Score: A Data-Driven Method to Detect Priming in Sequential DatasetsEduardo Gutierrez Maestro, Hadi Banaee, Amy Loutfi
Affective priming exemplifies the challenge of ambiguity in affective computing. While the community has largely addressed this issue from a label-based perspective, identifying data points in the sequence affected by the priming effect, the impact of priming on data itself, particularly in physiological signals, remains underexplored. Data affected by priming can lead to misclassifications when used in learning models. This study proposes the Affective Priming Score (APS), a data-driven method to detect data points influenced by the priming effect. The APS assigns a score to each data point, quantifying the extent to which it is affected by priming. To validate this method, we apply it to the SEED and SEED-VII datasets, which contain sufficient transitions between emotional events to exhibit priming effects. We train models with the same configuration using both the original data and priming-free sequences. The misclassification rate is significantly reduced when using priming-free sequences compared to the original data. This work contributes to the broader challenge of ambiguity by identifying and mitigating priming effects at the data level, enhancing model robustness, and offering valuable insights for the design and collection of affective computing datasets.
LGApr 28, 2025
Interactive Double Deep Q-network: Integrating Human Interventions and Evaluative Predictions in Reinforcement Learning of Autonomous DrivingAlkis Sygkounas, Ioannis Athanasiadis, Andreas Persson et al.
Integrating human expertise with machine learning is crucial for applications demanding high accuracy and safety, such as autonomous driving. This study introduces Interactive Double Deep Q-network (iDDQN), a Human-in-the-Loop (HITL) approach that enhances Reinforcement Learning (RL) by merging human insights directly into the RL training process, improving model performance. Our proposed iDDQN method modifies the Q-value update equation to integrate human and agent actions, establishing a collaborative approach for policy development. Additionally, we present an offline evaluative framework that simulates the agent's trajectory as if no human intervention had occurred, to assess the effectiveness of human interventions. Empirical results in simulated autonomous driving scenarios demonstrate that iDDQN outperforms established approaches, including Behavioral Cloning (BC), HG-DAgger, Deep Q-Learning from Demonstrations (DQfD), and vanilla DRL in leveraging human expertise for improving performance and adaptability.
NEJun 3, 2024
REvolve: Reward Evolution with Large Language Models using Human FeedbackRishi Hazra, Alkis Sygkounas, Andreas Persson et al.
Designing effective reward functions is crucial to training reinforcement learning (RL) algorithms. However, this design is non-trivial, even for domain experts, due to the subjective nature of certain tasks that are hard to quantify explicitly. In recent works, large language models (LLMs) have been used for reward generation from natural language task descriptions, leveraging their extensive instruction tuning and commonsense understanding of human behavior. In this work, we hypothesize that LLMs, guided by human feedback, can be used to formulate reward functions that reflect human implicit knowledge. We study this in three challenging settings -- autonomous driving, humanoid locomotion, and dexterous manipulation -- wherein notions of ``good" behavior are tacit and hard to quantify. To this end, we introduce REvolve, a truly evolutionary framework that uses LLMs for reward design in RL. REvolve generates and refines reward functions by utilizing human feedback to guide the evolution process, effectively translating implicit human knowledge into explicit reward functions for training (deep) RL agents. Experimentally, we demonstrate that agents trained on REvolve-designed rewards outperform other state-of-the-art baselines.
ROJul 21, 2021
Levels of Automation for a Mobile Robot Teleoperated by a CaregiverSamuel Olatunji, Andre Potenza, Andrey Kiselev et al.
Caregivers in eldercare can benefit from telepresence robots that allow them to perform a variety of tasks remotely. In order for such robots to be operated effectively and efficiently by non-technical users, it is important to examine if and how the robotic system's level of automation (LOA) impacts their performance. The objective of this work was to develop suitable LOA modes for a mobile robotic telepresence (MRP) system for eldercare and assess their influence on users' performance, workload, awareness of the environment and usability at two different levels of task complexity. For this purpose, two LOA modes were implemented on the MRP platform: assisted teleoperation (low LOA mode) and autonomous navigation (high LOA mode). The system was evaluated in a user study with 20 participants, who, in the role of the caregiver, navigated the robot through a home-like environment to perform control and perception tasks. Results revealed that performance improved in the high LOA when task complexity was low. However, when task complexity increased, lower LOA improved performance. This opposite trend was also observed in the results for workload and situation awareness. We discuss the results in terms of the LOAs' impact on users' attitude towards automation and implications on usability.
HCJun 10, 2021
Do you feel safe with your robot? Factors Influencing Perceived Safety in Human-Robot Interaction based on Subjective and Objective MeasuresNeziha Akalin, Annica Kristoffersson, Amy Loutfi
Safety in human-robot interaction can be divided into physical safety and perceived safety, where the latter is still under-addressed in the literature. Investigating perceived safety in human-robot interaction requires a multidisciplinary perspective. Indeed, perceived safety is often considered as being associated with several common factors studied in other disciplines, i.e., comfort, predictability, sense of control, and trust. In this paper, we investigated the relationship between these factors and perceived safety in human-robot interaction using subjective and objective measures. We conducted a two-by-five mixed-subjects design experiment. The five within-subjects conditions correspond to (1) baseline, and the manipulations of robot behaviors to stimulate: (2) discomfort, (3) decreased perceived safety, (4) decreased sense of control and (5) distrust. Twenty-seven young adult participants took part in the experiments. Participants were asked to answer questionnaires that measure the manipulated factors after within-subjects conditions. Besides questionnaire data, we collected objective measures such as videos and physiological data. The questionnaire results show a correlation between comfort, sense of control, trust, and perceived safety. We also discuss the effect of individual human characteristics (such as personality and gender) that they could be predictors of perceived safety. We used the physiological signal data and facial affect from videos for estimating perceived safety where participants' subjective ratings were utilized as labels. The data from objective measures revealed that the prediction rate was higher from physiological signal data. This paper can play an important role in the goal of better understanding perceived safety in human-robot interaction.
ROSep 21, 2020
Reinforcement Learning Approaches in Social RoboticsNeziha Akalin, Amy Loutfi
This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existent reinforcement learning approaches based on the used method and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The benefits and challenges of reinforcement learning in social robotics, evaluation methods of the papers regarding whether or not they use subjective and algorithmic measures, a discussion in the view of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the paper. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.
AIMar 13, 2020
Online Guest Detection in a Smart Home using Pervasive Sensors and Probabilistic ReasoningJennifer Renoux, Uwe Köckemann, Amy Loutfi
Smart home environments equipped with distributed sensor networks are capable of helping people by providing services related to health, emergency detection or daily routine management. A backbone to these systems relies often on the system's ability to track and detect activities performed by the users in their home. Despite the continuous progress in the area of activity recognition in smart homes, many systems make a strong underlying assumption that the number of occupants in the home at any given moment of time is always known. Estimating the number of persons in a Smart Home at each time step remains a challenge nowadays. Indeed, unlike most (crowd) counting solution which are based on computer vision techniques, the sensors considered in a Smart Home are often very simple and do not offer individually a good overview of the situation. The data gathered needs therefore to be fused in order to infer useful information. This paper aims at addressing this challenge and presents a probabilistic approach able to estimate the number of persons in the environment at each time step. This approach works in two steps: first, an estimate of the number of persons present in the environment is done using a Constraint Satisfaction Problem solver, based on the topology of the sensor network and the sensor activation pattern at this time point. Then, a Hidden Markov Model refines this estimate by considering the uncertainty related to the sensors. Using both simulated and real data, our method has been tested and validated on two smart homes of different sizes and configuration and demonstrates the ability to accurately estimate the number of inhabitants.
AIFeb 24, 2020
Symbolic Learning and Reasoning with Noisy Data for Probabilistic AnchoringPedro Zuidberg Dos Martires, Nitesh Kumar, Andreas Persson et al.
Robotic agents should be able to learn from sub-symbolic sensor data, and at the same time, be able to reason about objects and communicate with humans on a symbolic level. This raises the question of how to overcome the gap between symbolic and sub-symbolic artificial intelligence. We propose a semantic world modeling approach based on bottom-up object anchoring using an object-centered representation of the world. Perceptual anchoring processes continuous perceptual sensor data and maintains a correspondence to a symbolic representation. We extend the definitions of anchoring to handle multi-modal probability distributions and we couple the resulting symbol anchoring system to a probabilistic logic reasoner for performing inference. Furthermore, we use statistical relational learning to enable the anchoring framework to learn symbolic knowledge in the form of a set of probabilistic logic rules of the world from noisy and sub-symbolic sensor input. The resulting framework, which combines perceptual anchoring and statistical relational learning, is able to maintain a semantic world model of all the objects that have been perceived over time, while still exploiting the expressiveness of logical rules to reason about the state of objects which are not directly observed through sensory input data. To validate our approach we demonstrate, on the one hand, the ability of our system to perform probabilistic reasoning over multi-modal probability distributions, and on the other hand, the learning of probabilistic logical rules from anchored objects produced by perceptual observations. The learned logical rules are, subsequently, used to assess our proposed probabilistic anchoring procedure. We demonstrate our system in a setting involving object interactions where object occlusions arise and where probabilistic inference is needed to correctly anchor objects.
AIApr 30, 2019
Learning from Implicit Information in Natural Language Instructions for Robotic ManipulationsOzan Arkan Can, Pedro Zuidberg Dos Martires, Andreas Persson et al.
Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot's world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world.
AIApr 30, 2019
Semantic Referee: A Neural-Symbolic Framework for Enhancing Geospatial Semantic SegmentationMarjan Alirezaie, Martin Längkvist, Michael Sioutis et al.
Understanding why machine learning algorithms may fail is usually the task of the human expert that uses domain knowledge and contextual information to discover systematic shortcomings in either the data or the algorithm. In this paper, we propose a semantic referee, which is able to extract qualitative features of the errors emerging from deep machine learning frameworks and suggest corrections. The semantic referee relies on ontological reasoning about spatial knowledge in order to characterize errors in terms of their spatial relations with the environment. Using semantics, the reasoner interacts with the learning algorithm as a supervisor. In this paper, the proposed method of the interaction between a neural network classifier and a semantic referee shows how to improve the performance of semantic segmentation for satellite imagery data.
HCApr 26, 2019
Interactive user interface based on Convolutional Auto-encoders for annotating CT-scansMartin Längkvist, Jonas Widell, Per Thunberg et al.
High resolution computed tomography (HRCT) is the most important imaging modality for interstitial lung diseases, where the radiologists are interested in identifying certain patterns, and their volumetric and regional distribution. The use of machine learning can assist the radiologists with both these tasks by performing semantic segmentation. In this paper, we propose an interactive annotation-tool for semantic segmentation that assists the radiologist in labeling CT scans. The annotation tool is evaluated by six radiologists and radiology residents classifying healthy lung and reticular pattern i HRCT images. The usability of the system is evaluated with a System Usability Score (SUS) and interaction information from the readers that used the tool for annotating the CT volumes. It was discovered that the experienced usability and how the users interactied with the system differed between the users. A higher SUS-score was given by users that prioritized learning speed over model accuracy and spent less time with manual labeling and instead utilized the suggestions provided by the GUI. An analysis of the annotation variations between the readers show substantial agreement (Cohen's kappa=0.69) for classification of healthy and affected lung parenchyma in pulmonary fibrosis. The inter-reader variation is a challenge for the definition of ground truth.
ROFeb 26, 2019
Semantic Relational Object TrackingAndreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi et al.
This paper addresses the topic of semantic world modeling by conjoining probabilistic reasoning and object anchoring. The proposed approach uses a so-called bottom-up object anchoring method that relies on the rich continuous data from perceptual sensor data. A novel anchoring matching function method learns to maintain object entities in space and time and is validated using a large set of trained humanly annotated ground truth data of real-world objects. For more complex scenarios, a high-level probabilistic object tracker has been integrated with the anchoring framework and handles the tracking of occluded objects via reasoning about the state of unobserved objects. We demonstrate the performance of our integrated approach through scenarios such as the shell game scenario, where we illustrate how anchored objects are retained by preserving relations through probabilistic reasoning.
LGMay 14, 2018
A Deep Learning Approach with an Attention Mechanism for Automatic Sleep Stage ClassificationMartin Längkvist, Amy Loutfi
Automatic sleep staging is a challenging problem and state-of-the-art algorithms have not yet reached satisfactory performance to be used instead of manual scoring by a sleep technician. Much research has been done to find good feature representations that extract the useful information to correctly classify each epoch into the correct sleep stage. While many useful features have been discovered, the amount of features have grown to an extent that a feature reduction step is necessary in order to avoid the curse of dimensionality. One reason for the need of such a large feature set is that many features are good for discriminating only one of the sleep stages and are less informative during other stages. This paper explores how a second feature representation over a large set of pre-defined features can be learned using an auto-encoder with a selective attention for the current sleep stage in the training batch. This selective attention allows the model to learn feature representations that focuses on the more relevant inputs without having to perform any dimensionality reduction of the input data. The performance of the proposed algorithm is evaluated on a large data set of polysomnography (PSG) night recordings of patients with sleep-disordered breathing. The performance of the auto-encoder with selective attention is compared with a regular auto-encoder and previous works using a deep belief network (DBN).
AIDec 26, 2014
Reasoning for Improved Sensor Data Interpretation in a Smart HomeMarjan Alirezaie, Amy Loutfi
In this paper an ontological representation and reasoning paradigm has been proposed for interpretation of time-series signals. The signals come from sensors observing a smart environment. The signal chosen for the annotation process is a set of unintuitive and complex gas sensor data. The ontology of this paradigm is inspired form the SSN ontology (Semantic Sensor Network) and used for representation of both the sensor data and the contextual information. The interpretation process is mainly done by an incremental ASP solver which as input receives a logic program that is generated from the contents of the ontology. The contextual information together with high level domain knowledge given in the ontology are used to infer explanations (answer sets) for changes in the ambient air detected by the gas sensors.