SIMay 26
Rewarding Engagement and Personalization in Popularity-Based Rankings Amplifies Extremism and PolarizationJacopo D'Ignazi, Emma Fraxanet Morales, Andreas Kaltenbrunner et al.
Despite extensive research, the mechanisms through which online platforms shape extremism and polarization remain poorly understood. We identify and test a mechanism, grounded in empirical evidence, that explains how ranking algorithms can amplify both phenomena. This mechanism is based on well-documented assumptions: (i) users exhibit position bias and tend to prefer items displayed higher in the ranking, (ii) users prefer like-minded content, (iii) users with more extreme views are more likely to engage actively, and (iv) ranking algorithms are popularity-based, assigning higher positions to items that attract more clicks. Under these conditions, when platforms additionally reward \emph{active} engagement and implement \emph{personalized} rankings, users are inevitably driven toward more extremist and polarized news consumption. We formalize this mechanism in a dynamical model, which we evaluate by means of simulations and interactive experiments with hundreds of human participants, where the rankings are updated dynamically in response to user activity.
SIDec 27, 2016
Action selection in growing state spaces: Control of Network Structure GrowthDominik Thalmeier, Vicenç Gómez, Hilbert J. Kappen
The dynamical processes taking place on a network depend on its topology. Influencing the growth process of a network therefore has important implications on such dynamical processes. We formulate the problem of influencing the growth of a network as a stochastic optimal control problem in which a structural cost function penalizes undesired topologies. We approximate this control problem with a restricted class of control problems that can be solved using probabilistic inference methods. To deal with the increasing problem dimensionality, we introduce an adaptive importance sampling method for approximating the optimal control. We illustrate this methodology in the context of formation of information cascades, considering the task of influencing the structure of a growing conversation thread, as in Internet forums. Using a realistic model of growing trees, we show that our approach can yield conversation threads with better structural properties than the ones observed without control.
LGNov 27, 2022
Beyond 1-WL with Local Ego-Network EncodingsNurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez
Identifying similar network structures is key to capture graph isomorphisms and learn representations that exploit structural information encoded in graph data. This work shows that ego-networks can produce a structural encoding scheme for arbitrary graphs with greater expressivity than the Weisfeiler-Lehman (1-WL) test. We introduce IGEL, a preprocessing step to produce features that augment node representations by encoding ego-networks into sparse vectors that enrich Message Passing (MP) Graph Neural Networks (GNNs) beyond 1-WL expressivity. We describe formally the relation between IGEL and 1-WL, and characterize its expressive power and limitations. Experiments show that IGEL matches the empirical expressivity of state-of-the-art methods on isomorphism detection while improving performance on seven GNN architectures.
LGJul 9, 2024
Hierarchical Average-Reward Linearly-solvable Markov Decision ProcessesGuillermo Infante, Anders Jonsson, Vicenç Gómez
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller subtasks that are easier to solve, and the equivalence between such partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
LGMar 22, 2024
Planning with a Learned Policy Basis to Optimally Solve Complex TasksGuillermo Infante, David Kuric, Anders Jonsson et al.
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
AISep 16, 2025
From Next Token Prediction to (STRIPS) World Models -- Preliminary ResultsCarlos Núñez-Molina, Vicenç Gómez, Hector Geffner
We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make an action precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that the models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
LGDec 10, 2023
Improving Subgraph-GNNs via Edge-Level Ego-Network EncodingsNurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez
We present a novel edge-level ego-network encoding for learning on graphs that can boost Message Passing Graph Neural Networks (MP-GNNs) by providing additional node and edge features or extending message-passing formats. The proposed encoding is sufficient to distinguish Strongly Regular Graphs, a family of challenging 3-WL equivalent graphs. We show theoretically that such encoding is more expressive than node-based sub-graph MP-GNNs. In an empirical evaluation on four benchmarks with 10 graph datasets, our results match or improve previous baselines on expressivity, graph classification, graph regression, and proximity tasks -- while reducing memory usage by 18.1x in certain real-world settings.
CLSep 4, 2021
Uncovering the Limits of Text-based Emotion DetectionNurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez
Identifying emotions from text is crucial for a variety of real world tasks. We consider the two largest now-available corpora for emotion classification: GoEmotions, with 58k messages labelled by readers, and Vent, with 33M writer-labelled messages. We design a benchmark and evaluate several feature spaces and learning algorithms, including two simple yet novel models on top of BERT that outperform previous strong baselines on GoEmotions. Through an experiment with human participants, we also analyze the differences between how writers express emotions and how readers perceive them. Our results suggest that emotions expressed by writers are harder to identify than emotions that readers perceive. We share a public web interface for researchers to explore our models.
LGJun 29, 2021
Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision ProcessesGuillermo Infante, Anders Jonsson, Vicenç Gómez
In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
AIJan 15, 2021
Hierarchical Width-Based Planning and LearningMiquel Junyent, Vicenç Gómez, Anders Jonsson
Width-based search methods have demonstrated state-of-the-art performance in a wide range of testbeds, from classical planning problems to image-based simulators such as Atari games. These methods scale independently of the size of the state-space, but exponentially in the problem width. In practice, running the algorithm with a width larger than 1 is computationally intractable, prohibiting IW from solving higher width problems. In this paper, we present a hierarchical algorithm that plans at two levels of abstraction. A high-level planner uses abstract features that are incrementally discovered from low-level pruning decisions. We illustrate this algorithm in classical planning PDDL domains as well as in pixel-based simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. For pixel-based domains, we show how in combination with a learned policy and a learned value function, the proposed hierarchical IW can outperform current flat IW-based planners in Atari games with sparse rewards.
LGSep 26, 2020
Inductive Graph Embeddings through Locality EncodingsNurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez
Learning embeddings from large-scale networks is an open challenge. Despite the overwhelming number of existing methods, is is unclear how to exploit network structure in a way that generalizes easily to unseen nodes, edges or graphs. In this work, we look at the problem of finding inductive network embeddings in large networks without domain-dependent node/edge attributes. We propose to use a set of basic predefined local encodings as the basis of a learning algorithm. In particular, we consider the degree frequencies at different distances from a node, which can be computed efficiently for relatively short distances and a large number of nodes. Interestingly, the resulting embeddings generalize well across unseen or distant regions in the network, both in unsupervised settings, when combined with language model learning, as well as in supervised tasks, when used as additional features in a neural network. Despite its simplicity, this method achieves state-of-the-art performance in tasks such as role detection, link prediction and node classification, and represents an inductive network embedding method directly applicable to large unattributed networks.
SYMay 13, 2020
Adaptive Smoothing Path Integral ControlDominik Thalmeier, Hilbert J. Kappen, Simone Totaro et al.
In Path Integral control problems a representation of an optimally controlled dynamical system can be formally computed and serve as a guidepost to learn a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speedup convergence of policy optimization. We identify PICE as the infinite smoothing limit of such technique and show that the sample efficiency problems that PICE suffers disappear for finite levels of smoothing. For zero smoothing this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost-optimization.
LGSep 25, 2019
Input complexity and out-of-distribution detection with likelihood-based generative modelsJoan Serrà, David Álvarez, Vicenç Gómez et al.
Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we pose that this problem is due to the excessive influence that input complexity has in generative models' likelihoods. We report a set of experiments supporting this hypothesis, and use an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio, akin to Bayesian model comparison. We find such score to perform comparably to, or even better than, existing OOD detection approaches under a wide range of data sets, models, model sizes, and complexity estimates.
LGMay 13, 2019
Consequential Ranking Algorithms and Long-term WelfareBehzad Tabibian, Vicenç Gómez, Abir De et al.
Ranking models are typically designed to provide rankings that optimize some measure of immediate utility to the users. As a result, they have been unable to anticipate an increasing number of undesirable long-term consequences of their proposed rankings, from fueling the spread of misinformation and increasing polarization to degrading social discourse. Can we design ranking models that understand the consequences of their proposed rankings and, more importantly, are able to avoid the undesirable ones? In this paper, we first introduce a joint representation of rankings and user dynamics using Markov decision processes. Then, we show that this representation greatly simplifies the construction of consequential ranking models that trade off the immediate utility and the long-term welfare. In particular, we can obtain optimal consequential rankings just by applying weighted sampling on the rankings provided by models that maximize measures of immediate utility. However, in practice, such a strategy may be inefficient and impractical, specially in high dimensional scenarios. To overcome this, we introduce an efficient gradient-based algorithm to learn parameterized consequential ranking models that effectively approximate optimal ones. We showcase our methodology using synthetic and real data gathered from Reddit and show that ranking models derived using our methodology provide ranks that may mitigate the spread of misinformation and improve the civility of online discussions.
AIApr 12, 2019
Deep Policies for Width-Based Planning in Pixel DomainsMiquel Junyent, Anders Jonsson, Vicenç Gómez
Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done on-line using the "screen" states and selecting actions by looking ahead into the future. However, this algorithm is purely exploratory and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task, e.g., the B-PROST pixel features. In this work, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called $π$-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Surprisingly, we observe that the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. We compare $π$-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that $π$-IW has superior performance. We also show that $π$-IW algorithm outperforms previous width-based methods in the pixel setting of Atari games suite.
IRFeb 7, 2019
The few-get-richer: a surprising consequence of popularity-based rankingsFabrizio Germano, Vicenç Gómez, Gaël Le Mens
Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications to understand the spread of misinformation.
SIJan 15, 2019
Sharing emotions at scale: The Vent datasetNikolaos Lykousas, Costantinos Patsakis, Andreas Kaltenbrunner et al.
The continuous and increasing use of social media has enabled the expression of human thoughts, opinions, and everyday actions publicly at an unprecedented scale. We present the Vent dataset, the largest annotated dataset of text, emotions, and social connections to date. It comprises more than 33 millions of posts by nearly a million of users together with their social connections. Each post has an associated emotion. There are 705 different emotions, organized in 63 "emotion categories", forming a two-level taxonomy of affects. Our initial statistical analysis describes the global patterns of activity in the Vent platform, revealing large heterogenities and certain remarkable regularities regarding the use of the different emotions. We focus on the aggregated use of emotions, the temporal activity, and the social network of users, and outline possible methods to infer emotion networks based on the user activity. We also analyze the text and describe the affective landscape of Vent, finding agreements with existing (small scale) annotated corpus in terms of emotion categories and positive/negative valences. Finally, we discuss possible research questions that can be addressed from this unique dataset.
AIJun 15, 2018
Improving width-based planning with compact policiesMiquel Junyent, Anders Jonsson, Vicenç Gómez
Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state of the art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different than the standard random exploration used in RL. We evaluate our method in simple problems where we show it to have superior performance than the state-of-the-art reinforcement learning algorithms A2C and Alpha Zero. Finally, we present preliminary results in a subset of the Atari games suite.
LGMay 22, 2017
A unified view of entropy-regularized Markov decision processesGergely Neu, Anders Jonsson, Vicenç Gómez
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.
LGFeb 21, 2017
Fast rates for online learning in Linearly Solvable Markov Decision ProcessesGergely Neu, Vicenç Gómez
We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online setting where the state costs may change arbitrarily between consecutive rounds, and the learner only observes the costs at the end of each respective round. We are interested in constructing algorithms for the learner that guarantee small regret against the best stationary control policy chosen in full knowledge of the cost sequence. Our main result is showing that the smoothness of the control cost enables the simple algorithm of following the leader to achieve a regret of order $\log^2 T$ after $T$ rounds, vastly improving on the best known regret bound of order $T^{3/4}$ for this setting.
AIMar 10, 2016
Hierarchical Linearly-Solvable Markov Decision ProblemsAnders Jonsson, Vicenç Gómez
We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs) have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative to solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms the state-of-the-art learning methods in two classical hierarchical reinforcement learning domains: the taxi domain and an autonomous guided vehicle task.
SYFeb 16, 2015
Real-Time Stochastic Optimal Control for Multi-agent Quadrotor SystemsVicenç Gómez, Sep Thijssen, Andrew Symington et al.
This paper presents a novel method for controlling teams of unmanned aerial vehicles using Stochastic Optimal Control (SOC) theory. The approach consists of a centralized high-level planner that computes optimal state trajectories as velocity sequences, and a platform-specific low-level controller which ensures that these velocity sequences are met. The planning task is expressed as a centralized path-integral control problem, for which optimal control computation corresponds to a probabilistic inference problem that can be solved by efficient sampling methods. Through simulation we show that our SOC approach (a) has significant benefits compared to deterministic control and other SOC methods in multimodal problems with noise-dependent optimal solutions, (b) is capable of controlling a large number of platforms in real-time, and (c) yields collective emergent behaviour in the form of flight formations. Finally, we show that our approach works for real platforms, by controlling a team of three quadrotors in outdoor conditions.
SYJun 4, 2014
Latent Kullback Leibler Control for Continuous-State Systems using Probabilistic Graphical ModelsTakamitsu Matsubara, Vicenç Gómez, Hilbert J. Kappen
Kullback Leibler (KL) control problems allow for efficient computation of optimal control by solving a principal eigenvector problem. However, direct applicability of such framework to continuous state-action systems is limited. In this paper, we propose to embed a KL control problem in a probabilistic graphical model where observed variables correspond to the continuous (possibly high-dimensional) state of the system and latent variables correspond to a discrete (low-dimensional) representation of the state amenable for KL control computation. We present two examples of this approach. The first one uses standard hidden Markov models (HMMs) and computes exact optimal control, but is only applicable to low-dimensional systems. The second one uses factorial HMMs, it is scalable to higher dimensional problems, but control computation is approximate. We illustrate both examples in several robot motor control tasks.