HCMay 29, 2021Code
Tournesol: A quest for a large, secure and trustworthy database of reliable human judgmentsLê-Nguyên Hoang, Louis Faucon, Aidan Jungo et al.
Today's large-scale algorithms have become immensely influential, as they recommend and moderate the content that billions of humans are exposed to on a daily basis. They are the de-facto regulators of our societies' information diet, from shaping opinions on public health to organizing groups for social movements. This creates serious concerns, but also great opportunities to promote quality information. Addressing the concerns and seizing the opportunities is a challenging, enormous and fabulous endeavor, as intuitively appealing ideas often come with unwanted {\it side effects}, and as it requires us to think about what we deeply prefer. Understanding how today's large-scale algorithms are built is critical to determine what interventions will be most effective. Given that these algorithms rely heavily on {\it machine learning}, we make the following key observation: \emph{any algorithm trained on uncontrolled data must not be trusted}. Indeed, a malicious entity could take control over the data, poison it with dangerously manipulative fabricated inputs, and thereby make the trained algorithm extremely unsafe. We thus argue that the first step towards safe and ethical large-scale algorithms must be the collection of a large, secure and trustworthy dataset of reliable human judgments. To achieve this, we introduce \emph{Tournesol}, an open source platform available at \url{https://tournesol.app}. Tournesol aims to collect a large database of human judgments on what algorithms ought to widely recommend (and what they ought to stop widely recommending). We outline the structure of the Tournesol database, the key features of the Tournesol platform and the main hurdles that must be overcome to make it a successful project. Most importantly, we argue that, if successful, Tournesol may then serve as the essential foundation for any safe and ethical large-scale algorithm.
IRJul 17, 2021
FEBR: Expert-Based Recommendation Framework for beneficial and personalized contentMohamed Lechiakh, Alexandre Maurer
So far, most research on recommender systems focused on maintaining long-term user engagement and satisfaction, by promoting relevant and personalized content. However, it is still very challenging to evaluate the quality and the reliability of this content. In this paper, we propose FEBR (Expert-Based Recommendation Framework), an apprenticeship learning framework to assess the quality of the recommended content on online platforms. The framework exploits the demonstrated trajectories of an expert (assumed to be reliable) in a recommendation evaluation environment, to recover an unknown utility function. This function is used to learn an optimal policy describing the expert's behavior, which is then used in the framework to provide high-quality and personalized recommendations. We evaluate the performance of our solution through a user interest simulation environment (using RecSim). We simulate interactions under the aforementioned expert policy for videos recommendation, and compare its efficiency with standard recommendation methods. The results show that our approach provides a significant gain in terms of content quality, evaluated by experts and watched by users, while maintaining almost the same watch time as the baseline approaches.
AIJun 7, 2018
Removing Algorithmic Discrimination (With Minimal Individual Error)El Mahdi El Mhamdi, Rachid Guerraoui, Lê Nguyên Hoang et al.
We address the problem of correcting group discriminations within a score function, while minimizing the individual error. Each group is described by a probability density function on the set of profiles. We first solve the problem analytically in the case of two populations, with a uniform bonus-malus on the zones where each population is a majority. We then address the general case of n populations, where the entanglement of populations does not allow a similar analytical solution. We show that an approximate solution with an arbitrarily high level of precision can be computed with linear programming. Finally, we address the inverse problem where the error should not go beyond a certain value and we seek to minimize the discrimination.
LGMay 29, 2018
Virtuously Safe Reinforcement LearningHenrik Aslund, El Mahdi El Mhamdi, Rachid Guerraoui et al.
We show that when a third party, the adversary, steps into the two-party setting (agent and operator) of safely interruptible reinforcement learning, a trade-off has to be made between the probability of following the optimal policy in the limit, and the probability of escaping a dangerous situation created by the adversary. So far, the work on safely interruptible agents has assumed a perfect perception of the agent about its environment (no adversary), and therefore implicitly set the second probability to zero, by explicitly seeking a value of one for the first probability. We show that (1) agents can be made both interruptible and adversary-resilient, and (2) the interruptibility can be made safe in the sense that the agent itself will not seek to avoid it. We also solve the problem that arises when the agent does not go completely greedy, i.e. issues with safe exploration in the limit. Resilience to perturbed perception, safe exploration in the limit, and safe interruptibility are the three pillars of what we call \emph{virtuously safe reinforcement learning}.
PEFeb 21, 2018
Learning to Gather without CommunicationEl Mahdi El Mhamdi, Rachid Guerraoui, Alexandre Maurer et al.
A standard belief on emerging collective behavior is that it emerges from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like always go to the center. But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available. We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots...) must gather around a same position, which is not determined in advance. They must do so without communication on their planned decision, just by looking at the position of other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the brutal loss of up to 90\% of agents without significant impact on the behavior.
AIApr 10, 2017
Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement LearningEl Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx et al.
In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to \textit{interrupt} an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, that impact their reward, to specific states and deliberately avoid them. The situation is particularly challenging in a multi-agent context because agents might not only learn from their own past interruptions, but also from those of other agents. Orseau and Armstrong defined \emph{safe interruptibility} for one learner, but their work does not naturally extend to multi-agent systems. This paper introduces \textit{dynamic safe interruptibility}, an alternative definition more suited to decentralized learning problems, and studies this notion in two learning frameworks: \textit{joint action learners} and \textit{independent learners}. We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show however that if agents can detect interruptions, it is possible to prune the observations to ensure dynamic safe interruptibility even for independent learners.
DSJan 14, 2013
On Byzantine Broadcast in Planar GraphsAlexandre Maurer, Sébastien Tixeuil
We consider the problem of reliably broadcasting information in a multihop asynchronous network in the presence of Byzantine failures: some nodes may exhibit unpredictable malicious behavior. We focus on completely decentralized solutions. Few Byzantine-robust algorithms exist for loosely connected networks. A recent solution guarantees reliable broadcast on a torus when D > 4, D being the minimal distance between two Byzantine nodes. In this paper, we generalize this result to 4-connected planar graphs. We show that reliable broadcast can be guaranteed when D > Z, Z being the maximal number of edges per polygon. We also show that this bound on D is a lower bound for this class of graphs. Our solution has the same time complexity as a simple broadcast. This is also the first solution where the memory required increases linearly (instead of exponentially) with the size of transmitted information. Important disclaimer: these results have NOT yet been published in an international conference or journal. This is just a technical report presenting intermediary and incomplete results. A generalized version of these results may be under submission.
DCOct 17, 2012
A Scalable Byzantine GridAlexandre Maurer, Sébastien Tixeuil
Modern networks assemble an ever growing number of nodes. However, it remains difficult to increase the number of channels per node, thus the maximal degree of the network may be bounded. This is typically the case in grid topology networks, where each node has at most four neighbors. In this paper, we address the following issue: if each node is likely to fail in an unpredictable manner, how can we preserve some global reliability guarantees when the number of nodes keeps increasing unboundedly ? To be more specific, we consider the problem or reliably broadcasting information on an asynchronous grid in the presence of Byzantine failures -- that is, some nodes may have an arbitrary and potentially malicious behavior. Our requirement is that a constant fraction of correct nodes remain able to achieve reliable communication. Existing solutions can only tolerate a fixed number of Byzantine failures if they adopt a worst-case placement scheme. Besides, if we assume a constant Byzantine ratio (each node has the same probability to be Byzantine), the probability to have a fatal placement approaches 1 when the number of nodes increases, and reliability guarantees collapse. In this paper, we propose the first broadcast protocol that overcomes these difficulties. First, the number of Byzantine failures that can be tolerated (if they adopt the worst-case placement) now increases with the number of nodes. Second, we are able to tolerate a constant Byzantine ratio, however large the grid may be. In other words, the grid becomes scalable. This result has important security applications in ultra-large networks, where each node has a given probability to misbehave.
DCSep 5, 2012
On Byzantine Broadcast in Loosely Connected NetworksAlexandre Maurer, Sébastien Tixeuil
We consider the problem of reliably broadcasting information in a multihop asynchronous network that is subject to Byzantine failures. Most existing approaches give conditions for perfect reliable broadcast (all correct nodes deliver the authentic message and nothing else), but they require a highly connected network. An approach giving only probabilistic guarantees (correct nodes deliver the authentic message with high probability) was recently proposed for loosely connected networks, such as grids and tori. Yet, the proposed solution requires a specific initialization (that includes global knowledge) of each node, which may be difficult or impossible to guarantee in self-organizing networks - for instance, a wireless sensor network, especially if they are prone to Byzantine failures. In this paper, we propose a new protocol offering guarantees for loosely connected networks that does not require such global knowledge dependent initialization. In more details, we give a methodology to determine whether a set of nodes will always deliver the authentic message, in any execution. Then, we give conditions for perfect reliable broadcast in a torus network. Finally, we provide experimental evaluation for our solution, and determine the number of randomly distributed Byzantine failures than can be tolerated, for a given correct broadcast probability.