Florent Altché

LG
h-index: 117
Papers: 17
Citations: 13,316
Novelty: 56%
AI Score: 42

17 Papers

CL · Nov 8, 2022
Self-conditioned Embedding Diffusion for Text Generation

Robin Strudel, Corentin Tallec, Florent Altché et al. · mit

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows us to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models - while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.

LG · Jun 16, 2022
BYOL-Explore: Exploration by Bootstrapped Prediction

Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar et al.

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy all-together by optimizing a single prediction loss in the latent space with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark with visually-rich 3-D environments. On this benchmark, we solve the majority of the tasks purely through augmenting the extrinsic reward with BYOL-Explore's intrinsic reward, whereas prior work could only get off the ground with human demonstrations. As further evidence of the generality of BYOL-Explore, we show that it achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.

ML · Nov 18, 2022
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Daniel Jarrett, Corentin Tallec, Florent Altché et al.

Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma's Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from their predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma's Revenge with sticky actions, while preserving performance in the non-sticky setting.
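
The "noisy TV" failure mode described above can be illustrated with a toy sketch. This is not the paper's method: it assumes a scalar outcome, a running-mean predictor, and squared prediction error as the intrinsic reward, and all function names are hypothetical. It only shows that prediction-error rewards vanish on a deterministic outcome but stay high on a purely random one, even after the predictor has converged.

```python
import random


def prediction_error_reward(predicted, realized):
    """Curiosity-style intrinsic reward: squared prediction error (assumed form)."""
    return (predicted - realized) ** 2


def mean_late_reward(outcome_fn, steps=2000, lr=0.05, seed=0):
    """Train a running-mean predictor and average the reward over the last half."""
    rng = random.Random(seed)
    prediction = 0.0
    rewards = []
    for _ in range(steps):
        realized = outcome_fn(rng)
        rewards.append(prediction_error_reward(prediction, realized))
        # Move the prediction a small step toward the realized outcome.
        prediction += lr * (realized - prediction)
    tail = rewards[steps // 2:]
    return sum(tail) / len(tail)


# Deterministic outcome: the error shrinks to zero once it is learned.
deterministic = mean_late_reward(lambda rng: 1.0)
# "Noisy TV": same mean, but unit-variance noise that can never be predicted.
noisy_tv = mean_late_reward(lambda rng: rng.gauss(1.0, 1.0))
```

In this toy setting the agent would keep collecting reward from the noisy source indefinitely, which is the fragility the abstract attributes to plain predictive-error curiosity.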

SY · Apr 4, 2017
High-Speed Trajectory Planning for Autonomous Vehicles Using a Simple Dynamic Model

Florent Altché, Philip Polack, Arnaud de la Fortelle

To improve safety and energy efficiency, autonomous vehicles are expected to drive smoothly in most situations, while maintaining their velocity below a predetermined speed limit. However, some scenarios such as low road adherence or inadequate speed limit may require vehicles to automatically adapt their velocity without external input, while nearing the limits of their dynamic capacities. Many of the existing trajectory planning approaches are incapable of making such adjustments, since they assume a feasible velocity reference is given. Moreover, near-limits trajectory planning often implies high-complexity dynamic vehicle models, making computations difficult. In this article, we use a simple dynamic model derived from numerical simulations to design a trajectory planner for high-speed driving of an autonomous vehicle based on model predictive control. Unlike existing techniques, our formulation includes the selection of a feasible velocity to track a predetermined path while avoiding obstacles. Simulation results on a highly precise vehicle model show that our approach can be used in real-time to provide feasible trajectories that can be tracked using a simple control architecture. Moreover, the use of our simplified model makes the planner more robust and yields better trajectories compared to kinematic models commonly used in trajectory planning.

SY · Jan 24, 2018
Partitioning of the Free Space-Time for On-Road Navigation of Autonomous Ground Vehicles

Florent Altché, Arnaud de La Fortelle

In this article, we consider the problem of trajectory planning and control for on-road driving of an autonomous ground vehicle (AGV) in the presence of static or moving obstacles. We propose a systematic approach to partition the collision-free portion of the space-time into convex sub-regions that can be interpreted in terms of relative positions with respect to a set of fixed or mobile obstacles. We show that this partitioning allows decomposing the NP-hard problem of computing an optimal collision-free trajectory into a path-finding problem in a well-designed graph followed by a simple (polynomial time) optimization phase for any quadratic convex cost function. Moreover, robustness criteria such as margin of error while executing the trajectory can easily be taken into account at the graph-exploration phase, thus reducing the number of paths to explore.

CL · Jul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

CV · Mar 30, 2021
Broaden Your Views for Self-Supervised Video Learning

Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac et al.

Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extracted by cropping and augmenting the resulting crop. However, these methods miss a crucial element in the video domain: time. We introduce BraVe, a self-supervised learning framework for video. In BraVe, one of the views has access to a narrow temporal window of the video while the other view has a broad access to the video content. Our models learn to generalise from the narrow view to the general content of the video. Furthermore, BraVe processes the views with different backbones, enabling the use of alternative augmentations or modalities into the broad view such as optical flow, randomly convolved RGB frames, audio or their combinations. We demonstrate that BraVe achieves state-of-the-art results in self-supervised representation learning on standard video and audio classification benchmarks including UCF101, HMDB51, Kinetics, ESC-50 and AudioSet.

ML · Oct 20, 2020
BYOL works even without batch statistics

Pierre H. Richemond, Jean-Bastien Grill, Florent Altché et al.

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL (73.9% vs. 74.3% top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-50). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.

LG · Jul 24, 2020
Monte-Carlo Tree Search as Regularized Policy Optimization

Jean-Bastien Grill, Florent Altché, Yunhao Tang et al.

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
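
The abstract does not spell out the regularized problem, so the following is only a sketch under stated assumptions: the objective is taken to be maximizing the expected value q·y minus lam times a KL divergence from a strictly positive prior policy to y. Under that assumed form the maximizer is y(a) = lam * prior(a) / (alpha - q(a)), with the scalar alpha chosen so that y sums to one. The function name, the regularizer, and the use of bisection are all assumptions made for illustration.

```python
def regularized_policy(q, prior, lam, iters=100):
    """Solve max_y q.y - lam * KL(prior || y) over the simplex (assumed objective).

    Requires lam > 0 and a strictly positive prior over at least two actions.
    """
    def total_mass(alpha):
        return sum(lam * p / (alpha - qa) for qa, p in zip(q, prior))

    # Bracket for the normalizer: total mass is >= 1 at lo and <= 1 at hi.
    lo = max(qa + lam * p for qa, p in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_mass(mid) > 1.0:
            lo = mid  # too much mass: alpha must grow
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return [lam * p / (alpha - qa) for qa, p in zip(q, prior)]


# With equal values the prior is returned unchanged.
same = regularized_policy([0.0, 0.0, 0.0], [0.2, 0.3, 0.5], lam=1.0)
# A higher value for the first action shifts probability mass toward it.
shifted = regularized_policy([1.0, 0.0], [0.5, 0.5], lam=0.5)
```

A large lam keeps the result close to the prior, while a small lam concentrates it on the highest-valued action.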

LG · Jun 13, 2020
Bootstrap your own latent: A new approach to self-supervised Learning

Jean-Bastien Grill, Florian Strub, Florent Altché et al.

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.
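
The two mechanisms named in the abstract, predicting the target network's representation and updating the target as a slow-moving average of the online network, can be sketched on plain lists of floats. This is an illustrative sketch, not the released implementation: the normalized squared-error loss, the decay value tau, and the function names are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def prediction_loss(online_prediction, target_projection):
    """Squared distance between unit-normalized vectors (assumed loss form).

    Equals 2 - 2 * cosine similarity; the target side is treated as a constant.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def ema_update(target_params, online_params, tau=0.99):
    """Slow-moving average: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


aligned = prediction_loss([1.0, 2.0], [2.0, 4.0])      # same direction
orthogonal = prediction_loss([1.0, 0.0], [0.0, 3.0])   # perpendicular
new_target = ema_update([1.0, 0.0], [0.0, 1.0], tau=0.9)
```

No negative pairs appear anywhere in this loss: only the two views of the same image are compared, which matches the abstract's description.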

LG · Apr 30, 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Daniel Guo, Bernardo Avila Pires, Bilal Piot et al.

Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where building a representation of the unknown environment is crucial to solve the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state of the art deep RL agents in the DMLab-30 and Atari-57 multitask setting.

AI · Feb 20, 2019
World Discovery Models

Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires et al.

As humans we are driven by a strong desire for seeking novelty in our world. Also upon observing a novel pattern we are capable of refining our understanding of the world based on the new information: humans can discover their world. The outstanding ability of the human mind for discovery has led to many breakthroughs in science, art and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular we introduce NDIGO, Neural Differential Information Gain Optimisation, a self-supervised discovery model that aims at seeking new information to construct a global view of its world from partial and noisy observations. Our experiments on some controlled 2-D navigation tasks show that NDIGO outperforms state-of-the-art information-seeking methods in terms of the quality of the learned representation. The improvement in performance is particularly significant in the presence of white or structured noise where other information-seeking methods follow the noise instead of discovering their world.

LG · Oct 22, 2018
Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

Guillaume Devineau, Philip Polack, Florent Altché et al.

This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, control inputs are chosen as the steering angle of the front wheels, and the applied torque on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, shifting between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.

RO · Jan 24, 2018
An LSTM Network for Highway Trajectory Prediction

Florent Altché, Arnaud de La Fortelle

In order to drive safely and efficiently on public roads, autonomous vehicles will have to understand the intentions of surrounding vehicles, and adapt their own behavior accordingly. If experienced human drivers are generally good at inferring other vehicles' motion up to a few seconds in the future, most current Advanced Driving Assistance Systems (ADAS) are unable to perform such medium-term forecasts, and are usually limited to high-likelihood situations such as emergency braking. In this article, we present a first step towards consistent trajectory prediction by introducing a long short-term memory (LSTM) neural network, which is capable of accurately predicting future longitudinal and lateral trajectories for vehicles on highways. Unlike previous work focusing on a low number of trajectories collected from a few drivers, our network was trained and validated on the NGSIM US-101 dataset, which contains a total of 800 hours of recorded trajectories in various traffic densities, representing more than 6000 individual drivers.

SY · Jun 23, 2017
A Simple Dynamic Model for Aggressive, Near-Limits Trajectory Planning

Florent Altché, Philip Polack, Arnaud de La Fortelle

In normal on-road situations, autonomous vehicles will be expected to have smooth trajectories with relatively little demand on the vehicle dynamics to ensure passenger comfort and driving safety. However, the occurrence of unexpected events may require vehicles to perform aggressive maneuvers, near the limits of their dynamic capacities. In order to ensure the occupant's safety in these situations, the ability to plan controllable but near-limits trajectories will be of very high importance. One of the main issues in planning aggressive maneuvers lies in the high complexity of the vehicle dynamics near the handling limits, which effectively makes state-of-the-art methods such as Model Predictive Control difficult to use. This article studies a highly precise model of the vehicle body to derive a simpler, constrained second-order integrator dynamic model which remains precise even near the handling limits of the vehicle. Preliminary simulation results indicate that our model provides better accuracy without increasing computation time compared to a more classical kinematic bicycle model. The proposed model can find applications for contingency planning, which may require aggressive maneuvers, or for trajectory planning at high speed, for instance in racing applications.
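
A constrained second-order integrator of the kind mentioned above can be sketched as a point mass whose commanded acceleration is limited before integration. The abstract does not give the paper's actual constraint set, so this sketch assumes a simple circular bound on the acceleration magnitude; the function name, the bound a_max, and the semi-implicit Euler step are all illustrative choices.

```python
import math


def integrator_step(state, accel_cmd, dt, a_max):
    """Advance (x, y, vx, vy) by one step under a bounded acceleration.

    The command is rescaled onto a circle of radius a_max if it exceeds it
    (an assumed constraint, standing in for the paper's own limits).
    """
    x, y, vx, vy = state
    ax, ay = accel_cmd
    norm = math.hypot(ax, ay)
    if norm > a_max:
        scale = a_max / norm
        ax, ay = ax * scale, ay * scale
    # Semi-implicit Euler: update velocity first, then position.
    vx += ax * dt
    vy += ay * dt
    x += vx * dt
    y += vy * dt
    return (x, y, vx, vy)


# Command within the bound is applied as-is.
free = integrator_step((0.0, 0.0, 0.0, 0.0), (3.0, 4.0), dt=1.0, a_max=10.0)
# Command of magnitude 5 is scaled down to magnitude 2.5.
limited = integrator_step((0.0, 0.0, 0.0, 0.0), (3.0, 4.0), dt=1.0, a_max=2.5)
```

Because the state update is linear and only the input is constrained, a model of this shape stays cheap to embed in an optimization-based planner, which is the trade-off the abstract emphasizes.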

RO · Apr 29, 2016
A Distributed Model Predictive Control Framework for Road-Following Formation Control of Car-like Vehicles (Extended Version)

Xiangjun Qian, Florent Altché, Arnaud de La Fortelle et al.

This work presents a novel framework for the formation control of multiple autonomous ground vehicles in an on-road environment. Unique challenges of this problem lie in 1) the design of collision avoidance strategies with obstacles and with other vehicles in a highly structured environment, 2) dynamic reconfiguration of the formation to handle different task specifications. In this paper, we design a local MPC-based tracking controller for each individual vehicle to follow a reference trajectory while satisfying various constraints (kinematics and dynamics, collision avoidance, etc.). The reference trajectory of a vehicle is computed from its leader's trajectory, based on a pre-defined formation tree. We use logic rules to organize the collision avoidance behaviors of member vehicles. Moreover, we propose a methodology to safely reconfigure the formation on-the-fly. The proposed framework has been validated using high-fidelity simulations.

RO · Mar 15, 2016
Time-optimal Coordination of Mobile Robots along Specified Paths

Florent Altché, Xiangjun Qian, Arnaud de La Fortelle

In this paper, we address the problem of time-optimal coordination of mobile robots under kinodynamic constraints along specified paths. We propose a novel approach based on time discretization that leads to a mixed-integer linear programming (MILP) formulation. This problem can be solved using general-purpose MILP solvers in a reasonable time, resulting in a resolution-optimal solution. Moreover, unlike previous work found in the literature, our formulation allows an exact linear modeling (up to the discretization resolution) of second-order dynamic constraints. Extensive simulations are performed to demonstrate the effectiveness of our approach.