CV · Mar 16, 2023

InCrowdFormer: On-Ground Pedestrian World Model From Egocentric Views

Mai Nishimura, Shohei Nobuhara, Ko Nishino

arXiv:2303.09534v1 · 1.51 citations · h-index: 39

Originality: Incremental advance

AI Analysis

This work addresses crowd navigation and tracking for robotics and AR applications. It appears incremental, however, because it applies existing Transformer architectures to a specific domain.

The paper tackles the problem of predicting pedestrian movement on the ground plane from egocentric views. It introduces InCrowdFormer, a Transformer-based model that autoregressively predicts pedestrian positions while encoding their uncertainty. On a novel real-world benchmark, the authors report accurate predictions of future pedestrian positions.
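The summary says the model predicts positions autoregressively for a variable number of people. The Python sketch below shows only the shape of such a rollout loop. It is not the InCrowdFormer decoder. `rollout` and `constant_velocity` are hypothetical names, and the constant-velocity rule stands in for the learned Transformer decoder.

```python
from typing import Callable

Position = tuple[float, float]  # (x, z) on the ground plane, in metres
Tracks = dict[str, list[Position]]  # pedestrian ID -> trajectory so far


def rollout(history: Tracks,
            step_fn: Callable[[Tracks], dict[str, Position]],
            horizon: int) -> Tracks:
    """Autoregressively extend every pedestrian's trajectory by `horizon` steps.

    At each step, step_fn sees all trajectories so far (observed plus already
    predicted), for however many people are present. It returns one next
    position per person. Each prediction is appended and fed back in at the
    following step.
    """
    tracks = {pid: list(traj) for pid, traj in history.items()}  # copy, keep input intact
    for _ in range(horizon):
        next_positions = step_fn(tracks)
        for pid, pos in next_positions.items():
            tracks[pid].append(pos)
    return tracks


def constant_velocity(tracks: Tracks) -> dict[str, Position]:
    """Hypothetical stand-in for a learned decoder: repeat each person's last displacement."""
    out: dict[str, Position] = {}
    for pid, traj in tracks.items():
        if len(traj) < 2:
            out[pid] = traj[-1]  # no motion observed yet: stay in place
        else:
            (x0, z0), (x1, z1) = traj[-2], traj[-1]
            out[pid] = (2 * x1 - x0, 2 * z1 - z0)
    return out


if __name__ == "__main__":
    observed = {
        "a": [(0.0, 0.0), (1.0, 0.0)],
        "b": [(0.0, 5.0), (0.0, 4.5)],
        "c": [(2.0, 2.0)],
    }
    for pid, traj in rollout(observed, constant_velocity, horizon=3).items():
        print(pid, traj)
```

Keying tracks by pedestrian ID lets the set of people grow or shrink without fixed-length arrays. A Transformer implementation would typically get the same effect by treating each person as a token and masking padded slots.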

We introduce an on-ground Pedestrian World Model, a computational model that predicts how pedestrians move on the ground plane around an observer in a crowd, using only the observer's egocentric views. Our model, InCrowdFormer, fully leverages the Transformer architecture. It models pedestrian interaction and the egocentric-to-top-down view transformation with attention, and it autoregressively predicts the on-ground positions of a variable number of people with an encoder-decoder architecture. We encode the uncertainties arising from unknown pedestrian heights with latent codes to predict the posterior distributions of pedestrian positions. We validate the effectiveness of InCrowdFormer on a novel prediction benchmark of real movements. The results show that InCrowdFormer accurately predicts the future coordination of pedestrians. To the best of our knowledge, InCrowdFormer is the first pedestrian world model of its kind, and we believe it will benefit a wide range of egocentric-view applications, including crowd navigation, tracking, and synthesis.
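The abstract attributes part of the prediction uncertainty to unknown pedestrian heights. The sketch below is not the authors' method. It only illustrates why an unknown height turns a single image detection into a spread of possible ground-plane positions along the viewing ray. It rests on several assumptions: a level pinhole camera, the hypothetical parameters `focal_px` and `cx_px`, a detection given as a horizontal pixel position and box height, and a Gaussian height prior with mean 1.7 m and standard deviation 0.1 m.

```python
import random
import statistics


def ground_position(u_px: float, box_height_px: float, person_height_m: float,
                    focal_px: float = 1000.0, cx_px: float = 960.0) -> tuple[float, float]:
    """Back-project one pedestrian detection to top-down (x, z) in metres.

    Assumes a hypothetical level pinhole camera. An upright person of height H
    at depth z spans box_height_px = focal_px * H / z pixels, so
    z = focal_px * H / box_height_px. The lateral offset is
    x = (u_px - cx_px) * z / focal_px.
    """
    z = focal_px * person_height_m / box_height_px
    x = (u_px - cx_px) * z / focal_px
    return x, z


def sample_ground_positions(u_px: float, box_height_px: float, n: int = 1000,
                            mean_h: float = 1.7, std_h: float = 0.1,
                            seed: int = 0) -> list[tuple[float, float]]:
    """Turn one detection into n candidate ground positions by sampling the unknown height.

    The Gaussian height prior (mean_h, std_h) is an illustrative assumption.
    Samples are clipped at 0.5 m to avoid non-physical heights.
    """
    rng = random.Random(seed)
    return [ground_position(u_px, box_height_px, max(rng.gauss(mean_h, std_h), 0.5))
            for _ in range(n)]


if __name__ == "__main__":
    samples = sample_ground_positions(u_px=1060.0, box_height_px=200.0)
    depths = [z for _, z in samples]
    # Every sample lies on the same viewing ray; only the distance along it varies.
    print(f"depth mean {statistics.mean(depths):.2f} m, std {statistics.stdev(depths):.2f} m")
```

With these numbers, a 0.1 m standard deviation in height becomes about 0.5 m of depth uncertainty (focal_px × std_h / box_height_px = 1000 × 0.1 / 200). The spread grows as the person appears smaller in the image, that is, farther away.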

