CV · Aug 4, 2023

Scene-aware Human Pose Generation using Transformer

arXiv:2308.02177v1 · 45 citations · h-index: 29
Originality: Incremental advance
AI Analysis

This work targets scene understanding and robotics applications by generating human poses that interact plausibly with their environment, though it builds incrementally on existing template-based approaches.

The paper tackles the problem of generating reasonable human poses in scenes using contextual affordance learning, achieving state-of-the-art results on the Sitcom dataset with a template-based transformer method.

Affordance learning considers the interaction opportunities for an actor in a scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using affordance as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods can be divided into two categories, depending on whether they use pose templates. Our proposed method belongs to the template-based category, which benefits from representative pose templates. Moreover, inspired by recent transformer-based methods, we associate each query embedding with a pose template and use the interaction between the query embeddings and the scene feature map to predict the scale and offsets for each pose template. In addition, we employ knowledge distillation to facilitate offset learning given the predicted scale. Comprehensive experiments on the Sitcom dataset demonstrate the effectiveness of our method.
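The query-and-template idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single-head dot-product attention, the linear prediction heads, the `exp` used to keep the scale positive, and the composition rule `joint = center + scale * template_joint + offset` are all assumptions made for illustration, as are the names `cross_attend`, `apply_template`, and `predict_poses`. The knowledge distillation step is not covered.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, features):
    """Single-head dot-product attention of one query over flattened scene features."""
    d = len(query)
    weights = softmax([dot(query, f) / math.sqrt(d) for f in features])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]


def linear(x, weight, bias):
    """y = W x + b, with W given as a list of rows."""
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def apply_template(template, center, scale, offsets):
    """Assumed composition: scale the template around `center`, then shift each joint."""
    return [
        (center[0] + scale * tx + ox, center[1] + scale * ty + oy)
        for (tx, ty), (ox, oy) in zip(template, offsets)
    ]


def predict_poses(queries, templates, features, center, w_scale, b_scale, w_off, b_off):
    """One query embedding per pose template; each query yields a scale and per-joint offsets."""
    poses = []
    for query, template in zip(queries, templates):
        context = cross_attend(query, features)
        # exp keeps the predicted scale positive (an assumption of this sketch)
        scale = math.exp(linear(context, w_scale, b_scale)[0])
        flat = linear(context, w_off, b_off)
        offsets = [(flat[2 * j], flat[2 * j + 1]) for j in range(len(template))]
        poses.append(apply_template(template, center, scale, offsets))
    return poses


if __name__ == "__main__":
    # Toy inputs: one 2-joint template, a 2-dimensional query, three feature positions.
    templates = [[(0.0, 0.0), (1.0, 2.0)]]
    queries = [[0.5, -0.5]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_scale, b_scale = [[0.1, -0.2]], [0.0]
    w_off, b_off = [[0.3, 0.0], [0.0, 0.3], [-0.1, 0.1], [0.2, 0.2]], [0.0] * 4
    print(predict_poses(queries, templates, features, (10.0, 20.0),
                        w_scale, b_scale, w_off, b_off))
```

In a trained model the query embeddings and head weights would be learned, and the scene features would come from an image backbone; here they are hand-set toy values.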

