SegmATRon: Embodied Adaptive Semantic Segmentation for Indoor Environment
This work offers an incremental improvement in embodied AI for indoor scene understanding, with potential benefits for robotics and simulation applications.
The paper tackles the problem of improving semantic segmentation in indoor environments by adapting model weights during inference using a hybrid loss function, showing that additional images from agent actions can enhance segmentation quality.
This paper presents SegmATRon, an adaptive transformer model for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat simulator and the synthetic AI2-THOR simulator. We showed that obtaining additional images through the agent's actions in an indoor environment can improve the quality of semantic segmentation. The code for the proposed approach and the datasets are publicly available at https://github.com/wingrune/SegmATRon.
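The idea of adapting weights at inference time on several images can be illustrated with a minimal sketch. It is not the SegmATRon architecture or its loss. A tiny linear per-pixel classifier stands in for the transformer. The two loss components (prediction entropy and disagreement between pixel-aligned views) are assumed stand-ins for the hybrid multicomponent loss. Finite-difference gradients replace backpropagation so that the example needs only the standard library. All names (`hybrid_loss`, `adapt`, `alpha`, `beta`) are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, frame):
    """Per-pixel class probabilities for one frame (a list of feature vectors)."""
    return [
        softmax([sum(w * x for w, x in zip(row, pixel)) for row in weights])
        for pixel in frame
    ]

def hybrid_loss(weights, frames, alpha=1.0, beta=1.0):
    """Toy label-free loss (assumption): alpha * entropy + beta * disagreement."""
    preds = [predict(weights, f) for f in frames]
    n_pixels = sum(len(p) for p in preds)
    entropy = -sum(
        q * math.log(q + 1e-12) for p in preds for px in p for q in px
    ) / n_pixels
    ref, extra = preds[0], preds[1:]
    disagreement = 0.0
    if extra:
        # Assumes the additional frames are pixel-aligned with the first one.
        disagreement = sum(
            (a - b) ** 2
            for p in extra
            for px_ref, px in zip(ref, p)
            for a, b in zip(px_ref, px)
        ) / (len(extra) * len(ref))
    return alpha * entropy + beta * disagreement

def numerical_grad(weights, frames, eps=1e-5):
    """Central finite-difference gradient of hybrid_loss w.r.t. each weight."""
    grad = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            orig = row[j]
            row[j] = orig + eps
            up = hybrid_loss(weights, frames)
            row[j] = orig - eps
            down = hybrid_loss(weights, frames)
            row[j] = orig  # restore the weight
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def adapt(weights, frames, lr=0.5, steps=10):
    """Adapt a copy of the weights; a step is kept only if the loss decreases."""
    w = [row[:] for row in weights]
    loss = hybrid_loss(w, frames)
    for _ in range(steps):
        grad = numerical_grad(w, frames)
        candidate = [
            [wij - lr * g for wij, g in zip(row, grow)]
            for row, grow in zip(w, grad)
        ]
        cand_loss = hybrid_loss(candidate, frames)
        if cand_loss < loss:
            w, loss = candidate, cand_loss
        else:
            lr *= 0.5  # backtrack with a smaller step
    return w, loss

# Two toy frames: the initial view and one obtained after a hypothetical action.
frames = [
    [[1.0, 0.2], [0.1, 0.9]],
    [[0.9, 0.3], [0.2, 1.0]],
]
weights = [[0.5, -0.1], [-0.2, 0.4]]
before = hybrid_loss(weights, frames)
adapted, after = adapt(weights, frames)
print(before, after)
```

In this sketch the adapted weights are a copy used only for the current observation, so the original weights stay untouched and each new scene starts from the same model.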