Non-Hierarchical Transformers for Pedestrian Segmentation
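This work addresses pedestrian segmentation for autonomous systems, with potential benefits for individuals with disabilities, but the contribution is incremental: it combines existing methods (an EVA-02 backbone and a Cascade Mask R-CNN mask head) rather than introducing a new one.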
The paper tackles instance segmentation for autonomous systems with the aim of improving accessibility and inclusivity, and reports a mean Average Precision (mAP) of 52.68% on the test set of the AVA instance segmentation challenge dataset.
We propose a methodology to address the challenge of instance segmentation in autonomous systems, specifically targeting accessibility and inclusivity. Our approach uses a non-hierarchical Vision Transformer (ViT) variant, EVA-02, combined with a Cascade Mask R-CNN mask head. By fine-tuning on the AVA instance segmentation challenge dataset, we achieve a promising mean Average Precision (mAP) of 52.68% on the test set. Our results demonstrate the efficacy of ViT-based architectures in enhancing vision capabilities and accommodating the unique needs of individuals with disabilities.
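To make the reported metric concrete, the following is a minimal sketch of how a mask mAP can be computed for a single class in a single image. It assumes a COCO-style protocol (AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05), which the text above does not confirm for the AVA challenge. It also simplifies by using all-point interpolation of the precision-recall curve. The function names `mask_iou`, `average_precision`, and `mean_ap` are illustrative, not taken from the paper or from any evaluation toolkit.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP at one IoU threshold.

    preds: list of (score, mask) pairs; gts: list of ground-truth masks.
    """
    if not gts:
        return 0.0
    matched = set()
    tp_flags = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily pick the unmatched ground truth with the highest IoU.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        is_tp = best_j is not None and best_iou >= iou_thr
        if is_tp:
            matched.add(best_j)
        tp_flags.append(is_tp)

    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing in recall (the usual envelope step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(preds, gts, thresholds=None):
    """Mean of AP over IoU thresholds (assumed here: 0.50, 0.55, ..., 0.95)."""
    thresholds = thresholds or [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy example: two pedestrians on a 2x2 grid.
gts = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
preds = [
    (0.9, [[1, 1], [0, 0]]),  # perfect overlap with the first pedestrian
    (0.8, [[0, 0], [1, 0]]),  # covers half of the second pedestrian (IoU 0.5)
]
toy_map = mean_ap(preds, gts)
```

In this toy case the second prediction counts as a true positive only at the 0.50 threshold, so AP is 1.0 there and 0.5 at the nine stricter thresholds, giving `toy_map` of 0.55. A real evaluation would pool predictions across all images and classes before computing the curve.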