CV MM | Dec 4, 2023

SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm

Jiandong Jin, Xiao Wang, Yin Lin, Chenglong Li, Lili Huang, Aihua Zheng, Jin Tang

arXiv:2312.01640v212.118 citations | h-index: 43 | Has Code | Pattern Recognition

Originality: Incremental advance

AI Analysis

This work improves pedestrian attribute recognition for surveillance and security applications, but it is incremental as it adapts existing generative models to a specific domain.

The paper tackles pedestrian attribute recognition by proposing SequencePAR, a sequence generation paradigm that addresses imbalanced data and noisy samples, achieving 84.92% accuracy, 90.44% precision, 90.73% recall, and 90.46% F1-score on the PETA dataset.
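The four reported figures can be read as per-image (example-based) multi-label metrics. The following is a minimal sketch under that assumption; the text above does not state the exact evaluation protocol, so the function name `example_based_metrics`, the zero-denominator handling, and the choice to compute F1 from the averaged precision and recall are illustrative rather than the authors' implementation.

```python
def example_based_metrics(y_true, y_pred):
    """Average per-sample accuracy, precision, recall, plus F1 of the averages.

    y_true, y_pred: lists of equal-length 0/1 lists, one list per pedestrian image.
    A sample contributes 0 to a metric whose denominator is zero (an assumption).
    """
    n = len(y_true)
    acc = prec = rec = 0.0
    for truth, pred in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(truth, pred) if t and p)   # |Y ∩ P|
        union = sum(1 for t, p in zip(truth, pred) if t or p)    # |Y ∪ P|
        n_true = sum(truth)                                      # |Y|
        n_pred = sum(pred)                                       # |P|
        acc += inter / union if union else 0.0
        prec += inter / n_pred if n_pred else 0.0
        rec += inter / n_true if n_true else 0.0
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return acc, prec, rec, f1


# Hypothetical labels for two images and four attributes.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
acc, prec, rec, f1 = example_based_metrics(y_true, y_pred)
```

Some evaluation scripts average a per-sample F1 instead of combining the averaged precision and recall, so numbers produced this way need not match a paper's table exactly.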

Current pedestrian attribute recognition (PAR) algorithms use multi-label or multi-task learning frameworks with specific classification heads. These models often struggle with imbalanced data and noisy samples. Inspired by the success of generative models, we propose Sequence Pedestrian Attribute Recognition (SequencePAR), a novel sequence generation paradigm for PAR. SequencePAR extracts pedestrian features using a language-image pre-trained model and embeds the attribute set into query tokens guided by text prompts. A Transformer decoder generates human attributes by integrating visual features and attribute query tokens. The masked multi-head attention layer in the decoder prevents the model from seeing the next attribute while it makes predictions during training. Extensive experiments on multiple PAR datasets validate the effectiveness of SequencePAR. Specifically, we achieve 84.92%, 90.44%, 90.73%, and 90.46% in accuracy, precision, recall, and F1-score on the PETA dataset. The source code and pre-trained models are available at https://github.com/Event-AHU/OpenPAR.
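The role of the masked attention layer can be illustrated with a generic causal mask: each position in the attribute sequence may attend only to itself and earlier positions, so the next attribute stays hidden during training. This is a minimal pure-Python sketch of that general technique, not code from the OpenPAR repository; the helper names and the score values are hypothetical.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax in which masked-out positions receive zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical raw attention scores for a sequence of three attribute tokens.
scores = [
    [0.2, 1.5, -0.3],
    [0.7, 0.1, 2.0],
    [1.0, 0.4, 0.6],
]
weights = masked_softmax(scores, causal_mask(3))
```

In `weights`, every entry above the diagonal is zero, so the first token attends only to itself and no token receives information from attributes that come later in the sequence.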
