CV · Jun 4
ReSAGE-PAR: Representational Similarity Assessment for Generative Expansion in Pedestrian Attribute Recognition
Pablo Ayuso-Albizu, Pablo Carballeira, Juan C. SanMiguel et al.
To address limited diversity and data scarcity in Pedestrian Attribute Recognition (PAR), we explore image synthesis with diffusion models guided by attribute-based prompts. While this enables controlled generation of pedestrian images, it faces two critical challenges: (i) the domain gap between the high-quality data used for pre-training and the low-resolution, non-standard crops typical of surveillance footage, and (ii) the need for reliable attribute verification to detect generative hallucinations. In this paper, we introduce ReSAGE-PAR (REpresentational Similarity Assessment for Generative Expansion in PAR), a robust generate-score-autolabel pipeline that bridges this domain gap and enables scalable, high-fidelity dataset expansion. First, we adapt pre-trained diffusion models to native PAR resolutions using a tailored LoRA-based image-to-image approach. Second, we extract vision-language alignment scores between the generated images and their conditioning prompts, using a comprehensive prompting strategy that includes label-consistent and label-inconsistent complements. Finally, we formulate a Bayesian classifier that converts these continuous scores into reliable binary pseudo-labels. Extensive evaluations demonstrate the effectiveness of ReSAGE-PAR in preserving spatial priors and verifying attributes. When integrated into PAR training, ReSAGE-PAR consistently yields significant improvements, achieving gains of up to 8.7% on standard backbones and pushing state-of-the-art frameworks to new performance levels. These results establish it as an architecture-agnostic solution for scalable PAR enhancement. The complete codebase for ReSAGE-PAR is publicly available at http://www-vpu.eps.uam.es/publications/ReSAGE-PAR.
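The scoring and autolabeling steps can be illustrated with a minimal sketch. It assumes (i) image and prompt embeddings are already available from some vision-language model, (ii) each attribute's score is the margin between the image's similarity to the label-consistent prompt and to its label-inconsistent complement, and (iii) the Bayesian classifier models each class's scores as a 1-D Gaussian fitted on a small labeled calibration set. The names `alignment_margin` and `GaussianBayesLabeler`, the margin score, and the Gaussian assumption are all hypothetical choices for illustration, not the ReSAGE-PAR implementation.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def alignment_margin(img_emb, consistent_emb, inconsistent_emb):
    """Hypothetical per-attribute score: similarity to the label-consistent
    prompt minus similarity to its label-inconsistent complement
    (e.g. 'wearing a hat' vs. 'not wearing a hat')."""
    return cosine(img_emb, consistent_emb) - cosine(img_emb, inconsistent_emb)


class GaussianBayesLabeler:
    """Two-class Bayes classifier with 1-D Gaussian class-conditional densities.

    Fitted on calibration scores whose ground-truth attribute presence is
    known, then used to turn new continuous scores into binary pseudo-labels.
    """

    def fit(self, scores, labels):
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=int)
        self.params = {}
        for c in (0, 1):
            s = scores[labels == c]
            # (mean, std with a small floor, class prior from label frequencies)
            self.params[c] = (s.mean(), max(s.std(), 1e-6), len(s) / len(scores))
        return self

    def _log_joint(self, scores, c):
        # log N(score | mu, sigma) + log prior, dropping the shared -0.5*log(2*pi)
        mu, sigma, prior = self.params[c]
        return -0.5 * ((scores - mu) / sigma) ** 2 - np.log(sigma) + np.log(prior)

    def posterior(self, scores):
        """P(attribute present | score) via Bayes' rule."""
        scores = np.asarray(scores, dtype=float)
        diff = self._log_joint(scores, 0) - self._log_joint(scores, 1)
        # clipping keeps np.exp from overflowing for very confident scores
        return 1.0 / (1.0 + np.exp(np.clip(diff, -500.0, 500.0)))

    def predict(self, scores, threshold=0.5):
        """Binary pseudo-labels: 1 where the posterior reaches the threshold."""
        return (self.posterior(scores) >= threshold).astype(int)


# Usage with synthetic calibration scores (illustrative numbers only)
rng = np.random.default_rng(0)
calib = np.concatenate([rng.normal(-0.05, 0.02, 200), rng.normal(0.05, 0.02, 200)])
truth = np.array([0] * 200 + [1] * 200)
labeler = GaussianBayesLabeler().fit(calib, truth)
print(labeler.predict([-0.06, 0.07]))
```

A posterior threshold above 0.5 would trade recall for cleaner pseudo-labels; the right operating point would have to be chosen empirically.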
CV · Sep 2, 2025
Enhancing Zero-Shot Pedestrian Attribute Recognition with Synthetic Data Generation: A Comparative Study with Image-To-Image Diffusion Models
Pablo Ayuso-Albizu, Juan C. SanMiguel, Pablo Carballeira
Pedestrian Attribute Recognition (PAR) involves identifying various human attributes from images, with applications in intelligent monitoring systems. The scarcity of large-scale annotated datasets hinders the generalization of PAR models, especially in complex scenarios involving occlusions, varying poses, and diverse environments. Recent advances in diffusion models have shown promise for generating diverse and realistic synthetic images, making it possible to expand the size and variability of training data. However, the potential of diffusion-based data expansion for generating PAR-like images remains underexplored, even though such expansion may enhance the robustness and adaptability of PAR models in real-world scenarios. This paper investigates the effectiveness of diffusion models in generating synthetic pedestrian images tailored to PAR tasks. We identify key parameters of img2img diffusion-based data expansion, including text prompts, image properties, and the latest enhancements in diffusion-based data augmentation, and examine their impact on the quality of generated images for PAR. Furthermore, we use the best-performing expansion approach to generate synthetic images that enrich the zero-shot datasets used to train PAR models. Experimental results show that prompt alignment and image properties are critical factors in image generation, with optimal selection leading to a 4.5% improvement in PAR recognition performance.
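Since text prompts are identified as a key parameter, one way to derive an attribute-based prompt from a PAR label vector is sketched below. The attribute names, the phrase table, and the "a surveillance photo of ..." template are hypothetical placeholders chosen for illustration; the abstract does not specify the prompt format actually used.

```python
from typing import Dict, Tuple

# Hypothetical phrase table: attribute -> (phrase if present, phrase if absent).
ATTRIBUTE_PHRASES: Dict[str, Tuple[str, str]] = {
    "Female": ("a woman", "a man"),
    "Hat": ("wearing a hat", "not wearing a hat"),
    "Backpack": ("carrying a backpack", "without a backpack"),
    "LongSleeve": ("in a long-sleeved top", "in a short-sleeved top"),
}


def build_prompt(labels: Dict[str, int]) -> str:
    """Turn a binary attribute dictionary into a text prompt.

    The gender attribute sets the subject ("a person" if it is missing);
    every other known attribute contributes its positive or negative phrase,
    in dictionary order. Unknown attribute names are skipped.
    """
    if "Female" in labels:
        present, absent = ATTRIBUTE_PHRASES["Female"]
        subject = present if labels["Female"] else absent
    else:
        subject = "a person"
    details = [
        ATTRIBUTE_PHRASES[name][0 if value else 1]
        for name, value in labels.items()
        if name != "Female" and name in ATTRIBUTE_PHRASES
    ]
    prompt = f"a surveillance photo of {subject}"
    if details:
        prompt += ", " + ", ".join(details)
    return prompt


print(build_prompt({"Female": 1, "Hat": 0, "Backpack": 1}))
# a surveillance photo of a woman, not wearing a hat, carrying a backpack
```

Emitting an explicit phrase for absent attributes keeps the prompt aligned with the full label vector rather than only its positive entries, which is one reading of "prompt alignment" in this context.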
CV · Sep 2, 2025
A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models
Alejandro Alonso, Sawaiz A. Chaudhry, Juan C. SanMiguel et al.
Pedestrian Attribute Recognition (PAR) is a challenging task because models must generalize across numerous attributes in real-world data. Traditional approaches focus on complex methods, yet recognition performance is often constrained by limitations of the training datasets, particularly the under-representation of certain attributes. In this paper, we propose a data-centric approach that improves PAR through synthetic data augmentation guided by textual descriptions. First, we define a protocol to identify weakly recognized attributes across multiple datasets. Second, we propose a prompt-driven pipeline that leverages diffusion models to generate synthetic pedestrian images while preserving the consistency of PAR datasets. Finally, we derive a strategy to seamlessly incorporate synthetic samples into the training data, based on prompt-based annotation rules and a modified loss function. Results on popular PAR datasets demonstrate that our approach not only boosts recognition of underrepresented attributes but also improves overall model performance beyond the targeted attributes. Notably, this approach strengthens zero-shot generalization without requiring changes to the model architecture, offering an efficient and scalable way to improve the recognition of pedestrian attributes in the real world.
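Two of the steps above, flagging weakly recognized attributes and training on partially annotated synthetic samples, can be sketched as follows. The sketch assumes weakness is judged by per-attribute balanced accuracy (the quantity averaged by the label-based mA metric common in PAR) falling below a hypothetical threshold of 0.7 on every dataset, and that the modified loss is a masked binary cross-entropy in which attributes not stated in a synthetic image's prompt are ignored rather than treated as negatives. Function names, the threshold, and the masking rule are illustrative assumptions, not the paper's protocol or loss.

```python
import numpy as np


def per_attribute_ma(y_true, y_pred):
    """Per-attribute balanced accuracy: (TPR + TNR) / 2 for each column."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    pos = y_true.sum(axis=0)
    neg = (~y_true).sum(axis=0)
    tp = (y_true & y_pred).sum(axis=0)
    tn = (~y_true & ~y_pred).sum(axis=0)
    # np.maximum guards against attributes with no positives or no negatives
    return (tp / np.maximum(pos, 1) + tn / np.maximum(neg, 1)) / 2


def weak_attributes(per_dataset_ma, names, threshold=0.7):
    """Hypothetical protocol: flag attributes whose per-attribute mA is
    below `threshold` on every dataset."""
    stacked = np.vstack(per_dataset_ma)  # shape: (num_datasets, num_attributes)
    weak = np.all(stacked < threshold, axis=0)
    return [name for name, is_weak in zip(names, weak) if is_weak]


def masked_bce(logits, targets, mask):
    """Binary cross-entropy averaged only over entries with mask == 1.

    Assumption: for synthetic images, only attributes stated in the prompt
    carry a target; unstated attributes get mask == 0 and do not contribute.
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    m = np.asarray(mask, dtype=float)
    # numerically stable BCE with logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    loss = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float((loss * m).sum() / max(m.sum(), 1.0))
```

Masking avoids penalizing the model for attributes that the generator was never asked to control, at the cost of fewer supervised entries per synthetic image.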