CV · MM · Feb 5, 2025

Efficient Vision Language Model Fine-tuning for Text-based Person Anomaly Search

arXiv:2502.03230v1 · 2 citations · h-index: 10 · WWW
Originality: Synthesis-oriented
AI Analysis

This paper addresses the domain-specific problem of text-based person anomaly search in pedestrian analysis and offers incremental improvements.

The paper tackled the WWW 2025 Text-based Person Anomaly Search challenge by introducing a Similarity Coverage Analysis strategy to handle similar text descriptions, achieving excellent performance in identifying normal or abnormal pedestrian behavior from images.

This paper presents the HFUT-LMC team's solution to the WWW 2025 challenge on Text-based Person Anomaly Search (TPAS). The primary objective of this challenge is to accurately identify pedestrians exhibiting either normal or abnormal behavior within a large library of pedestrian images. Unlike traditional video analysis tasks, TPAS significantly emphasizes understanding and interpreting the subtle relationships between text descriptions and visual data. The complexity of this task lies in the model's need to not only match individuals to text descriptions in massive image datasets but also accurately differentiate between search results when faced with similar descriptions. To overcome these challenges, we introduce the Similarity Coverage Analysis (SCA) strategy to address the recognition difficulty caused by similar text descriptions. This strategy effectively enhances the model's capacity to manage subtle differences, thus improving both the accuracy and reliability of the search. Our proposed solution demonstrated excellent performance in this challenge.
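The abstract does not describe how SCA works, so the following is not the authors' method. It is only a minimal sketch of the retrieval setting the abstract describes: rank gallery images against a text query by embedding similarity, and flag pairs of text queries that are nearly identical, which is the case the abstract says makes the search results hard to tell apart. The embeddings are hand-made toy vectors, and the function names and the 0.95 threshold are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_vec, gallery_vecs):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, g) for g in gallery_vecs]
    return sorted(range(len(gallery_vecs)), key=lambda i: scores[i], reverse=True)


def similar_query_pairs(query_vecs, threshold=0.95):
    """Return index pairs of text queries whose embeddings are nearly identical.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    pairs = []
    for i in range(len(query_vecs)):
        for j in range(i + 1, len(query_vecs)):
            if cosine(query_vecs[i], query_vecs[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy embeddings standing in for the outputs of a vision language model.
text_queries = [[1.0, 0.0, 0.1], [1.0, 0.0, 0.2], [0.0, 1.0, 0.0]]
image_gallery = [[0.9, 0.1, 0.1], [0.0, 1.0, 0.1], [0.9, 0.0, 0.3]]

ranking = rank_gallery(text_queries[0], image_gallery)
confusable = similar_query_pairs(text_queries)
print(ranking)      # gallery indices, best match first
print(confusable)   # query pairs that are hard to tell apart
```

In this toy data the first two queries are almost the same vector, so they are flagged as a confusable pair; a real system would need some extra step to separate their results, which is the role the abstract assigns to SCA.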

