Yuanzhi Liu

CV
h-index: 15
3 papers
30 citations
Novelty: 43%
AI Score: 28

3 Papers

CV | Aug 25, 2024
Evaluating Attribute Comprehension in Large Vision-Language Models

Haiwen Zhang, Zixi Yang, Yuanzhi Liu et al.

Large vision-language models have recently made promising progress on many downstream tasks. However, they still face many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. In addition, although there have been growing efforts to evaluate large vision-language models, in-depth studies of attribute comprehension and of the visual-language fine-tuning process are lacking. In this paper, we propose to evaluate the attribute comprehension ability of large vision-language models from two perspectives: attribute recognition and attribute hierarchy understanding. We evaluate three vision-language interactions: visual question answering, image-text matching (ITM), and image-text cosine similarity (ITC). Furthermore, we explore the factors affecting attribute comprehension during fine-tuning. Through a series of quantitative and qualitative experiments, we present three main findings: (1) Large vision-language models possess good attribute recognition ability, but their hierarchical understanding ability is relatively limited. (2) Compared to ITC, ITM is better at capturing finer details, making it more suitable for attribute understanding tasks. (3) The attribute information in the captions used for fine-tuning plays a crucial role in attribute understanding. We hope this work can help guide future progress in the fine-grained visual understanding of large vision-language models.
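The image-text cosine similarity (ITC) interaction described above can be sketched as follows, assuming the image and each attribute prompt have already been embedded by a model's image and text encoders. The toy vectors, the attribute prompts, and the helper names `cosine_similarity` and `rank_attributes` are hypothetical illustrations, not the paper's evaluation code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_attributes(
    image_emb: List[float], attribute_embs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate attribute prompt against the image; highest similarity first."""
    scores = [(name, cosine_similarity(image_emb, emb)) for name, emb in attribute_embs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-d embeddings standing in for real encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.0]
prompts = {
    "a red object": [1.0, 0.0, 0.0],
    "a blue object": [0.0, 1.0, 0.0],
    "a green object": [0.0, 0.0, 1.0],
}
for name, score in rank_attributes(image_embedding, prompts):
    print(f"{name}: {score:.3f}")
```

ITC compares two independently computed embeddings, so a single vector per side has to carry every attribute. ITM, as used in models such as BLIP, instead scores the image and text jointly through a cross-modal fusion module followed by a match classifier, which is not something a plain vector comparison like this can reproduce.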

CV | May 21, 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation

Xinran Wang, Muxi Diao, Yuanzhi Liu et al.

Training text-to-image (T2I) models with detailed captions can significantly improve their generation quality. Existing methods often rely on simplistic metrics such as caption length to represent the detailness of captions in the T2I training set. In this paper, we propose a new metric to estimate caption detailness based on two aspects: image coverage rate (ICR), which evaluates whether the caption covers all regions/objects in the image, and average object detailness (AOD), which quantifies the detailness of each object's description. Through experiments on the COCO dataset using ShareGPT4V captions, we demonstrate that T2I models trained on high-ICR and high-AOD captions achieve superior performance on DPG and other benchmarks. Notably, our metric enables more effective data selection: training on only 20% of the full data surpasses both full-dataset training and the length-based selection method, improving alignment and reconstruction ability. These findings highlight the critical role of detail-aware metrics over length-based heuristics in caption selection for T2I tasks.
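The abstract does not give formulas for ICR and AOD, so the following is only a hypothetical proxy to make detail-aware data selection concrete: ICR is taken as the fraction of annotated image objects that the caption mentions, AOD as the mean number of descriptive attributes per mentioned object, and samples are ranked by a weighted sum of ICR and max-normalized AOD. The `Sample` type, its object/attribute inputs, the `alpha` weighting, and all function names are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Sample:
    image_id: str
    # Objects present in the image, e.g. from dataset annotations (assumed input).
    image_objects: Set[str]
    # Objects the caption mentions -> number of descriptive attributes for each (assumed input).
    caption_objects: Dict[str, int] = field(default_factory=dict)


def image_coverage_rate(s: Sample) -> float:
    """Hypothetical ICR proxy: share of image objects that the caption mentions."""
    if not s.image_objects:
        return 0.0
    covered = s.image_objects & set(s.caption_objects)
    return len(covered) / len(s.image_objects)


def average_object_detailness(s: Sample) -> float:
    """Hypothetical AOD proxy: mean attribute count over the objects the caption mentions."""
    if not s.caption_objects:
        return 0.0
    return sum(s.caption_objects.values()) / len(s.caption_objects)


def select_top_fraction(
    samples: List[Sample], fraction: float = 0.2, alpha: float = 0.5
) -> List[Sample]:
    """Keep the top `fraction` of samples by alpha * ICR + (1 - alpha) * normalized AOD."""
    if not samples:
        return []
    # Normalize AOD to [0, 1] so it is on the same scale as ICR; guard against all-zero AOD.
    max_aod = max(average_object_detailness(s) for s in samples) or 1.0

    def score(s: Sample) -> float:
        return alpha * image_coverage_rate(s) + (1 - alpha) * average_object_detailness(s) / max_aod

    k = max(1, round(len(samples) * fraction))
    return sorted(samples, key=score, reverse=True)[:k]
```

For example, `select_top_fraction(pool, fraction=0.2)` keeps the highest-scoring 20% of a candidate pool, mirroring the 20% subset described above. In practice the object and attribute counts would have to come from some caption parser or detector, which this sketch leaves out.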

RO | Feb 8, 2021
Simultaneous Localization and Mapping Related Datasets: A Comprehensive Survey

Yuanzhi Liu, Yujia Fu, Fengdong Chen et al.

Due to its complicated procedure and costly hardware, Simultaneous Localization and Mapping (SLAM) has been heavily dependent on public datasets for development and evaluation, leading to many impressive demos and good benchmark scores. In stark contrast, however, SLAM is still struggling on the way towards mature deployment, which sounds a warning: some datasets are overexposed, causing biased usage and evaluation. This raises the question of how to comprehensively assess existing datasets and select them correctly. Moreover, current datasets do have limitations, so how should new ones be built, and in which directions should they go? However, no comprehensive survey that tackles these issues exists yet, although the community urgently needs one. To fill this gap, this paper covers a range of cohesive topics on SLAM-related datasets, including general collection methodology and fundamental characteristic dimensions, a taxonomy of SLAM-related tasks and a categorization of datasets, an introduction to the state of the art, an overview and comparison of existing datasets, a review of evaluation criteria, and analyses and discussions of current limitations and future directions, with the aim of not only guiding dataset selection but also promoting dataset research.