PACO: Parts and Attributes of Common Objects
This dataset supports more detailed descriptions of object instances in computer vision research. Its contribution is incremental, since it builds on the existing LVIS and Ego4D datasets.
The authors introduce PACO, a large-scale dataset spanning 75 object categories, 456 object-part categories, and 55 attributes, with 641K part masks across 260K object boxes. It addresses the need for richer annotations beyond traditional object masks. The authors also provide benchmark results for three tasks: part mask segmentation, object and part attribute prediction, and zero-shot instance detection.
Object models are gradually progressing from predicting just category labels to providing detailed descriptions of object instances. This motivates the need for large datasets that go beyond traditional object masks and provide richer annotations such as part masks and attributes. Hence, we introduce PACO: Parts and Attributes of Common Objects. It spans 75 object categories, 456 object-part categories, and 55 attributes across image (LVIS) and video (Ego4D) datasets. We provide 641K part masks annotated across 260K object boxes, with roughly half of them exhaustively annotated with attributes as well. We design evaluation metrics and provide benchmark results for three tasks on the dataset: part mask segmentation, object and part attribute prediction, and zero-shot instance detection. Dataset, models, and code are open-sourced at https://github.com/facebookresearch/paco.
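
To make the annotation hierarchy described above more concrete, the following is a minimal Python sketch of how an object box with part masks and attributes might be represented. The class names, field names, and example values (`PartAnnotation`, `ObjectAnnotation`, `mask_rle`, "mug", "handle", and so on) are hypothetical and chosen only for illustration; they are not the schema or taxonomy used in the released PACO files. The only figures taken from the abstract are the 641K part masks and 260K object boxes, which imply roughly 2.5 part masks per object box on average.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartAnnotation:
    """Hypothetical record for one annotated part of an object."""
    part_name: str                      # illustrative label, not from the PACO taxonomy
    mask_rle: str                       # placeholder for an encoded segmentation mask
    attributes: List[str] = field(default_factory=list)


@dataclass
class ObjectAnnotation:
    """Hypothetical record for one object box and its parts."""
    category: str
    box_xywh: Tuple[float, float, float, float]
    attributes: List[str] = field(default_factory=list)
    parts: List[PartAnnotation] = field(default_factory=list)


def mean_parts_per_object(objects: List[ObjectAnnotation]) -> float:
    """Average number of part masks per object box (0.0 for an empty list)."""
    if not objects:
        return 0.0
    return sum(len(obj.parts) for obj in objects) / len(objects)


# Headline counts quoted in the abstract (rounded to thousands there).
PART_MASKS = 641_000
OBJECT_BOXES = 260_000
avg_masks_per_box = PART_MASKS / OBJECT_BOXES

# Toy data, invented purely to exercise the helper above.
toy_objects = [
    ObjectAnnotation(
        category="mug",
        box_xywh=(10.0, 20.0, 50.0, 60.0),
        attributes=["red"],
        parts=[
            PartAnnotation("handle", "<rle>", ["red"]),
            PartAnnotation("body", "<rle>", ["red"]),
        ],
    ),
    ObjectAnnotation(
        category="chair",
        box_xywh=(0.0, 0.0, 100.0, 120.0),
        parts=[PartAnnotation("leg", "<rle>")],
    ),
]
toy_mean = mean_parts_per_object(toy_objects)
```

Nesting parts under their object box mirrors the abstract's description of part masks being annotated across object boxes, with attributes attachable at both the object and the part level.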