GREC: Generalized Referring Expression Comprehension
This work improves the practical applicability of REC for researchers and practitioners by extending it beyond single-target expressions. The contribution is incremental, since it builds directly on existing REC frameworks.
The paper addresses a limitation of classic Referring Expression Comprehension (REC) by introducing a new benchmark, Generalized Referring Expression Comprehension (GREC), in which an expression may refer to any number of target objects, including several or none. It also presents gRefCOCO, the first large-scale dataset built to support this extension.
The objective of classic Referring Expression Comprehension (REC) is to produce a bounding box for the object mentioned in a given textual description. Existing datasets and techniques in classic REC are typically tailored to single-target expressions, in which each expression is linked to exactly one object. Expressions that refer to multiple targets or to no target at all have not been taken into account. This constraint hinders the practical applicability of REC. This study introduces a new benchmark termed Generalized Referring Expression Comprehension (GREC), which extends classic REC by permitting expressions to describe any number of target objects. To this end, we have built the first large-scale GREC dataset, named gRefCOCO. The dataset contains three kinds of expressions: multi-target, no-target, and single-target. The design of GREC and gRefCOCO ensures compatibility with classic REC. The gRefCOCO dataset, an implementation of a GREC method, and the GREC evaluation code are available at https://github.com/henghuiding/gRefCOCO.
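To make the task definition concrete, the following minimal Python sketch shows how a GREC sample differs from a classic REC sample: the target is a list of bounding boxes that may be empty, hold one box, or hold several. The names `GrecSample`, `target_kind`, and `as_classic_rec`, the corner-based box convention, and the example expressions and coordinates are all illustrative assumptions. They are not the gRefCOCO annotation format or the API of the released code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class GrecSample:
    """Hypothetical container: one expression and zero or more target boxes."""
    expression: str
    boxes: List[Box] = field(default_factory=list)


def target_kind(sample: GrecSample) -> str:
    """Classify a sample by how many targets its expression refers to."""
    count = len(sample.boxes)
    if count == 0:
        return "no-target"
    if count == 1:
        return "single-target"
    return "multi-target"


def as_classic_rec(sample: GrecSample) -> Optional[Box]:
    """Return the single box if the sample fits classic REC, else None."""
    return sample.boxes[0] if len(sample.boxes) == 1 else None


# Made-up examples, one per expression kind.
samples = [
    GrecSample("the two people on the left", [(10, 20, 80, 200), (90, 25, 160, 210)]),
    GrecSample("the dog wearing a hat", []),
    GrecSample("the man in the red shirt", [(200, 30, 280, 220)]),
]

for s in samples:
    print(f"{s.expression!r}: {target_kind(s)}")
```

Under these assumptions, a single-target sample maps directly onto the classic REC output of one box, which illustrates why the generalized setting can stay compatible with the classic one.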