Compressed Feature Quality Assessment: Dataset and Baselines
This work addresses the need for better quality assessment in feature compression for machine learning applications, though it is incremental as it primarily provides a dataset and baseline analysis without proposing a new method.
The paper tackles the problem of evaluating the semantic fidelity of compressed features in resource-constrained environments, establishing the first benchmark dataset with 300 original and 12000 compressed features across three vision tasks and four codecs, and finds that existing metrics like MSE, cosine similarity, and CKA are insufficient for capturing semantic degradation.
The widespread deployment of large models in resource-constrained environments has underscored the need for efficient transmission of intermediate feature representations. In this context, feature coding, which compresses features into compact bitstreams, becomes a critical component for scenarios involving feature transmission, storage, and reuse. However, this compression process inevitably introduces semantic degradation that is difficult to quantify with traditional metrics. To address this, we formalize the research problem of Compressed Feature Quality Assessment (CFQA), aiming to evaluate the semantic fidelity of compressed features. To advance CFQA research, we propose the first benchmark dataset, comprising 300 original features and 12000 compressed features derived from three vision tasks and four feature codecs. Task-specific performance degradation is provided as true semantic distortion for evaluating CFQA metrics. We systematically assess three widely used metrics -- MSE, cosine similarity, and Centered Kernel Alignment (CKA) -- in terms of their ability to capture semantic degradation. Our findings demonstrate the representativeness of the proposed dataset while underscoring the need for more sophisticated metrics capable of measuring semantic distortion in compressed features. This work advances the field by establishing a foundational benchmark and providing a critical resource for the community to explore CFQA. To foster further research, we release the dataset and all associated source code at https://github.com/chansongoal/Compressed-Feature-Quality-Assessment.