Countering Inconsistent Labelling by Google's Vision API for Rotated Images
This work addresses a robustness issue in a widely used commercial image-analysis API, though the contribution is incremental: it adds a pre-processing step rather than modifying the API itself.
The paper tackles the inconsistent labels that Google's Vision API produces for rotated images by proposing a modular pre-processing pipeline that corrects image orientation with a ResNet50 model. Measured with a Percentage Error metric, the corrected images perform significantly better than their rotated counterparts.
Google's Vision API analyses images and provides a variety of output predictions, one of which is context-based labelling. This paper shows that adversarial examples causing incorrect label prediction and spoofing can be generated by rotating images. Because the API is a black box, a modular, context-based pre-processing pipeline is proposed. It consists of a ResNet50 model that predicts the angle by which an image must be rotated to correct its orientation. The pipeline performs this correction whilst maintaining the image's resolution and feeds the result to the API, which then generates labels similar to those of the original, correctly oriented image. Measured with a Percentage Error metric, the corrected images perform significantly better than their rotated counterparts. These observations imply that the API can benefit from such a pre-processing pipeline to increase its robustness to rotational perturbations.
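The control flow of such a pipeline (predict a corrective angle, rotate the image, then query the labelling service) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: the image is a toy grid of pixel values, angles are restricted to multiples of 90 degrees, and both `predict_angle` (standing in for the ResNet50 model) and `get_labels` (standing in for the call to the Vision API) are hypothetical callables supplied by the caller.

```python
from typing import Callable, List

# Toy image type (assumption): rows of grayscale pixel values.
Image = List[List[int]]


def rotate_ccw_90(img: Image) -> Image:
    """Rotate a rectangular image 90 degrees counter-clockwise."""
    # Transpose the grid, then reverse the order of the rows.
    return [list(row) for row in zip(*img)][::-1]


def rotate(img: Image, angle: int) -> Image:
    """Rotate counter-clockwise by a multiple of 90 degrees.

    The restriction to right angles is a simplification for this sketch;
    it keeps every pixel intact, so no resolution is lost.
    """
    if angle % 90 != 0:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    out = [list(row) for row in img]  # copy so the input is not mutated
    for _ in range((angle // 90) % 4):
        out = rotate_ccw_90(out)
    return out


def correct_orientation(img: Image,
                        predict_angle: Callable[[Image], int]) -> Image:
    """Apply the corrective rotation suggested by the angle predictor.

    `predict_angle` is a hypothetical stand-in for the orientation model:
    it returns the counter-clockwise angle that makes the image upright.
    """
    return rotate(img, predict_angle(img))


def label_image(img: Image,
                predict_angle: Callable[[Image], int],
                get_labels: Callable[[Image], List[str]]) -> List[str]:
    """Correct the orientation first, then ask the labelling service.

    `get_labels` is a hypothetical stand-in for the black-box API call.
    """
    return get_labels(correct_orientation(img, predict_angle))
```

Because the predictor and the labelling call are passed in as plain callables, the correction step stays independent of the service it feeds, which mirrors the modular, black-box setting described above. A real pipeline would need to handle arbitrary angles and actual image data.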