Seeing the Unseen: Errors and Bias in Visual Datasets
This paper addresses the problem of dataset errors and biases that affect users of machine vision systems, such as face recognition and self-driving cars, and offers incremental insights into common sources of these flaws.
The paper investigates errors and biases in visual datasets, revealing that flawed datasets often result from limited categories, non-comprehensive sourcing, and poor classification, which can lead to issues such as misidentifying Black people as gorillas and misrepresenting ethnicities in search results.
From face recognition in smartphones to automatic routing in self-driving cars, machine vision algorithms lie at the core of these features. These systems solve image-based tasks by identifying and understanding objects, and then making decisions based on this information. However, errors in datasets are usually propagated or even magnified by the algorithms, at times resulting in issues such as recognising Black people as gorillas and misrepresenting ethnicities in search results. This paper traces the errors in datasets and their impacts, revealing that a flawed dataset can result from limited categories, non-comprehensive sourcing and poor classification.