CV · Jun 8, 2023

Teaching AI to Teach: Leveraging Limited Human Salience Data Into Unlimited Saliency-Based Training

Colton R. Crum, Aidan Boyd, Kevin Bowyer, Adam Czajka

arXiv:2306.05527v2 · 8.49 citations · h-index: 80

Originality: Incremental advance

AI Analysis

This work addresses the high cost of human annotation in computer vision, offering researchers and practitioners a scalable way to improve model accuracy, though its contribution is incremental because it builds on existing saliency techniques.

The paper tackles the high cost of collecting human salience annotations for training models by using teacher models trained on limited human data to generate saliency maps for additional data, which are then used to train student models. Results show this teacher-student paradigm significantly outperforms baselines using all human annotations or no salience data, with improvements across multiple architectures and saliency methods.

Machine learning models have shown increased accuracy in classification tasks when the training process incorporates human perceptual information. However, a challenge in training human-guided models is the cost associated with collecting image annotations for human salience. Collecting annotation data for all images in a large training set can be prohibitively expensive. In this work, we utilize "teacher" models (trained on a small amount of human-annotated data) to annotate additional data by means of teacher models' saliency maps. Then, "student" models are trained using the larger amount of annotated training data. This approach makes it possible to supplement a limited number of human-supplied annotations with an arbitrarily large number of model-generated image annotations. We compare the accuracy achieved by our teacher-student training paradigm with (1) training using all available human salience annotations, and (2) using all available training data without human salience annotations. We use synthetic face detection and fake iris detection as example challenging problems, and report results across four model architectures (DenseNet, ResNet, Xception, and Inception), and two saliency estimation methods (CAM and RISE). Results show that our teacher-student training paradigm results in models that significantly exceed the performance of both baselines, demonstrating that our approach can usefully leverage a small amount of human annotations to generate salience maps for an arbitrary amount of additional training data.
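The annotation step described in the abstract can be illustrated with a minimal sketch. It assumes the teacher's saliency comes from CAM in its standard form (a class-weighted sum of the final convolutional feature maps) and that student training keeps a human map wherever one exists, falling back to the teacher's map otherwise. The function names, the dictionary layout, and the min-max scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Standard CAM: weight each feature map by the classifier weight
    for the target class, sum over channels, and scale to [0, 1].

    features: array of shape (C, H, W) from the last conv layer.
    class_weights: array of shape (C,) for the target class.
    """
    cam = np.tensordot(class_weights, features, axes=1)  # shape (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    # Min-max scaling is an assumption made here for illustration.
    return cam / peak if peak > 0 else cam


def build_student_annotations(human_maps, teacher_maps):
    """Hypothetical helper: for every image id the teacher annotated,
    use the human salience map if one exists, else the teacher's map."""
    return {
        image_id: human_maps.get(image_id, teacher_map)
        for image_id, teacher_map in teacher_maps.items()
    }


# Toy example: two channels, each active at a single location.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0
features[1, 1, 1] = 1.0
toy_cam = class_activation_map(features, np.array([2.0, 1.0]))

human = {"img_a": np.ones((2, 2))}
teacher = {"img_a": toy_cam, "img_b": toy_cam}
annotations = build_student_annotations(human, teacher)
```

In this toy run, `img_a` keeps its human map while `img_b` receives the teacher-generated one, which mirrors how a small human-annotated set could be extended to an arbitrarily large training set.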
