MaskIt: Masking for efficient utilization of incomplete public datasets for training deep learning models
This work addresses the challenge of data scarcity for researchers and practitioners who rely on public datasets, but the contribution is incremental, as it builds on existing masking techniques.
The paper tackles the problem of training deep learning models with incomplete public datasets by introducing a masking approach that uses road networks to focus on available data, achieving 78.4% accuracy in predicting trees in the masked region.
A major challenge in training deep learning models is the lack of high-quality, complete datasets. In this paper, we present a masking approach for training deep learning models on a publicly available but incomplete dataset. For example, the city of Hamburg, Germany, maintains a list of trees along its roads, but this dataset contains no information about trees in private gardens and parks. To train a deep learning model on such a dataset, we mask the street trees and aerial images with the road network. The road network used to create the mask is downloaded from OpenStreetMap, and it marks the area where training data is available. The mask is passed to the model as one of its inputs and is also applied to the output. Our model learns to predict trees only in the masked region, with 78.4% accuracy.
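The masking idea described above can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the authors' implementation: the function names (`stack_inputs`, `masked_bce`, `masked_accuracy`), the binary road mask layout, and the choice of binary cross-entropy are all assumptions made for the example. It shows the mask being appended to the image as an extra input channel and then applied to the output, so that pixels outside the road-network region contribute to neither the loss nor the accuracy.

```python
import numpy as np


def stack_inputs(image, mask):
    """Append the mask as an extra channel: (H, W, C) + (H, W) -> (H, W, C + 1)."""
    return np.concatenate([image, mask[..., None]], axis=-1)


def masked_bce(pred, target, mask, eps=1e-7):
    """Binary cross-entropy averaged only over pixels where mask == 1.

    Assumption: pred holds per-pixel tree probabilities in [0, 1].
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    # Pixels outside the mask are multiplied by zero, so missing labels
    # (e.g. trees in parks) are never treated as negatives.
    return float((per_pixel * mask).sum() / max(mask.sum(), 1.0))


def masked_accuracy(pred, target, mask, threshold=0.5):
    """Fraction of masked pixels whose thresholded prediction matches the label."""
    labels = (pred >= threshold).astype(target.dtype)
    correct = (labels == target) * mask
    return float(correct.sum() / max(mask.sum(), 1.0))


# Toy data (hypothetical): a 4x4 RGB tile with a two-pixel-wide "road" strip.
image = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[:, 1:3] = 1.0            # area covered by the road network
target = np.zeros((4, 4))
target[0, 1] = 1.0            # one labelled street tree inside the mask

model_input = stack_inputs(image, mask)  # shape (4, 4, 4)
```

Dividing by the number of masked pixels rather than the full image size keeps the loss scale comparable across tiles with different amounts of road coverage; that normalization is a design choice of this sketch, not something stated in the abstract.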