From LAION-5B to LAION-EO: Filtering Billions of Images Using Anchor Datasets for Satellite Image Extraction
The paper provides a dataset for satellite imagery research, but its contribution is incremental: it applies an existing filtering method to a new domain.
The paper tackles the challenge of extracting domain-specific subsets from large image corpora such as LAION-5B and releases LAION-EO, a dataset of text and satellite image pairs in high (pixel-wise) resolution.
Large datasets, such as LAION-5B, contain a diverse distribution of images shared online. However, extracting domain-specific subsets from such large image corpora is challenging. An extraction approach based on an anchor dataset, combined with further filtering, is proposed here and demonstrated for the domain of satellite imagery. This results in the release of LAION-EO, a dataset sourced from the web containing pairs of text and satellite images in high (pixel-wise) resolution. The paper outlines the acquisition procedure as well as some of the features of the dataset.
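
The abstract does not spell out how the anchor dataset is used, so the following is only a minimal sketch of one plausible reading: keep a corpus image when its embedding is close to at least one anchor embedding, then apply a further filter on pixel resolution. The function name `anchor_filter`, the cosine-similarity criterion, and the `sim_threshold` and `min_side` parameters are all assumptions made for illustration. They are not taken from the paper, and the embeddings are assumed to be precomputed by some image encoder.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def anchor_filter(corpus_emb, anchor_emb, sizes, sim_threshold=0.9, min_side=256):
    """Return indices of corpus items that pass both filters.

    corpus_emb: (n_corpus, d) embeddings of the large corpus (assumed precomputed)
    anchor_emb: (n_anchor, d) embeddings of the in-domain anchor dataset
    sizes:      (n_corpus, 2) image width and height in pixels
    sim_threshold, min_side: hypothetical cut-offs, not values from the paper
    """
    corpus = l2_normalize(np.asarray(corpus_emb, dtype=np.float64))
    anchors = l2_normalize(np.asarray(anchor_emb, dtype=np.float64))

    # Cosine similarity of every corpus item to every anchor item.
    sims = corpus @ anchors.T

    # Step 1: anchor-based extraction. Keep items close to at least one anchor.
    near_anchor = sims.max(axis=1) >= sim_threshold

    # Step 2: further filtering. Keep items whose shorter side is large enough.
    large_enough = np.asarray(sizes).min(axis=1) >= min_side

    return np.flatnonzero(near_anchor & large_enough)
```

The dense similarity matrix is only workable for small inputs. At the scale of billions of images, the same idea would need an approximate nearest-neighbour index in place of the matrix product.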