Pattern Recognition Scheme for Large-Scale Cloud Detection over Landmarks
This work provides a solution for accurate cloud detection over landmarks, which is crucial for maintaining geometric quality in Earth observation satellite data processing and benefits satellite service providers and researchers.
This paper addresses the problem of cloud contamination over landmarks, which impacts image navigation and registration for geostationary satellites. The authors developed a pattern recognition scheme that detects clouds over landmarks using Meteosat Second Generation data, achieving high accuracy and affordable computational costs across 200 landmark test sites and nearly 7 million images.
Landmark recognition and matching is a critical step in many Image Navigation and Registration (INR) models for geostationary satellite services, as well as in the geometric quality assessment (GQA) performed in the instrument data processing chain of Earth observation satellites. Accurate landmark matching is of paramount importance, and cloud contamination of a given landmark can strongly degrade it. This paper introduces a complete pattern recognition methodology that detects the presence of clouds over landmarks using Meteosat Second Generation (MSG) data. The methodology is based on an ensemble of dedicated support vector machines (SVMs), each specific to a particular landmark and illumination condition. This divide-and-conquer strategy is motivated by the complexity of the data and splits the observations on physical grounds, accounting for variability in both seasonality and illumination conditions throughout the day. In addition, it allows the classification scheme to be trained with millions of samples at an affordable computational cost. The image archive comprised 200 landmark test sites with nearly 7 million multispectral images corresponding to MSG acquisitions during 2010. Results are analyzed in terms of cloud detection accuracy and computational cost. We provide illustrative source code and a portion of the large training dataset to the community.
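The divide-and-conquer idea can be illustrated with a minimal Python sketch under stated assumptions: each observation is routed to an SVM trained only on its (landmark, season, illumination) group, with labels +1 for cloudy and -1 for clear. The observation schema (`landmark`, `month`, `sza`, `features`), the solar zenith angle thresholds, and the meteorological-season split are hypothetical, and `features` merely stands in for the MSG multispectral values. The base learner is a stdlib-only linear SVM trained with the Pegasos stochastic subgradient method, swapped in because the paper's SVM configuration is not described here; how the paper actually combines its dedicated SVMs may also differ from this simple routing.

```python
import random
from collections import defaultdict


def illumination(sza_deg):
    """Bin the solar zenith angle (degrees); thresholds are assumptions."""
    if sza_deg < 80.0:
        return "day"
    if sza_deg < 90.0:
        return "twilight"
    return "night"


def season(month):
    """Map a month (1-12) to a meteorological season label."""
    return ("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
            "JJA", "JJA", "SON", "SON", "SON", "DJF")[month - 1]


def group_key(obs):
    """Hypothetical partition key: one SVM per (landmark, season, illumination)."""
    return (obs["landmark"], season(obs["month"]), illumination(obs["sza"]))


class LinearSVM:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD.

    A constant 1.0 feature is appended so the bias is learned as an
    ordinary (and therefore also regularized) weight, a simplification.
    """

    def __init__(self, lam=0.01, epochs=100, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.w = []

    def decision(self, x):
        return sum(wj * xj for wj, xj in zip(self.w, list(x) + [1.0]))

    def fit(self, X, y):
        """X: list of feature vectors; y: labels in {-1, +1}."""
        rng = random.Random(self.seed)
        data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
        self.w = [0.0] * len(data[0][0])
        t = 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for xa, yi in data:
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = yi * sum(wj * xj for wj, xj in zip(self.w, xa))
                # Step on the L2 term: shrink the weights.
                self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]
                # Step on the hinge loss only when the margin is violated.
                if margin < 1.0:
                    self.w = [wj + eta * yi * xj for wj, xj in zip(self.w, xa)]
        return self

    def predict(self, x):
        return 1 if self.decision(x) >= 0.0 else -1


class CloudOverLandmarkEnsemble:
    """Route each observation to the SVM trained on its group only."""

    def __init__(self, **svm_kwargs):
        self.svm_kwargs = svm_kwargs
        self.models = {}

    def fit(self, observations, labels):
        groups = defaultdict(lambda: ([], []))
        for obs, label in zip(observations, labels):
            X, y = groups[group_key(obs)]
            X.append(obs["features"])
            y.append(label)
        # Groups are trained independently, each on a small subset.
        self.models = {k: LinearSVM(**self.svm_kwargs).fit(X, y)
                       for k, (X, y) in groups.items()}
        return self

    def predict(self, obs):
        # Raises KeyError if the observation's group was never trained.
        return self.models[group_key(obs)].predict(obs["features"])
```

Partitioning also reflects the cost argument: kernel SVM training grows superlinearly with the number of samples, so under a roughly quadratic cost, splitting N samples into K equal groups reduces the total from about N² to K·(N/K)² = N²/K, and the groups can be trained independently. Because each group gets its own decision function, day and night observations of the same landmark can rely on different spectral evidence. An observation from a group never seen in training has no dedicated model, so this sketch raises `KeyError` rather than guessing.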