Real Estate Attribute Prediction from Multiple Visual Modalities with Missing Data
This work addresses the challenge that incomplete data poses for property assessors in real estate valuation, but its contribution is incremental because it builds on existing multimodal fusion methods.
The paper tackles the problem of predicting real estate attributes that are missing from sparse databases by using indoor and outdoor photos, and finds that fusing both modalities improves performance by up to 5% in Macro F1-score.
The assessment and valuation of real estate require large datasets of property information. In practice, however, real estate databases are usually sparse, i.e., not every important attribute is available for every property. In this paper, we study the potential of predicting high-level real estate attributes from visual data, specifically from two visual modalities: indoor (interior) and outdoor (facade) photos. We design three models that use different multimodal fusion strategies and evaluate them for three different use cases. A particular challenge is handling missing modalities. We compare the fusion strategies, present baselines for the different prediction tasks, and find that enriching the training data with additional incomplete samples can improve prediction accuracy. Furthermore, fusing information from indoor and outdoor photos yields a performance boost of up to 5% in Macro F1-score.
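To make the two central notions concrete, the following minimal sketch illustrates one generic way to fuse two modalities when one may be missing, and how a Macro F1-score is computed. It is not the authors' method: the abstract does not specify the three fusion models, so simple late fusion (averaging class probabilities over the available modalities) is assumed here purely for illustration, and the function names `late_fusion` and `macro_f1` are hypothetical.

```python
from typing import List, Optional, Sequence


def late_fusion(
    indoor: Optional[Sequence[float]],
    outdoor: Optional[Sequence[float]],
) -> List[float]:
    """Average class probabilities over whichever modalities are present.

    Assumption: each modality has its own classifier that outputs a
    probability vector; a missing modality is passed as None.
    """
    available = [p for p in (indoor, outdoor) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    num_classes = len(available[0])
    if any(len(p) != num_classes for p in available):
        raise ValueError("probability vectors must have equal length")
    return [
        sum(p[i] for p in available) / len(available)
        for i in range(num_classes)
    ]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes
```

Because every class contributes equally to the mean, Macro F1 is sensitive to performance on rare classes, which is one common reason to report it for imbalanced attribute labels.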