A model for predicting price polarity of real estate properties using information of real estate market websites
This is an incremental improvement for real estate market analysis, offering a model to assess price polarity in Bogotá.
The paper addresses predicting whether a real estate property's price is higher or lower than the average price of similar properties, using data from real estate market websites. A classifier that includes text descriptions achieves slightly higher accuracy than one that uses property features alone.
This paper presents a model that uses the information sellers publish on real estate market websites to predict whether a property is priced higher or lower than the average price of similar properties. The model learns the correlation between price and the published information (text descriptions and features) of real estate properties by automatically identifying latent semantic content with a machine learning model based on doc2vec and xgboost. The proposed model was evaluated on a data set of 57,516 real estate listings in Bogotá collected from 2016 to 2018. Results show that the accuracy of a classifier that includes text descriptions is slightly higher than that of a classifier that uses only the features of the properties, as text descriptions tend to contain detailed information about the property.
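To make the prediction target concrete, the following minimal sketch shows one way a binary price-polarity label could be derived. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say how "similar properties" are defined, so the grouping keys (`zone`, `property_type`), the function name `label_price_polarity`, and the toy listings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def label_price_polarity(listings, group_keys=("zone", "property_type")):
    """Return one label per listing: 1 if its price is strictly above the
    mean price of its group, otherwise 0.

    Assumption: "similar properties" are those sharing the same values
    for `group_keys`; the paper may define similarity differently.
    """
    # Collect the prices of each group of similar listings.
    prices_by_group = defaultdict(list)
    for item in listings:
        key = tuple(item[k] for k in group_keys)
        prices_by_group[key].append(item["price"])

    # Average price per group.
    group_mean = {key: mean(prices) for key, prices in prices_by_group.items()}

    # Compare each listing with the average of its own group.
    return [
        1 if item["price"] > group_mean[tuple(item[k] for k in group_keys)] else 0
        for item in listings
    ]


# Toy listings with invented values, for illustration only.
listings = [
    {"zone": "zone_a", "property_type": "apartment", "price": 300},
    {"zone": "zone_a", "property_type": "apartment", "price": 500},
    {"zone": "zone_b", "property_type": "house", "price": 400},
    {"zone": "zone_b", "property_type": "house", "price": 200},
    {"zone": "zone_b", "property_type": "house", "price": 300},
]
labels = label_price_polarity(listings)
print(labels)
```

Labels of this kind would serve as the target of the classifier, with the text descriptions and property features as inputs. A listing priced exactly at its group average is labeled 0 here, which is an arbitrary choice of this sketch.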