A Comparison of Statistical and Machine Learning Algorithms for Predicting Rents in the San Francisco Bay Area
This work compares machine learning and statistical methods for predicting rents, a task relevant to urban planners and land use modelers, and finds that the machine learning method achieves higher predictive accuracy.
This paper compares random forest regression and ordinary least squares for predicting rents per square foot in the San Francisco Bay Area. The random forest model achieved substantially higher predictive accuracy than the multiple regression model.
Urban transportation and land use models have used theory and statistical modeling methods to develop model systems that are useful in planning applications. Machine learning methods have been considered 'black box' techniques that lack interpretability, and their use in the land use and transportation modeling literature has been limited. We present a use case in which predictive accuracy is of primary importance, and compare random forest regression with multiple regression estimated by ordinary least squares for predicting rents per square foot in the San Francisco Bay Area, using a large volume of rental listings scraped from the Craigslist website. We find that both models yield useful predictions using almost exclusively local accessibility variables, though the predictive accuracy of the random forest model is substantially higher.
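The kind of comparison described above can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. It does not reproduce the paper's data, variables, or results: the listings are synthetic, the two features (`job_access`, `transit_access`) are hypothetical stand-ins for local accessibility variables, and the nonlinear rent function is invented purely so that the two model types can be evaluated on the same held-out sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(n_listings=2000, seed=0):
    """Fit OLS and a random forest on synthetic data; return held-out R^2 for each."""
    rng = np.random.default_rng(seed)

    # Hypothetical accessibility features scaled to [0, 1] (assumption:
    # these are illustrative and are not the paper's actual variables).
    job_access = rng.uniform(0.0, 1.0, n_listings)
    transit_access = rng.uniform(0.0, 1.0, n_listings)
    X = np.column_stack([job_access, transit_access])

    # Synthetic rent per square foot with a deliberately nonlinear
    # interaction between the two features, plus Gaussian noise.
    rent_sqft = (
        2.0
        + 1.5 * np.sin(2.0 * np.pi * job_access) * transit_access
        + rng.normal(0.0, 0.1, n_listings)
    )

    # Hold out a quarter of the listings for out-of-sample evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, rent_sqft, test_size=0.25, random_state=seed
    )

    models = {
        "ols": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, r2 in compare_models().items():
        print(f"{name}: held-out R^2 = {r2:.3f}")
```

Because the synthetic rent function is nonlinear by construction, the random forest scores higher here by design; the size of the gap in this sketch says nothing about the magnitudes reported in the paper.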