Large Scale Purchase Prediction with Historical User Actions on B2C Online Retail Platform
This work addresses purchase prediction for online retail platforms, but its contribution is incremental: it applies standard machine learning methods and ensemble techniques to a large-scale dataset.
The paper tackles the prediction of future user purchases on Tmall's B2C platform using more than half a billion action records, achieving an F1Score of 6.11 and ranking 7th out of 7,276 teams in the Tmall Recommendation Prize 2014 competition.
This paper describes the solution of the Bazinga Team for the Tmall Recommendation Prize 2014. Using real-world user action data provided by Tmall, one of the largest B2C online retail platforms in China, the competition requires participants to predict future user purchases on the Tmall website. Predictions are judged on F1Score, which takes both precision and recall into account for fair evaluation. The data set provided by Tmall contains more than half a billion action records from over ten million distinct users. Such a massive data volume poses a major challenge and drives competitors to write every program in MapReduce fashion and run it on a distributed cluster. We model purchase prediction as a standard machine learning problem and mainly employ regression and classification methods as single models. Individual models are then aggregated in a two-stage approach: linear regression is first used for blending, and a linear ensemble of the blended models produces the final prediction. At the time of writing, the competition is nearing its end but still running. In the end, our team achieves an F1Score of 6.11 and ranks 7th (out of 7,276 teams in total).
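The evaluation metric and the two-stage aggregation described above can be illustrated with a minimal sketch. It is not the team's actual implementation, which ran as MapReduce jobs on a distributed cluster. It assumes predictions are scored as sets of (user, item) pairs, that blending fits least-squares weights with an intercept over single-model scores, and that the final ensemble is a fixed-weight linear combination. The function names `f1_score`, `fit_linear_blend`, `apply_linear_blend`, and `linear_ensemble` are hypothetical, chosen only for illustration.

```python
import numpy as np


def f1_score(predicted, actual):
    """F1 over two sets of (user, item) purchase pairs (assumed format)."""
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def fit_linear_blend(model_scores, labels):
    """Stage 1: least-squares weights (plus intercept) over single-model scores.

    model_scores has shape (n_samples, n_models); labels has shape (n_samples,).
    """
    scores = np.asarray(model_scores, dtype=float)
    design = np.column_stack([scores, np.ones(scores.shape[0])])
    weights, *_ = np.linalg.lstsq(design, np.asarray(labels, dtype=float), rcond=None)
    return weights


def apply_linear_blend(model_scores, weights):
    """Apply stage-1 weights to new single-model scores."""
    scores = np.asarray(model_scores, dtype=float)
    design = np.column_stack([scores, np.ones(scores.shape[0])])
    return design @ weights


def linear_ensemble(blended_scores, ensemble_weights):
    """Stage 2: fixed-weight linear combination of several blended models.

    blended_scores has shape (n_blends, n_samples).
    """
    return np.asarray(blended_scores, dtype=float).T @ np.asarray(ensemble_weights, dtype=float)
```

In use, each blended model would be fitted on held-out data with `fit_linear_blend`, several such blends would be combined with `linear_ensemble`, and the highest-scoring pairs would be kept as the predicted set passed to `f1_score`. How candidate pairs are thresholded is not specified in the text and is left open here.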