LG · Mar 13, 2022

ORDSIM: Ordinal Regression for E-Commerce Query Similarity Prediction

Md. Ahsanul Kabir, Mohammad Al Hasan, Aritra Mandal, Daniel Tunkelang, Zhe Wu

arXiv:2203.06591v15.83 citations · h-index: 30

Originality: Incremental advance

AI Analysis

This work addresses query similarity prediction for e-commerce platforms, aiming to improve monetization by predicting high similarity values more accurately; it represents an incremental improvement over existing regression methods.

The paper tackles query similarity prediction in e-commerce by proposing ORDSIM, an ordinal regression model that focuses on accurately predicting high similarity values to boost monetization. It achieves substantially smaller prediction error than competing regression methods on a dataset of over 10 million queries from eBay.

The query similarity prediction task is generally solved by regression-based models with squared loss. Such a model is agnostic of absolute similarity values and penalizes the regression error on the same scale across all ranges of similarity values. However, to boost an e-commerce platform's monetization, it is important to predict high-level similarity more accurately than low-level similarity, as highly similar queries retrieve items that match user intent, whereas moderately similar queries retrieve related items, which may not lead to a purchase. Regression models fail to customize their loss function to concentrate on the high-similarity band, resulting in poor performance on the query similarity prediction task. We address this challenge by treating query similarity prediction as an ordinal regression problem, and thereby propose a model, ORDSIM (ORDinal Regression for SIMilarity Prediction). ORDSIM exploits variable-width buckets to model ordinal loss, which penalizes errors in high-level similarity harshly and thus enables the regression model to obtain better prediction results for high similarity values. We evaluate ORDSIM on a dataset of over 10 million e-commerce queries from the eBay platform and show that ORDSIM achieves substantially smaller prediction error than the competing regression methods on this dataset.
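The variable-width bucketing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bucket edges in `EDGES`, the function names, and the cumulative binary encoding with a cross-entropy loss are all assumptions chosen for illustration (the encoding is one common way to set up ordinal regression). The sketch also assumes similarity scores lie in [0, 1]. It shows how narrow buckets near the top of the range make the same numeric error cross more ordinal thresholds than it would at the bottom.

```python
import numpy as np

# Hypothetical bucket edges over a similarity score in [0, 1]:
# wide buckets at the low end, narrow buckets at the high end.
EDGES = np.array([0.0, 0.4, 0.6, 0.75, 0.85, 0.92, 0.97, 1.0])


def to_bucket(sim):
    """Map similarity scores to ordinal bucket indices 0..K-1."""
    # Only interior edges are passed, so 0.0 maps to bucket 0
    # and 1.0 maps to the last bucket.
    return np.digitize(sim, EDGES[1:-1])


def ordinal_targets(bucket, num_buckets):
    """Cumulative binary encoding: target[k] = 1 if bucket > k."""
    thresholds = np.arange(num_buckets - 1)
    return (np.asarray(bucket)[..., None] > thresholds).astype(float)


def ordinal_loss(logits, bucket):
    """Binary cross-entropy summed over the K-1 'bucket > k' questions,
    averaged over examples. `logits` has shape (n, K-1)."""
    targets = ordinal_targets(bucket, logits.shape[-1] + 1)
    # Stable BCE-with-logits: log(1 + exp(z)) - t * z
    per_threshold = np.logaddexp(0.0, logits) - targets * logits
    return per_threshold.sum(axis=-1).mean()


# The same 0.1 error crosses no bucket boundary at the low end,
# but crosses two boundaries at the high end.
low_gap = to_bucket(0.2) - to_bucket(0.1)
high_gap = to_bucket(0.98) - to_bucket(0.88)
print(low_gap, high_gap)
```

Under these assumed edges, a prediction that is off by 0.1 in the low-similarity range incurs no ordinal error, while the same miss in the high-similarity range disagrees with two thresholds, which is the asymmetry a plain squared loss lacks.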

