Towards Generalizable Semantic Product Search by Text Similarity Pre-training on Search Click Logs
This work addresses generalization challenges in e-commerce product search, but it is incremental as it builds on prior findings about domain-specific fine-tuning.
The paper tackles the problem of improving generalization in semantic product search by evaluating pre-trained models. It finds that domain-specific fine-tuning with clickstream data improves generalization, while general-domain fine-tuning does not.
Recently, semantic search has been successfully applied to e-commerce product search, and the learned semantic space(s) for query and product encoding are expected to generalize to unseen queries or products. Yet whether such generalization readily emerges has not been thoroughly studied in this domain. In this paper, we examine several general-domain and domain-specific pre-trained RoBERTa variants and find that general-domain fine-tuning does not help generalization, which is consistent with prior findings. Proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of a publicly available manually annotated query-product pair da