Improving Search Suggestions for Alphanumeric Queries
For e-commerce platforms, the proposed method provides a practical, interpretable, and production-ready alternative to learned dense retrieval for alphanumeric search suggestions.
The paper tackles the problem of retrieving alphanumeric identifiers (e.g., MPNs, SKUs) in e-commerce search, where traditional methods fail due to sparsity and typographical variation. The proposed approach, a training-free, character-level binary encoding with Hamming-distance retrieval, achieves significant gains in business metrics in an A/B test.
Alphanumeric identifiers such as manufacturer part numbers (MPNs), SKUs, and model codes are ubiquitous in e-commerce catalogs and search. These identifiers are sparse, non-linguistic, and highly sensitive to tokenization and typographical variation, rendering conventional lexical and embedding-based retrieval methods ineffective. We propose a training-free, character-level retrieval framework that encodes each alphanumeric sequence as a fixed-length binary vector. This representation enables efficient similarity computation via Hamming distance and supports nearest-neighbor retrieval over large identifier corpora. An optional re-ranking stage using edit distance refines precision while preserving latency guarantees. The method offers a practical and interpretable alternative to learned dense retrieval models, making it suitable for production deployment in search suggestion generation systems. Significant gains in business metrics in the A/B test further support the utility of our approach.
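To make the pipeline concrete, the following is a minimal Python sketch of one way such a scheme could look. It is not the authors' implementation: the abstract does not specify the encoding, so the one-hot-per-position layout, the 36-symbol alphabet, the `MAX_LEN` cap, the brute-force scan, and the function names `encode`, `hamming`, `edit_distance`, and `suggest` are all assumptions made for illustration.

```python
import string

# Assumed alphabet: uppercase letters and digits (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
# Assumed maximum identifier length; longer inputs are truncated.
MAX_LEN = 16


def encode(identifier: str) -> int:
    """Encode an identifier as a fixed-length binary vector packed into an int.

    Each character position owns a block of len(ALPHABET) bits, and exactly
    one bit in that block is set (one-hot). Characters outside the assumed
    alphabet, such as hyphens or spaces, are dropped before encoding.
    """
    chars = [c for c in identifier.upper() if c in CHAR_INDEX][:MAX_LEN]
    bits = 0
    for pos, c in enumerate(chars):
        bits |= 1 << (pos * len(ALPHABET) + CHAR_INDEX[c])
    return bits


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed binary vectors (XOR + popcount)."""
    return bin(a ^ b).count("1")


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def suggest(query: str, corpus: list[str], k: int = 3, shortlist: int = 10) -> list[str]:
    """Shortlist by Hamming distance, then re-rank the shortlist by edit distance.

    The linear scan stands in for a real nearest-neighbor index, and a real
    system would precompute the corpus encodings once.
    """
    q = encode(query)
    candidates = sorted(corpus, key=lambda ident: hamming(q, encode(ident)))[:shortlist]
    candidates.sort(key=lambda ident: edit_distance(query.upper(), ident.upper()))
    return candidates[:k]
```

In this sketch a single substituted character changes exactly two bits, so typo-level variants stay close in Hamming space. Because the layout is positional, an inserted or deleted character shifts every later position, which is one reason an edit-distance re-ranking pass over a small shortlist can be useful.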