LGJul 14, 2025

Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model

arXiv:2507.10385v1h-index: 8
Originality Incremental advance
AI Analysis

This work addresses query reformulation for e-commerce search engines, offering an incremental improvement by incorporating semantic tags into existing transformer methods.

The paper tackled the problem of ambiguous and misaligned e-commerce queries by proposing TagBERT, a transformer model that uses semantic tags for token classification, achieving superior performance over BERT and other models on real-life datasets.

The major task of any e-commerce search engine is to retrieve the most relevant inventory items, which best match the user intent reflected in a query. This task is non-trivial due to many reasons, including ambiguous queries, misaligned vocabulary between buyers, and sellers, over- or under-constrained queries by the presence of too many or too few tokens. To address these challenges, query reformulation is used, which modifies a user query through token dropping, replacement or expansion, with the objective to bridge semantic gap between query tokens and users' search intent. Early methods of query reformulation mostly used statistical measures derived from token co-occurrence frequencies from selective user sessions having clicks or purchases. In recent years, supervised deep learning approaches, specifically transformer-based neural language models, or sequence-to-sequence models are being used for query reformulation task. However, these models do not utilize the semantic tags of a query token, which are significant for capturing user intent of an e-commerce query. In this work, we pose query reformulation as a token classification task, and solve this task by designing a dependency-aware transformer-based language model, TagBERT, which makes use of semantic tags of a token for learning superior query phrase embedding. Experiments on large, real-life e-commerce datasets show that TagBERT exhibits superior performance than plethora of competing models, including BERT, eBERT, and Sequence-to-Sequence transformer model for important token classification task.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes