64.5IRMay 22
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search RetrievalVipul Gupta, Shikhar Mohan, Lakshya Kumar et al.
In the competitive landscape of sponsored search, balancing retrieval quality with production latency is a critical challenge. While large retrieval models based on Small Language Models (SLMs) such as Qwen3-Embedding-4B/8B set strong upper bounds on public benchmarks, their deployment in high-throughput, latency-sensitive environments remains impractical. In this paper, we present HARNESS-LM (HLM), a three-phase training framework for transferring the capabilities of large-scale retrievers into compact, cost-efficient models. The approach comprises: (1) training a high-performance reference ("teacher") retriever by fine-tuning a billion-parameter-scale SLM; (2) aligning query representations via an L2 objective to distill knowledge into a sub-600M parameter student encoder; and (3) applying a final contrastive refinement stage to optimize the student for retrieval performance. We also present a comprehensive empirical study of key design choices, including alignment objectives, embedding dimensionality, model scale, architecture, and optimization strategies, to identify configurations that are most effective in production settings. On a real-world Bing Ads evaluation benchmark, HLM recovers over 98% of the reference retriever's precision across multiple settings, while delivering up to 27x lower online query-encoder latency and 20x higher throughput on NVIDIA A100 GPUs. Online A/B testing on Bing Ads further shows a +1% Revenue, +0.6% Impression, and +0.4% Click uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model, clearly highlighting the practical efficacy of the HLM recipe in a real-world sponsored search setting.
IROct 20, 2022
Fine-Grained Session Recommendations in E-commerce using Deep Reinforcement LearningDiddigi Raghu Ram Bharadwaj, Lakshya Kumar, Saif Jawaid et al.
Sustaining users' interest and keeping them engaged in the platform is very important for the success of an e-commerce business. A session encompasses different activities of a user between logging into the platform and logging out or making a purchase. User activities in a session can be classified into two groups: Known Intent and Unknown intent. Known intent activity pertains to the session where the intent of a user to browse/purchase a specific product can be easily captured. Whereas in unknown intent activity, the intent of the user is not known. For example, consider the scenario where a user enters the session to casually browse the products over the platform, similar to the window shopping experience in the offline setting. While recommending similar products is essential in the former, accurately understanding the intent and recommending interesting products is essential in the latter setting in order to retain a user. In this work, we focus primarily on the unknown intent setting where our objective is to recommend a sequence of products to a user in a session to sustain their interest, keep them engaged and possibly drive them towards purchase. We formulate this problem in the framework of the Markov Decision Process (MDP), a popular mathematical framework for sequential decision making and solve it using Deep Reinforcement Learning (DRL) techniques. However, training the next product recommendation is difficult in the RL paradigm due to large variance in browse/purchase behavior of the users. Therefore, we break the problem down into predicting various product attributes, where a pattern/trend can be identified and exploited to build accurate models. We show that the DRL agent provides better performance compared to a greedy strategy.
LGFeb 28, 2024
Graph Regularized Encoder Training for Extreme ClassificationAnshul Mittal, Shikhar Mohan, Deepak Saini et al.
Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.
IRJul 17, 2021
Neural Search: Learning Query and Product Representations in Fashion E-commerceLakshya Kumar, Sagnik Sarkar
Typical e-commerce platforms contain millions of products in the catalog. Users visit these platforms and enter search queries to retrieve their desired products. Therefore, showing the relevant products at the top is essential for the success of e-commerce platforms. We approach this problem by learning low dimension representations for queries and product descriptions by leveraging user click-stream data as our main source of signal for product relevance. Starting from GRU-based architectures as our baseline model, we move towards a more advanced transformer-based architecture. This helps the model to learn contextual representations of queries and products to serve better search results and understand the user intent in an efficient manner. We perform experiments related to pre-training of the Transformer based RoBERTa model using a fashion corpus and fine-tuning it over the triplet loss. Our experiments on the product ranking task show that the RoBERTa model is able to give an improvement of 7.8% in Mean Reciprocal Rank(MRR), 15.8% in Mean Average Precision(MAP) and 8.8% in Normalized Discounted Cumulative Gain(NDCG), thus outperforming our GRU based baselines. For the product retrieval task, RoBERTa model is able to outperform other two models with an improvement of 164.7% in Precision@50 and 145.3% in Recall@50. In order to highlight the importance of pre-training RoBERTa for fashion domain, we qualitatively compare already pre-trained RoBERTa on standard datasets with our custom pre-trained RoBERTa over a fashion corpus for the query token prediction task. Finally, we also show a qualitative comparison between GRU and RoBERTa results for product retrieval task for some test queries.
LGJul 6, 2020
Deep Contextual Embeddings for Address Classification in E-commerceShreyas Mangalgi, Lakshya Kumar, Ravindra Babu Tallamraju
E-commerce customers in developing nations like India tend to follow no fixed format while entering shipping addresses. Parsing such addresses is challenging because of a lack of inherent structure or hierarchy. It is imperative to understand the language of addresses, so that shipments can be routed without delays. In this paper, we propose a novel approach towards understanding customer addresses by deriving motivation from recent advances in Natural Language Processing (NLP). We also formulate different pre-processing steps for addresses using a combination of edit distance and phonetic algorithms. Then we approach the task of creating vector representations for addresses using Word2Vec with TF-IDF, Bi-LSTM and BERT based approaches. We compare these approaches with respect to sub-region classification task for North and South Indian cities. Through experiments, we demonstrate the effectiveness of generalized RoBERTa model, pre-trained over a large address corpus for language modelling task. Our proposed RoBERTa model achieves a classification accuracy of around 90% with minimal text preprocessing for sub-region classification task outperforming all other approaches. Once pre-trained, the RoBERTa model can be fine-tuned for various downstream tasks in supply chain like pincode suggestion and geo-coding. The model generalizes well for such tasks even with limited labelled data. To the best of our knowledge, this is the first of its kind research proposing a novel approach of understanding customer addresses in e-commerce domain by pre-training language models and fine-tuning them for different purposes.
CLSep 6, 2017
"Having 2 hours to write a paper is fun!": Detecting Sarcasm in Numerical Portions of TextLakshya Kumar, Arpan Somani, Pushpak Bhattacharyya
Sarcasm occurring due to the presence of numerical portions in text has been quoted as an error made by automatic sarcasm detection approaches in the past. We present a first study in detecting sarcasm in numbers, as in the case of the sentence 'Love waking up at 4 am'. We analyze the challenges of the problem, and present Rule-based, Machine Learning and Deep Learning approaches to detect sarcasm in numerical portions of text. Our Deep Learning approach outperforms four past works for sarcasm detection and Rule-based and Machine learning approaches on a dataset of tweets, obtaining an F1-score of 0.93. This shows that special attention to text containing numbers may be useful to improve state-of-the-art in sarcasm detection.