IRNov 4, 2022
A Transformer-Based Substitute Recommendation Model Incorporating Weakly Supervised Customer Behavior DataWenting Ye, Hongfei Yang, Shuai Zhao et al.
The substitute-based recommendation is widely used in E-commerce to provide better alternatives to customers. However, existing research typically uses the customer behavior signals like co-view and view-but-purchase-another to capture the substitute relationship. Despite its intuitive soundness, we find that such an approach might ignore the functionality and characteristics of products. In this paper, we adapt substitute recommendation into language matching problem by taking product title description as model input to consider product functionality. We design a new transformation method to de-noise the signals derived from production data. In addition, we consider multilingual support from the engineering point of view. Our proposed end-to-end transformer-based model achieves both successes from offline and online experiments. The proposed model has been deployed in a large-scale E-commerce website for 11 marketplaces in 6 languages. Our proposed model is demonstrated to increase revenue by 19% based on an online A/B experiment.
CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic CapabilitiesGheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
QMMay 31, 2023
Causal Intervention for Measuring Confidence in Drug-Target Interaction PredictionWenting Ye, Chen Li, Yang Xie et al.
Identifying and discovering drug-target interactions(DTIs) are vital steps in drug discovery and development. They play a crucial role in assisting scientists in finding new drugs and accelerating the drug development process. Recently, knowledge graph and knowledge graph embedding (KGE) models have made rapid advancements and demonstrated impressive performance in drug discovery. However, such models lack authenticity and accuracy in drug target identification, leading to an increased misjudgment rate and reduced drug development efficiency. To address these issues, we focus on the problem of drug-target interactions, with knowledge mapping as the core technology. Specifically, a causal intervention-based confidence measure is employed to assess the triplet score to improve the accuracy of the drug-target interaction prediction model. Experimental results demonstrate that the developed confidence measurement method based on causal intervention can significantly enhance the accuracy of DTI link prediction, particularly for high-precision models. The predicted results are more valuable in guiding the design and development of subsequent drug development experiments, thereby significantly improving the efficiency of drug development.
LGNov 11, 2017
A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding CorrectionWenting Ye, Xiang Liu, Tianwei Yue et al.
While linear mixed model (LMM) has shown a competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges are still to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We proposed the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM in the real-world genomic dataset on two different species from plants and humans. In Arabidopsis thaliana data, sGLMM behaves better than all other baseline models for 63.4% traits. We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.