Xi Xiong

h-index 23

3 papers

225 citations

Novelty 42%

AI Score 25

Ranked #166,241 of 194,257 authors (top 86%) · #53,350 in CV (top 90%)

3 Papers

32.8 · CV · Oct 13, 2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Xi Chen, Xiao Wang, Lucas Beyer et al. · DeepMind

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while slightly underperforming on standard image classification benchmarks, SigLIP-based PaLI shows superior performance across various multimodal benchmarks, especially on localization and visually-situated text understanding. We scale the SigLIP image encoder up to 2 billion parameters and achieve a new state-of-the-art on multilingual cross-modal retrieval. We hope that PaLI-3, at only 5B parameters, rekindles research on fundamental pieces of complex VLMs, and could fuel a new generation of scaled-up models.

11.9 · IR · Mar 24, 2021

From Semantic Retrieval to Pairwise Ranking: Applying Deep Learning in E-commerce Search

Rui Li, Yunjiang Jiang, Wenyun Yang et al.

We introduce deep learning models to the two most important stages in product search at JD.com, one of the largest e-commerce platforms in the world. Specifically, we outline the design of a deep learning system that retrieves items semantically relevant to a query within milliseconds, and a pairwise deep re-ranking system that learns subtle user preferences. Compared to traditional search systems, the proposed approaches are better at semantic retrieval and personalized ranking, achieving significant improvements.

8.6 · LG · May 1, 2019

Dynamic Origin-Destination Matrix Prediction with Line Graph Neural Networks and Kalman Filter

Xi Xiong, Kaan Ozbay, Li Jin et al.

Modern intelligent transportation systems provide data that allow real-time dynamic demand prediction, which is essential for planning and operations. The main challenge in predicting dynamic Origin-Destination (O-D) demand matrices is that demands cannot be directly measured by traffic sensors; instead, they have to be inferred from aggregate traffic flow data on traffic links. Specifically, spatial correlation, congestion, and time-dependent factors need to be considered in general transportation networks. In this paper we propose a novel O-D prediction framework that combines heterogeneous prediction in graph neural networks with a Kalman filter to recognize spatial and temporal patterns simultaneously. The underlying road network topology is converted into a corresponding line graph in the newly designed Fusion Line Graph Convolutional Networks (FL-GCNs), which provide a general framework for predicting spatial-temporal O-D flows from link information. Data from the New Jersey Turnpike network are used to evaluate the proposed model. The results show that our proposed approach yields the best performance under various prediction scenarios. In addition, the advantage of combining deep neural networks with a Kalman filter is demonstrated.
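
The abstract mentions converting the road network topology into a line graph. As a rough illustration of that generic transformation only (not the FL-GCN architecture, whose details are not given here), the sketch below turns each directed link into a node and connects link (u, v) to link (v, w) whenever the first ends where the second begins. The function name `line_graph` and the toy network with nodes A to D are hypothetical.

```python
from collections import defaultdict


def line_graph(links):
    """Return the directed line graph of a road network.

    `links` is a list of directed road links (tail, head).
    In the result, every link is a node, and link (u, v) points to
    every link (v, w) that starts where (u, v) ends.
    """
    # Index links by their tail node so successors can be looked up quickly.
    outgoing = defaultdict(list)
    for tail, head in links:
        outgoing[tail].append((tail, head))

    # Each link's successors are the links leaving its head node.
    return {(tail, head): list(outgoing[head]) for tail, head in links}


# Hypothetical toy network: A -> B, B -> C, B -> D, C -> D.
toy_links = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
toy_line_graph = line_graph(toy_links)

for link, successors in toy_line_graph.items():
    print(link, "->", successors)
```

Working on the line graph puts link-level measurements such as traffic flow on nodes, which is the form most graph convolution layers expect as input.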