Yiling Jia

IR
h-index117
9papers
3,751citations
Novelty53%
AI Score41

9 Papers

LGOct 8, 2023Code
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Lu Yin, You Wu, Zhenyu Zhang et al.

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters that can be pruned in one-shot without hurting performance. Prevailing LLM pruning strategies have consistently adhered to the practice of uniformly pruning all layers at equivalent sparsity, resulting in robust performance. However, this observation stands in contrast to the prevailing trends observed in the field of vision models, where non-uniform layerwise sparsity typically yields stronger results. To understand the underlying reasons for this disparity, we conduct a comprehensive study and discover a strong correlation with the emergence of activation outliers in LLMs. Inspired by this finding, we introduce a novel LLM pruning methodology that incorporates a tailored set of non-uniform layerwise sparsity ratios, termed as Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL is proportional to the outlier ratio observed within each layer, facilitating a more effective alignment between layerwise weight sparsity and outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family and OPT, spanning various benchmarks, demonstrates the distinct advantages offered by OWL over previous methods. For instance, OWL exhibits a remarkable performance gain, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity at a high sparsity level of 70%, respectively, while delivering 2.6x end-to-end inference speed-up in the DeepSparse inference engine. Codes are available at https://github.com/luuyin/OWL.

CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

LGJan 24, 2022
Learning Neural Contextual Bandits Through Perturbed Rewards

Yiling Jia, Weitong Zhang, Dongruo Zhou et al.

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts. But because their exploration has to be performed in the entire neural network parameter space to obtain nearly optimal regret, the resulting computational cost is prohibitively high. We perturb the rewards when updating the neural network to eliminate the need of explicit exploration and the corresponding computational overhead. We prove that a $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under standard regularity conditions, where $T$ is the number of rounds of interactions and $\tilde{d}$ is the effective dimension of a neural tangent kernel matrix. Extensive comparisons with several benchmark contextual bandit algorithms, including two recent neural contextual bandit models, demonstrate the effectiveness and computational efficiency of our proposed neural bandit algorithm.

IRJan 17, 2022
Learning Neural Ranking Models Online from Implicit User Feedback

Yiling Jia, Hongning Wang

Existing online learning to rank (OL2R) solutions are limited to linear models, which are incompetent to capture possible non-linear relations between queries and documents. In this work, to unleash the power of representation learning in OL2R, we propose to directly learn a neural ranking model from users' implicit feedback (e.g., clicks) collected on the fly. We focus on RankNet and LambdaRank, due to their great empirical success and wide adoption in offline settings, and control the notorious explore-exploit trade-off based on the convergence analysis of neural networks using neural tangent kernel. Specifically, in each round of result serving, exploration is only performed on document pairs where the predicted rank order between the two documents is uncertain; otherwise, the ranker's predicted order will be followed in result ranking. We prove that under standard assumptions our OL2R solution achieves a gap-dependent upper regret bound of $O(\log^2(T))$, in which the regret is defined on the total number of mis-ordered pairs over $T$ rounds. Comparisons against an extensive set of state-of-the-art OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.

IRNov 1, 2021
Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank

Yiling Jia, Hongning Wang

Online learning to rank (OL2R) has attracted great research interests in recent years, thanks to its advantages in avoiding expensive relevance labeling as required in offline supervised ranking model learning. Such a solution explores the unknowns (e.g., intentionally present selected results on top positions) to improve its relevance estimation. This however triggers concerns on its ranking fairness: different groups of items might receive differential treatments during the course of OL2R. But existing fair ranking solutions usually require the knowledge of result relevance or a performing ranker beforehand, which contradicts with the setting of OL2R and thus cannot be directly applied to guarantee fairness. In this work, we propose a general framework to achieve fairness defined by group exposure in OL2R. The key idea is to calibrate exploration and exploitation for fairness control, relevance learning and online ranking quality. In particular, when the model is exploring a set of results for relevance feedback, we confine the exploration within a subset of random permutations, where fairness across groups is maintained while the feedback is still unbiased. Theoretically we prove such a strategy introduces minimum distortion in OL2R's regret to obtain fairness. Extensive empirical analysis is performed on two public learning to rank benchmark datasets to demonstrate the effectiveness of the proposed solution compared to existing fair OL2R solutions.

LGFeb 28, 2021
PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Yiling Jia, Huazheng Wang, Stephen Guo et al.

Online Learning to Rank (OL2R) eliminates the need of explicit relevance annotation by directly optimizing the rankers from their interactions with users. However, the required exploration drives it away from successful practices in offline learning to rank, which limits OL2R's empirical performance and practical applicability. In this work, we propose to estimate a pairwise learning to rank model online. In each round, candidate documents are partitioned and ranked according to the model's confidence on the estimated pairwise rank order, and exploration is only performed on the uncertain pairs of documents, i.e., \emph{divide-and-conquer}. Regret directly defined on the number of mis-ordered pairs is proven, which connects the online solution's theoretical convergence with its expected ranking performance. Comparisons against an extensive list of OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.

LGSep 2, 2019
Active Collaborative Sensing for Energy Breakdown

Yiling Jia, Nipun Batra, Hongning Wang et al.

Residential homes constitute roughly one-fourth of the total energy usage worldwide. Providing appliance-level energy breakdown has been shown to induce positive behavioral changes that can reduce energy consumption by 15%. Existing approaches for energy breakdown either require hardware installation in every target home or demand a large set of energy sensor data available for model training. However, very few homes in the world have installed sub-meters (sensors measuring individual appliance energy); and the cost of retrofitting a home with extensive sub-metering eats into the funds available for energy saving retrofits. As a result, strategically deploying sensing hardware to maximize the reconstruction accuracy of sub-metered readings in non-instrumented homes while minimizing deployment costs becomes necessary and promising. In this work, we develop an active learning solution based on low-rank tensor completion for energy breakdown. We propose to actively deploy energy sensors to appliances from selected homes, with a goal to improve the prediction accuracy of the completed tensor with minimum sensor deployment cost. We empirically evaluate our approach on the largest public energy dataset collected in Austin, Texas, USA, from 2013 to 2017. The results show that our approach gives better performance with a fixed number of sensors installed when compared to the state-of-the-art, which is also proven by our theoretical analysis.

IRJun 3, 2019
The FacT: Taming Latent Factor Models for Explainability with Factorization Trees

Yiyi Tao, Yiling Jia, Nan Wang et al.

Latent factor models have achieved great success in personalized recommendations, but they are also notoriously difficult to explain. In this work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors. Specifically, we build regression trees on users and items respectively with user-generated reviews, and associate a latent profile to each node on the trees to represent users and items. With the growth of regression tree, the latent factors are gradually refined under the regularization imposed by the tree structure. As a result, we are able to track the creation of latent profiles by looking into the path of each factor on regression trees, which thus serves as an explanation for the resulting recommendations. Extensive experiments on two large collections of Amazon and Yelp reviews demonstrate the advantage of our model over several competitive baseline algorithms. Besides, our extensive user study also confirms the practical value of explainable recommendations generated by our model.

IRJun 10, 2018
Explainable Recommendation via Multi-Task Learning in Opinionated Text Data

Nan Wang, Hongning Wang, Yiling Jia et al.

Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks of user preference modeling for recommendation} and \textit{opinionated content modeling for explanation are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirmed the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. And our extensive user study clearly demonstrates the practical value of the explainable recommendations generated by our algorithm.