Michael Shavlovsky

OC
h-index15
5papers
57citations
Novelty55%
AI Score44

5 Papers

LGJul 31, 2023
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky et al.

On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. However, these works often study idealized problem settings which may fail to capture complexities of real-world scenarios. We identify a natural tabular data setting where DL models can outperform GBDTs: tabular Learning-to-Rank (LTR) under label scarcity. Tabular LTR applications, including search and recommendation, often have an abundance of unlabeled data, and scarce labeled data. We show that DL rankers can utilize unsupervised pretraining to exploit this unlabeled data. In extensive experiments over both public and proprietary datasets, we show that pretrained DL rankers consistently outperform GBDT rankers on ranking metrics -- sometimes by as much as 38% -- both overall and on outliers.

81.3CLApr 23
GiVA: Gradient-Informed Bases for Vector-Based Adaptation

Neeraj Gangwar, Rishabh Deshmukh, Michael Shavlovsky et al.

As model sizes continue to grow, parameter-efficient fine-tuning has emerged as a powerful alternative to full fine-tuning. While LoRA is widely adopted among these methods, recent research has explored vector-based adaptation methods due to their extreme parameter efficiency. However, these methods typically require substantially higher ranks than LoRA to match its performance, leading to increased training costs. This work introduces GiVA, a gradient-based initialization strategy for vector-based adaptation. It achieves training times comparable to LoRA and maintains the extreme parameter efficiency of vector-based adaptation. We evaluate GiVA across diverse benchmarks, including natural language understanding, natural language generation, and image classification. Experiments show that our approach consistently outperforms or achieves performance competitive with existing vector-based adaptation methods and LoRA while reducing rank requirements by a factor of eight ($8\times$).

OCMar 8, 2024
A Sinkhorn-type Algorithm for Constrained Optimal Transport

Xun Tang, Holakou Rahmanian, Michael Shavlovsky et al.

Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation and introduce a Sinkhorn-type algorithm for such constrained OT problems supported by theoretical guarantees. We first bound the approximation error when solving the problem through entropic regularization, which reduces exponentially with the increase of the regularization parameter. Furthermore, we prove a sublinear first-order convergence rate of the proposed Sinkhorn-type algorithm in the dual space by characterizing the optimization procedure with a Lyapunov function. To achieve fast and higher-order convergence under weak entropy regularization, we augment the Sinkhorn-type algorithm with dynamic regularization scheduling and second-order acceleration. Overall, this work systematically combines recent theoretical and numerical advances in entropic optimal transport with the constrained case, allowing practitioners to derive approximate transport plans in complex scenarios.

OCJan 20, 2024
Accelerating Sinkhorn Algorithm with Sparse Newton Iterations

Xun Tang, Michael Shavlovsky, Holakou Rahmanian et al.

Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximated solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by introducing early stopping for the matrix scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transportation between empirical distributions and calculating the Wasserstein $W_1, W_2$ distance of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.

SIAug 24, 2013
CrowdGrader: Crowdsourcing the Evaluation of Homework Assignments

Luca de Alfaro, Michael Shavlovsky

Crowdsourcing offers a practical method for ranking and scoring large amounts of items. To investigate the algorithms and incentives that can be used in crowdsourcing quality evaluations, we built CrowdGrader, a tool that lets students submit and collaboratively grade solutions to homework assignments. We present the algorithms and techniques used in CrowdGrader, and we describe our results and experience in using the tool for several computer-science assignments. CrowdGrader combines the student-provided grades into a consensus grade for each submission using a novel crowdsourcing algorithm that relies on a reputation system. The algorithm iterativerly refines inter-dependent estimates of the consensus grades, and of the grading accuracy of each student. On synthetic data, the algorithm performs better than alternatives not based on reputation. On our preliminary experimental data, the performance seems dependent on the nature of review errors, with errors that can be ascribed to the reviewer being more tractable than those arising from random external events. To provide an incentive for reviewers, the grade each student receives in an assignment is a combination of the consensus grade received by their submissions, and of a reviewing grade capturing their reviewing effort and accuracy. This incentive worked well in practice.