Riya Ranjan

91.8 · CL · May 24

JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment

Russell Yang, Ruishi Chen, Pierce Kelaita et al.

Two methodologies dominate current benchmarking practice: rubric-based scoring evaluates items against predefined criteria, whereas comparative judgment elicits pairwise preferences between outputs. Although both methodologies are widely used, the choice between them is rarely justified. We release JudgmentBench, a benchmark of 30 real-world legal tasks, paired with 1,539 rubric scores and 1,530 pairwise preference judgments collected from practicing attorneys with substantial experience, including attorneys at major U.S. law firms. The annotations constitute the first publicly available dataset in a high-expertise domain in which both supervision signals are elicited from the same experts on the same items. Using LLM-generated outputs at three constructed quality levels, we provide an initial empirical comparison: comparative judgments recover the intended quality ordering substantially better than rubrics (mean Spearman's rank correlation of 0.908 vs. 0.150, estimated difference = 0.758 [0.494, 1.021]) while requiring less than half the annotation time. The patterns hold for both human annotators and LLM autograders. Beyond this initial comparison, the paired structure of the dataset supports a broader research agenda on how expert judgment should be elicited, aggregated, and used as supervision in domains without verifiable ground truth.

LG · Oct 23, 2021

A Layer-wise Adversarial-aware Quantization Optimization for Improving Robustness

Chang Song, Riya Ranjan, Hai Li

Neural networks achieve better accuracy at the price of higher energy and computational cost. Quantization can greatly reduce this cost, and quantized models are more hardware-friendly with acceptable accuracy loss. On the other hand, recent research has found that neural networks are vulnerable to adversarial attacks, and that the robustness of a neural network model can only be improved with defense methods such as adversarial training. In this work, we find that adversarially trained neural networks are more vulnerable to quantization loss than plain models. To minimize both the adversarial and the quantization losses simultaneously and to make the quantized model robust, we propose a layer-wise adversarial-aware quantization method that uses the Lipschitz constant to choose the best quantization parameter settings for a neural network. We theoretically derive the losses and prove the consistency of our metric selection. Experimental results show that our method can effectively and efficiently improve the robustness of quantized adversarially trained neural networks.

2 Papers