Ying Wai Li

LG
h-index31
4papers
6citations
Novelty51%
AI Score28

4 Papers

CHEM-PHMar 18, 2025
Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials

Sakib Matin, Emily Shinkle, Yulia Pimonova et al.

The quality of machine learning interatomic potentials (MLIPs) strongly depends on the quantity of training data as well as the quantum chemistry (QC) level of theory used. Datasets generated with high-fidelity QC methods are typically restricted to small molecules and may be missing energy gradients, which make it difficult to train accurate MLIPs. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained to energy-only datasets. First, multiple teacher models are trained to QC energies and then generate atomic forces for all configurations in the dataset. Next, the student MLIP is trained to both QC energies and to ensemble-averaged forces generated by the teacher models. We apply this workflow on the ANI-1ccx dataset where the configuration energies computed at the coupled cluster level of theory. The resulting student MLIPs achieve new state-of-the-art accuracy on the COMP6 benchmark and show improved stability for molecular dynamics simulations.

CLDec 13, 2024
Model-diff: A Tool for Comparative Study of Language Models in the Input Space

Weitang Liu, Yuelei Li, Ying Wai Li et al.

Comparing two (large) language models (LMs) side-by-side and pinpointing their prediction similarities and differences on the same set of inputs are crucial in many real-world scenarios, e.g., one can test if a licensed model was potentially plagiarized by another. Traditional analysis compares the LMs' outputs on some benchmark datasets, which only cover a limited number of inputs of designed perspectives for the intended applications. The benchmark datasets cannot prepare data to cover the test cases from unforeseen perspectives which can help us understand differences between models unbiasedly. In this paper, we propose a new model comparative analysis setting that considers a large input space where brute-force enumeration would be infeasible. The input space can be simply defined as all token sequences that a LM would produce low perplexity on -- we follow this definition in the paper as it would produce the most human-understandable inputs. We propose a novel framework \our that uses text generation by sampling and deweights the histogram of sampling statistics to estimate prediction differences between two LMs in this input space efficiently and unbiasedly. Our method achieves this by drawing and counting the inputs at each prediction difference value in negative log-likelihood. Experiments reveal for the first time the quantitative prediction differences between LMs in a large input space, potentially facilitating the model analysis for applications such as model plagiarism.

LGDec 6, 2023
Evaluation of human-model prediction difference on the Internet Scale of Data

Weitang Liu, Ying Wai Li, Yuelei Li et al.

Evaluating models on datasets often fails to capture their behavior when faced with unexpected and diverse types of inputs. It would be beneficial if we could evaluate the difference between human annotation and model prediction for an internet number of inputs, or more generally, for an input space that enumeration is computationally impractical. Traditional model evaluation methods rely on precision and recall (PR) as metrics, which are typically estimated by comparing human annotations with model predictions on a specific dataset. This is feasible because enumerating thousands of test inputs is manageable. However, estimating PR across a large input space is challenging because enumeration becomes computationally infeasible. We propose OmniInput, a novel approach to evaluate and compare NNs by the PR of an input space. OmniInput is distinctive from previous works as its estimated PR reflects the estimation of the differences between human annotation and model prediction in the input space which is usually too huge to be enumerated. We empirically validate our method within an enumerable input space, and our experiments demonstrate that OmniInput can effectively estimate and compare precision and recall for (large) language models within a broad input space that is not enumerable.

LGSep 7, 2019
A scalable constructive algorithm for the optimization of neural network architectures

Massimiliano Lupo Pasini, Junqi Yin, Ying Wai Li et al.

We propose a new scalable method to optimize the architecture of an artificial neural network. The proposed algorithm, called Greedy Search for Neural Network Architecture, aims to determine a neural network with minimal number of layers that is at least as performant as neural networks of the same structure identified by other hyperparameter search algorithms in terms of accuracy and computational cost. Numerical results performed on benchmark datasets show that, for these datasets, our method outperforms state-of-the-art hyperparameter optimization algorithms in terms of attainable predictive performance by the selected neural network architecture, and time-to-solution for the hyperparameter optimization to complete.