ML LG Oct 28, 2022

Conservative Likelihood Ratio Estimator for Infrequent Data Slightly above a Frequency Threshold

Masato Kikuchi, Yuhi Kusakabe, Tadachika Ozono

arXiv:2211.00545v1 2.1 h-index: 9

Originality: Incremental advance

AI Analysis

This incremental improvement addresses overestimation issues in likelihood ratio estimation for infrequent data, benefiting tasks like named entity context prediction in natural language processing.

The study addresses the tendency of naive likelihood ratio estimation to overestimate ratios for infrequent data. It proposes a conservative estimator for frequencies slightly above a frequency threshold. In a named entity context prediction experiment, the estimator improved prediction accuracy while maintaining efficiency.

A naive likelihood ratio (LR) estimation using the observed frequencies of events can overestimate LRs for infrequent data. One approach to avoid this problem is to use a frequency threshold and set the estimates to zero for frequencies below the threshold. This approach eliminates the computation of some estimates, thereby making practical tasks using LRs more efficient. However, it still overestimates LRs for low frequencies near the threshold. This study proposes a conservative estimator for low frequencies, slightly above the threshold. Our experiment used LRs to predict the occurrence contexts of named entities from a corpus. The experimental results demonstrate that our estimator improves the prediction accuracy while maintaining efficiency in the context prediction task.
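The baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the LR of an event is the ratio of its relative frequencies in two count tables, and that the threshold is applied to the numerator frequency. The names `naive_lr`, `thresholded_lr`, `num_counts`, and `den_counts` and the toy counts are hypothetical. The proposed conservative estimator is not specified in the abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def naive_lr(x, num_counts, den_counts):
    """Naive LR estimate: ratio of the observed relative frequencies of x.

    Assumption: the LR is p_num(x) / p_den(x), each estimated as count / total.
    """
    f_num = num_counts.get(x, 0)
    f_den = den_counts.get(x, 0)
    if f_num == 0:
        return 0.0
    if f_den == 0:
        return math.inf  # unseen in the denominator sample
    n_num = sum(num_counts.values())
    n_den = sum(den_counts.values())
    return (f_num / n_num) / (f_den / n_den)


def thresholded_lr(x, num_counts, den_counts, threshold):
    """Set the estimate to zero when the numerator frequency is below the threshold.

    Assumption: the threshold is applied to the numerator count. Skipping the
    division for rare events is what saves computation.
    """
    if num_counts.get(x, 0) < threshold:
        return 0.0
    return naive_lr(x, num_counts, den_counts)


# Toy counts (made up for illustration only).
num_counts = Counter({"a": 50, "b": 2, "c": 1})
den_counts = Counter({"a": 500, "b": 4, "c": 1, "d": 495})

for event in ("a", "b", "c"):
    print(event,
          naive_lr(event, num_counts, den_counts),
          thresholded_lr(event, num_counts, den_counts, threshold=2))
```

In this toy data, the event seen once ("c") receives a much larger naive estimate than the frequent event ("a"), and the threshold zeroes it out. The event seen twice ("b") sits exactly at the threshold and keeps its large naive estimate, which is the near-threshold case the abstract says the proposed estimator targets.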
