Vedant Dharnidharka

5papers

1,034citations

Novelty31%

AI Score35

Ranked #124,619 of 205,806 authors (top 61%)#22,063 in CL (top 68%)

5 Papers

CLMay 21, 2022

Named Entity Linking with Entity Representation by Multiple Embeddings

Oleg Vasilyev, Alex Dauenhauer, Vedant Dharnidharka et al.

We propose a simple and practical method for named entity linking (NEL), based on entity representation by multiple embeddings. To explore this method, and to review its dependency on parameters, we measure its performance on Namesakes, a highly challenging dataset of ambiguously named entities. Our observations suggest that the minimal number of mentions required to create a knowledge base (KB) entity is very important for NEL performance. The number of embeddings is less important and can be kept small, within as few as 10 or less. We show that our representations of KB entities can be adjusted using only KB data, and the adjustment can improve NEL performance. We also compare NEL performance of embeddings obtained from tuning language model on diverse news texts as opposed to tuning on more uniform texts from public datasets XSum, CNN / Daily Mail. We found that tuning on diverse news provides better embeddings.

LGNov 25, 2025

Cisco Time Series Model Technical Report

Liang Gou, Archit Khare, Praneet Pabolu et al.

We introduce the Cisco Time Series Model, a univariate zero-shot forecaster. This time series foundation model is the result of a general architectural innovation to a time series model enabling it to accept multiresolution input, applied to a popular decoder-only time series model (TimesFM). The resulting multiresolution decoder-only model is trained on over 300B unique data points, with more than half coming from the observability domain. Quantitative and qualitative evaluations demonstrate that the resulting model achieves superior performance on observability datasets while retaining very similar performance on a standard general-purpose forecasting benchmark (GIFT-Eval), and suggest that the multiresolution structure enables the model to make more accurate predictions on long context input.

CLNov 22, 2021

Namesakes: Ambiguously Named Entities from Wikipedia and News

Oleg Vasilyev, Aysu Altun, Nidhi Vyas et al.

We present Namesakes, a dataset of ambiguously named entities obtained from English-language Wikipedia and news articles. It consists of 58862 mentions of 4148 unique entities and their namesakes: 1000 mentions from news, 28843 from Wikipedia articles about the entity, and 29019 Wikipedia backlink mentions. Namesakes should be helpful in establishing challenging benchmarks for the task of named entity linking (NEL).

CLOct 13, 2020

Sensitivity of BLANC to human-scored qualities of text summaries

Oleg Vasilyev, Vedant Dharnidharka, Nicholas Egan et al.

We explore the sensitivity of a document summary quality estimator, BLANC, to human assessment of qualities for the same summaries. In our human evaluations, we distinguish five summary qualities, defined by how fluent, understandable, informative, compact, and factually correct the summary is. We make the case for optimal BLANC parameters, at which the BLANC sensitivity to almost all of summary qualities is about as good as the sensitivity of a human annotator.

CLFeb 23, 2020

Fill in the BLANC: Human-free quality estimation of document summaries

Oleg Vasilyev, Vedant Dharnidharka, John Bohannon

We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the document's text. We present evidence that BLANC scores have as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.