60.2MLMay 28
Improved Distribution Estimation in $\ell_\infty$Doron Cohen, Aryeh Kontorovich, Yonatan Livshitz
We present improved bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These include minimax bounds in expectation and high-probability tail bounds. We resolve some of the open questions posed in Kontorovich and Painsky (JMLR, 2025) -- including a fully empirical version of the tightest risk bound they presented and identifying the form of the worst-case extremal distribution. Encouraging empirical results are reported as well.
CLMar 2, 2023
QAID: Question Answering Inspired Few-shot Intent DetectionAsaf Yehudai, Matan Vetzler, Yosi Mass et al.
Intent detection with semantically similar fine-grained intents is a challenging task. To address it, we reformulate intent detection as a question-answering retrieval task by treating utterances and intent names as questions and answers. To that end, we utilize a question-answering retrieval architecture and adopt a two stages training schema with batch contrastive loss. In the pre-training stage, we improve query representations through self-supervised training. Then, in the fine-tuning stage, we increase contextualized token-level similarity scores between queries and answers from the same intent. Our results on three few-shot intent detection benchmarks achieve state-of-the-art performance.
CLDec 14, 2021
Conversational Search with Mixed-Initiative -- Asking Good Clarification Questions backed-up by Passage RetrievalYosi Mass, Doron Cohen, Asaf Yehudai et al.
We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup. User-asks (queries) and system-answers, as well as system-asks (clarification questions) and user response, in order to clarify her information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our method leverages passage retrieval from a background content to fine-tune two deep-learning models for ranking candidate clarification questions. We evaluated our method on two different use-cases. The first is an open domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases.
CLOct 5, 2020
Conversational Document Prediction to Assist Customer Care AgentsJatin Ganhotra, Haggai Roitman, Doron Cohen et al.
A frequent pattern in customer care conversations is the agents responding with appropriate webpage URLs that address users' needs. We study the task of predicting the documents that customer care agents can use to facilitate users' needs. We also introduce a new public dataset which supports the aforementioned problem. Using this dataset and two others, we investigate state-of-the art deep learning (DL) and information retrieval (IR) models for the task. Additionally, we analyze the practicality of such systems in terms of inference time complexity. Our show that an hybrid IR+DL approach provides the best of both worlds.
CLFeb 10, 2020
A Study of Human Summaries of Scientific ArticlesOdellia Boni, Guy Feigenblat, Doron Cohen et al.
Researchers and students face an explosion of newly published papers which may be relevant to their work. This led to a trend of sharing human summaries of scientific papers. We analyze the summaries shared in one of these platforms Shortscience.org. The goal is to characterize human summaries of scientific papers, and use some of the insights obtained to improve and adapt existing automatic summarization systems to the domain of scientific papers.
CLAug 29, 2019
A Summarization System for Scientific DocumentsShai Erera, Michal Shmueli-Scheuer, Guy Feigenblat et al.
We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.
CLNov 1, 2018
Unsupervised Dual-Cascade Learning with Pseudo-Feedback Distillation for Query-based Extractive SummarizationHaggai Roitman, Guy Feigenblat, David Konopnicki et al.
We propose Dual-CES -- a novel unsupervised, query-focused, multi-document extractive summarizer. Dual-CES is designed to better handle the tradeoff between saliency and focus in summarization. To this end, Dual-CES employs a two-step dual-cascade optimization approach with saliency-based pseudo-feedback distillation. Overall, Dual-CES significantly outperforms all other state-of-the-art unsupervised alternatives. Dual-CES is even shown to be able to outperform strong supervised summarizers.
IRJun 7, 2017
An Extended Relevance Model for Session SearchNir Levine, Haggai Roitman, Doron Cohen
The session search task aims at best serving the user's information need given her previous search behavior during the session. We propose an extended relevance model that captures the user's dynamic information need in the session. Our relevance modelling approach is directly driven by the user's query reformulation (change) decisions and the estimate of how much the user's search behavior affects such decisions. Overall, we demonstrate that, the proposed approach significantly boosts session search performance.