Nadav Bar

CL
h-index 61
4 papers
82 citations
Novelty 55%
AI Score 35

4 Papers

CL · May 2, 2024
Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

Lotem Golany, Filippo Galgani, Maya Mamo et al.

Automating data generation with Large Language Models (LLMs) has become increasingly popular. In this work, we investigate the feasibility and effectiveness of LLM-based data generation in the challenging setting of source-grounded information-seeking dialogs, with response attribution, over long documents. Our source texts consist of long and noisy meeting transcripts, adding to the task complexity. Since automating attribution remains difficult, we propose a semi-automatic approach: dialog queries and responses are generated with LLMs, followed by human verification and identification of attribution spans. Using this approach, we created MISeD (Meeting Information Seeking Dialogs), a dataset of information-seeking dialogs focused on meeting transcripts. Models finetuned with MISeD demonstrate superior performance compared to off-the-shelf models, even those of larger size. Finetuning on MISeD gives comparable response generation quality to finetuning on fully manual data, while improving attribution quality and reducing time and effort.

SD · Dec 11, 2024
Zero-Shot Mono-to-Binaural Speech Synthesis

Alon Levkovitch, Julian Salazar, Soroosh Mariooryad et al. · meta-ai

We present ZeroBAS, a neural method to synthesize binaural audio from monaural audio recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on par with supervised methods on the standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset. Our results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.
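The geometric step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a static point source, a fixed speed of sound, hypothetical ear positions 9 cm either side of the head center, linear interpolation for fractional delays, and a 1/distance gain. The function names are made up for illustration, and the vocoder refinement stage is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def ear_delay_and_gain(source, ear):
    """Return (delay in seconds, amplitude gain) for one ear."""
    d = math.dist(source, ear)
    # Gain falls off as 1/distance; clamp to avoid division by zero.
    return d / SPEED_OF_SOUND, 1.0 / max(d, 1e-3)

def render_ear(mono, sample_rate, source, ear):
    """Delay and scale a mono signal as heard at one ear position."""
    delay, gain = ear_delay_and_gain(source, ear)
    shift = delay * sample_rate  # delay in (fractional) samples
    out = []
    for n in range(len(mono)):
        t = n - shift            # read position in the mono signal
        i = math.floor(t)
        frac = t - i
        a = mono[i] if 0 <= i < len(mono) else 0.0
        b = mono[i + 1] if 0 <= i + 1 < len(mono) else 0.0
        # Linear interpolation between neighboring samples.
        out.append(gain * ((1.0 - frac) * a + frac * b))
    return out

def mono_to_binaural(mono, sample_rate, source,
                     left_ear=(-0.09, 0.0, 0.0),
                     right_ear=(0.09, 0.0, 0.0)):
    """Return (left, right) channels for a static source position."""
    return (render_ear(mono, sample_rate, source, left_ear),
            render_ear(mono, sample_rate, source, right_ear))

# Usage: an impulse from a source 1 m to the listener's right.
impulse = [1.0] + [0.0] * 199
left, right = mono_to_binaural(impulse, 34300, (1.0, 0.0, 0.0))
```

With the source on the right, the right channel receives the impulse earlier and louder than the left. A moving source would need a time-varying delay, which this sketch leaves out.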

AS · Jun 4, 2024
SimulTron: On-Device Simultaneous Speech to Speech Translation

Alex Agranovich, Eliya Nachmani, Oleg Rybakov et al.

Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the strengths of the Translatotron framework while incorporating key modifications for streaming operation and an adjustable fixed delay. Our experiments show that SimulTron surpasses Translatotron 2 in offline evaluations. Furthermore, real-time evaluations reveal that SimulTron improves upon the performance achieved by Translatotron 1. Additionally, SimulTron achieves superior BLEU scores and latency compared to the previous real-time S2ST method on the MuST-C dataset. Significantly, we have successfully deployed SimulTron on a Pixel 7 Pro device, demonstrating its potential for simultaneous S2ST on-device.

CV · Dec 14, 2016
Border-Peeling Clustering

Hadar Averbuch-Elor, Nadav Bar, Daniel Cohen-Or

In this paper, we present a novel non-parametric clustering technique. Our technique is based on the notion that each latent cluster is comprised of layers that surround its core, where the external layers, or border points, implicitly separate the clusters. Unlike previous techniques, such as DBSCAN, where the cores of the clusters are defined directly by their densities, here the latent cores are revealed by a progressive peeling of the border points. Analyzing the density of the local neighborhoods allows identifying the border points and associating them with points of inner layers. We show that the peeling process adapts to the local densities and characteristics to successfully separate adjacent clusters (of possibly different densities). We extensively tested our technique on large sets of labeled data, including high-dimensional datasets of deep features that were trained by a convolutional neural network. We show that our technique is competitive with other state-of-the-art non-parametric methods using a fixed set of parameters throughout the experiments.
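The peel-then-associate idea in the abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm. It swaps in the mean distance to the k nearest remaining neighbors as the border score, peels a fixed fraction per iteration, and groups the surviving core points by a fixed linking radius. The function name and the parameters `k`, `iterations`, `peel_fraction`, and `link_radius` are all assumptions.

```python
import math

def border_peel(points, k=3, iterations=2, peel_fraction=0.2, link_radius=1.5):
    """Return one cluster label per point (simplified peeling sketch)."""
    active = list(range(len(points)))  # indices not yet peeled
    parent = {}                        # peeled index -> associated inner index

    def score(i, pool):
        # Assumed border score: mean distance to the k nearest points in pool.
        dists = sorted(math.dist(points[i], points[j]) for j in pool if j != i)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    for _ in range(iterations):
        n_peel = int(peel_fraction * len(active))
        if n_peel == 0:
            break
        scores = {i: score(i, active) for i in active}
        # Sparsest neighborhoods are treated as the current border layer.
        border = set(sorted(active, key=scores.get, reverse=True)[:n_peel])
        remaining = [i for i in active if i not in border]
        for b in border:
            # Associate each peeled point with its nearest inner point.
            parent[b] = min(remaining,
                            key=lambda j: math.dist(points[b], points[j]))
        active = remaining

    # Group core points into connected components under link_radius.
    labels = {}
    next_label = 0
    for seed in active:
        if seed in labels:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            u = stack.pop()
            for v in active:
                if v not in labels and math.dist(points[u], points[v]) <= link_radius:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    def resolve(i):
        # Follow association links inward until a labeled core point is hit.
        while i not in labels:
            i = parent[i]
        return labels[i]

    return [resolve(i) for i in range(len(points))]

# Usage: two 3x3 grids, ten units apart.
grid = [(x, y) for y in range(3) for x in range(3)]
data = grid + [(x + 10, y) for (x, y) in grid]
cluster_ids = border_peel(data)
```

Each peeled point inherits the label of the core point its chain of associations leads to, so border points never need to be clustered directly.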