Vitaly Shalumov

CL
h-index: 6
5 papers
42 citations
Novelty: 22%
AI Score: 19

5 Papers

CL Apr 18, 2023
HeRo: RoBERTa and Longformer Hebrew Language Models

Vitaly Shalumov, Harel Haskey

In this paper, we fill an existing gap in the resources available to the Hebrew NLP community by providing the largest pre-training dataset to date, HeDC4; a state-of-the-art pre-trained language model, HeRo, for standard-length inputs; and an efficient transformer, LongHeRo, for long input sequences. The HeRo model was evaluated on sentiment analysis, named entity recognition, and question answering tasks, while the LongHeRo model was evaluated on a document classification task with a dataset composed of long documents. Both HeRo and LongHeRo achieved state-of-the-art performance. The dataset and model checkpoints used in this work are publicly available.

CL Jan 2, 2023
Transformer Based Geocoding

Yuval Solaz, Vitaly Shalumov

In this paper, we formulate the problem of predicting a geolocation from free text as a sequence-to-sequence problem. Using this formulation, we obtain a geocoding model by training a T5 encoder-decoder transformer model with free text as the input and a geolocation as the output. The geocoding model was trained on geo-tagged wikidump data with adaptive cell partitioning for the geolocation representation. All of the code, including a REST-based application, as well as the dataset and model checkpoints used in this work, is publicly available.

CL Mar 12, 2024
Mevaker: Conclusion Extraction and Allocation Resources for the Hebrew Language

Vitaly Shalumov, Harel Haskey, Yuval Solaz

In this paper, we introduce the MevakerSumm summarization dataset and the MevakerConc conclusion extraction dataset for the Hebrew language, both based on the reports of the State Comptroller and Ombudsman of Israel, along with two auxiliary datasets. We accompany these datasets with models for conclusion extraction (HeConE, HeConEspc) and conclusion allocation (HeCross). All of the code, datasets, and model checkpoints used in this work are publicly available.

CL May 31, 2023
Measuring the Robustness of NLP Models to Domain Shifts

Nitay Calderon, Naveh Porat, Eyal Ben-David et al.

Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent capabilities such as in-context learning. Furthermore, the common practice of measuring DR might not be fully accurate. Current research focuses on challenge sets and relies solely on the Source Drop (SD), which uses the source in-domain performance as a reference point for degradation. However, we argue that the Target Drop (TD), which measures degradation from the target in-domain performance, should be used as a complementary point of view. To address these issues, we first curated a DR benchmark comprising 7 diverse NLP tasks, which enabled us to measure both the SD and the TD. We then conducted a comprehensive large-scale DR study involving over 14,000 domain shifts across 21 fine-tuned models and few-shot LLMs. We found that both model types suffer from drops upon domain shifts. While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness. In addition, we found that a large SD can often be explained by a shift to a harder domain rather than by a genuine DR challenge, which highlights the importance of TD as a complementary metric. We hope our study will shed light on the current DR state of NLP models and promote improved evaluation practices toward more robust models.
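
The two metrics described in this abstract can be illustrated with a minimal sketch. It assumes both drops are plain differences between an in-domain score and the cross-domain score; the paper's exact definitions (for example, any normalization) may differ. The function names, domain labels, and scores below are hypothetical and chosen only for illustration.

```python
# Minimal sketch of Source Drop (SD) and Target Drop (TD).
# Assumption: both are simple differences of scores; the paper may
# define or normalize them differently.

def source_drop(source_in_domain: float, cross_domain: float) -> float:
    """Degradation measured against the source in-domain score."""
    return source_in_domain - cross_domain


def target_drop(target_in_domain: float, cross_domain: float) -> float:
    """Degradation measured against the target in-domain score."""
    return target_in_domain - cross_domain


# Hypothetical scores (in percent), keyed by (train domain, test domain).
scores = {
    ("A", "A"): 90.0,  # trained on A, tested on A (source in-domain)
    ("B", "B"): 70.0,  # trained on B, tested on B (target in-domain)
    ("A", "B"): 65.0,  # trained on A, tested on B (cross-domain)
}

sd = source_drop(scores[("A", "A")], scores[("A", "B")])
td = target_drop(scores[("B", "B")], scores[("A", "B")])

print(f"SD = {sd:.1f}, TD = {td:.1f}")
```

In this made-up case the SD is large while the TD is small, which matches the situation the abstract describes: most of the apparent degradation comes from domain B being harder, not from a lack of robustness.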

LG Jun 13, 2018
Deep Learning based Estimation of Weaving Target Maneuvers

Vitaly Shalumov, Itzik Klein

In target tracking, the estimation of an unknown weaving target frequency is crucial for improving the miss distance. The estimation process is commonly carried out in a Kalman framework. The objective of this paper is to examine the potential of using neural networks in target tracking applications. To that end, we propose estimating the weaving frequency using deep neural networks instead of classical Kalman-framework-based estimation. In particular, we focus on the case where a set of possible constant target frequencies is known. Several neural network architectures requiring low computational resources were designed to estimate the unknown frequency from the known set of frequencies. The performance of the proposed approach is compared with that of the multiple model adaptive estimation algorithm. Simulation results show that, in the examined scenarios, the deep neural networks outperform multiple model adaptive estimation in terms of accuracy and the number of measurements required for convergence.