Mariano Maisonnave

2 Papers

IR Jul 13, 2020
Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé et al.

This article analyses and evaluates FDDβ, a supervised term-weighting scheme that can be applied for query-term selection in topic-based retrieval. FDDβ weights terms based on two factors representing the descriptive and discriminating power of the terms with respect to the given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between both. The article makes the following contributions: (1) it presents an extensive analysis of the behavior of FDDβ as a function of its adjustable parameter; (2) it compares FDDβ against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it introduces a new public data set with news labeled as relevant or irrelevant to the economic domain. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set. The authors conclude that, despite its simplicity, FDDβ is competitive with state-of-the-art methods and has the important advantage of offering flexibility when adapting to specific task goals. The results also demonstrate that FDDβ offers a useful mechanism to explore different approaches to building complex queries.
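To make the idea of combining two factors through an adjustable parameter concrete, here is a minimal sketch. The abstract does not give the formulas, so everything below is an assumption for illustration: descriptive power is taken as the fraction of on-topic documents that contain the term, discriminating power as the fraction of documents containing the term that are on topic, and the two are combined in the style of an F-beta score. The function name `fdd_beta` and the toy corpus are hypothetical and may differ from the definitions used in the article.

```python
from typing import List, Set


def fdd_beta(docs: List[Set[str]], labels: List[bool], term: str,
             beta: float = 1.0) -> float:
    """Illustrative F-beta-style term weight (assumed form, not the paper's code).

    docs   -- each document as a set of terms
    labels -- True if the document belongs to the topic
    beta   -- values above 1 favor descriptive power, below 1 discriminating power
    """
    in_topic = sum(1 for y in labels if y)
    with_term = sum(1 for d in docs if term in d)
    both = sum(1 for d, y in zip(docs, labels) if y and term in d)
    if in_topic == 0 or with_term == 0 or both == 0:
        return 0.0
    descr = both / in_topic    # assumed descriptive power (recall-like)
    discr = both / with_term   # assumed discriminating power (precision-like)
    b2 = beta * beta
    return (1 + b2) * descr * discr / (b2 * discr + descr)


# Toy corpus: the first two documents are on topic.
docs = [{"bank", "rate"}, {"bank", "loan"}, {"bank", "game"}, {"game", "team"}]
labels = [True, True, False, False]

for b in (0.5, 1.0, 2.0):
    print(b, fdd_beta(docs, labels, "bank", b), fdd_beta(docs, labels, "rate", b))
```

In this toy corpus, "bank" appears in every on-topic document but also in an off-topic one, while "rate" appears only on topic but in just one document. Under the assumed formula, a small beta ranks "rate" higher and a large beta ranks "bank" higher, which is the kind of precision/recall trade-off the abstract describes.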

CL Jul 2, 2020
Detecting Ongoing Events Using Contextual Word and Sentence Embeddings

Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé et al.

This paper introduces the Ongoing Event Detection (OED) task, which is a specific Event Detection task where the goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current. Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system. The main contributions of this paper are the following: (1) it introduces the OED task along with a dataset manually labeled for the task; (2) it presents the design and implementation of an RNN model for the task that uses BERT embeddings to define contextual word and contextual sentence embeddings as attributes, which to the best of our knowledge were never used before for detecting ongoing events in news; (3) it presents an extensive empirical evaluation that includes (i) the exploration of different architectures and hyperparameters, (ii) an ablation test to study the impact of each attribute, and (iii) a comparison with a replication of a state-of-the-art model. The results offer several insights into the importance of contextual embeddings and indicate that the proposed approach is effective in the OED task, outperforming the baseline models.