Rui Portocarrero Sarmento

IR
6 papers
69 citations
Novelty: 21%
AI Score: 17

6 Papers

IR | May 30, 2022
Contextualization for the Organization of Text Documents Streams

Rui Portocarrero Sarmento, Douglas O. Cardoso, João Gama et al.

The research community has made a significant effort to provide methods for organizing documentation with the help of Information Retrieval methods. In this report, we present several experiments with stream analysis methods for exploring streams of text documents. We use only dynamic algorithms to explore, analyze, and organize the flow of text documents. This document presents a case study of architectures developed for Text Document Stream Organization, using incremental algorithms such as Incremental TextRank and IS-TFIDF. Both algorithms rest on the assumption that mapping text documents and their document-term matrix onto lower-dimensional evolving networks provides faster processing than batch algorithms. With this architecture, and using FastText embeddings to measure similarity between documents, we compare methods on large text datasets with ground-truth evaluation of clustering capacity. The datasets used were Reuters and COVID-19 emotions. The results provide a new view of how similarity can be contextualized when organizing a flow of documents, based on the similarity between documents in the flow and on the algorithms mentioned above.

AP | May 8, 2019
Confirmatory Factor Analysis -- A Case study

Rui Portocarrero Sarmento, Vera Costa

Confirmatory Factor Analysis (CFA) is a particular form of factor analysis, most commonly used in social research. In confirmatory factor analysis, the researcher first develops a hypothesis about which factors they believe underlie the measures used and may impose constraints on the model based on these a priori hypotheses. For example, if two factors account for the covariance in the measures, and these factors are unrelated to one another, we can create a model where the correlation between factor X and factor Y is set to zero. Fit measures can then be obtained to assess how well the fitted model captures the covariance between all the items or measures in the model. If statistical tests of model fit indicate a poor fit, the model is rejected. A poor fit may have a variety of causes. We introduce state-of-the-art techniques for performing CFA in the R language. We then work through several examples of CFA with R on a number of datasets, revealing several scenarios where CFA is relevant.
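The zero-correlation constraint in the example above can be written out under the standard CFA measurement model. This is a sketch in conventional notation, not notation taken from the paper: the symbols for the loading matrix, the factor covariance matrix, and the residual covariance matrix are assumptions introduced here for illustration.

```latex
% Measurement model: observed measures x, latent factors \xi = (X, Y)^T,
% loading matrix \Lambda, residuals \delta.
x = \Lambda \xi + \delta

% Model-implied covariance of the measures, with
% \Phi = \mathrm{Cov}(\xi) and \Theta = \mathrm{Cov}(\delta):
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta

% Constraint "factor X and factor Y are unrelated": the off-diagonal
% entry of \Phi is fixed to zero instead of being estimated.
\Phi =
\begin{pmatrix}
\phi_{XX} & 0 \\
0 & \phi_{YY}
\end{pmatrix}
```

Model fit is then judged by how closely the implied covariance matrix reproduces the sample covariance matrix of the measures.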

SI | Apr 7, 2019
Density-based Community Detection/Optimization

Rui Portocarrero Sarmento

The use of modularity-based algorithms for community detection has been increasing in recent years. Modularity and its application have generated controversy, since some authors argue that the metric has disadvantages. It has been shown that algorithms that use modularity to detect communities suffer from a resolution limit and are therefore unable to identify small communities in some situations. In this work, we apply a density optimization to communities found by the label propagation algorithm and study how the modularity of the optimized results changes. We introduce a metric we call ADC (Average Density per Community) and use it to show that our optimization improves the community density obtained with benchmark algorithms. Additionally, we provide evidence that this optimization might not significantly alter the modularity of the resulting communities. Furthermore, using the concept of Strongly Connected Components (SCC), we developed a community detection algorithm that we also compare with the label propagation algorithm. These comparisons were carried out on several test networks of different sizes. The results of the optimization algorithm proved to be interesting, and the results of the community detection algorithm turned out to be similar to those of the benchmark algorithm we used.
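The abstract names the ADC metric but does not give its formula. The following is a minimal Python sketch of one plausible reading, assuming an undirected simple graph, a per-community density of 2m / (n(n - 1)) for a community with n nodes and m internal edges, and a density of 0 for single-node communities. The function names and these conventions are assumptions for illustration, not the paper's definition.

```python
from collections import defaultdict


def community_density(nodes, edge_set):
    """Internal edge density of one community (assumed: undirected, simple graph)."""
    n = len(nodes)
    if n < 2:
        # Assumption: a single-node community has density 0.
        return 0.0
    # Count edges whose two endpoints both lie inside the community.
    internal = sum(1 for edge in edge_set if edge <= nodes)
    return 2.0 * internal / (n * (n - 1))


def average_density_per_community(edges, membership):
    """Mean of the per-community densities (one hypothetical reading of ADC).

    edges: iterable of (u, v) pairs.
    membership: dict mapping each node to its community label.
    """
    # Deduplicate edges and drop self-loops.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    communities = defaultdict(set)
    for node, label in membership.items():
        communities[label].add(node)
    densities = [community_density(nodes, edge_set) for nodes in communities.values()]
    if not densities:
        return 0.0
    return sum(densities) / len(densities)


# Toy graph: a triangle (community "a") and a three-node path fragment (community "b").
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
toy_membership = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
toy_adc = average_density_per_community(toy_edges, toy_membership)
```

In the toy graph, community "a" is a full triangle (density 1) and community "b" has one internal edge among three nodes (density 1/3), so the average is 2/3. The edge between nodes 3 and 4 crosses communities and does not count toward either density.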

IR | Apr 6, 2019
Idealize - A Notion of Idea Strength

Rui Portocarrero Sarmento

Business entrepreneurs frequently look for ways to test business ideas without giving away too much information. Recent techniques in startup development promote the use of surveys to measure potential clients' interest. In this preliminary report, we describe the concept behind Idealize, a Shiny R application that measures the local trend strength of a potential idea. Additionally, the system might provide a relative distance to the capital city of the country. The tests were made for the United States of America, i.e., made available for the English language. This report shows some of the test results obtained with this system.

HC | Dec 4, 2018
A System for Efficient Communication between Patients and Pharmacies

Rui Portocarrero Sarmento, André Tarrinho, Pedro Câmara et al.

When studying human-technology interaction systems, researchers strive to achieve intuitiveness and make people's lives easier through a thoughtful and in-depth study of the components of the application system that supports a particular business's communication with customers. In the healthcare field in particular, requirements such as clarity, transparency, efficiency, and speed in transmitting information to patients and/or healthcare professionals might mean an important increase in the well-being of the patient and the productivity of the healthcare professional. In this work, the authors study the difficulties patients frequently have when communicating with pharmacists. In addition to a statistical study of a survey conducted with more than two hundred frequent pharmacy customers, we propose an IT solution for better communication between patients and pharmacists.

IR | Nov 29, 2018
Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs

Rui Portocarrero Sarmento, Pavel Brazdil

In this report, we experiment with several concepts regarding text stream analysis. We tested an implementation of Incremental Sparse TF-IDF (IS-TFIDF) and Incremental Cosine Similarity (ICS) using bipartite graphs. We use bipartite graphs, in which one type of node represents documents and the other represents words, to determine which documents are affected when a word arrives in the stream (the neighbors of the word in the graph). With this information, we leverage optimized algorithms used for graph-based applications. The concept is similar to, for example, the use of hash tables or other computer science concepts for fast access to information in memory.
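The bipartite-graph idea can be illustrated with a small Python sketch. It is not the authors' IS-TFIDF or ICS implementation: the class name, method names, and the particular TF-IDF weighting (relative term frequency times the log of inverse document frequency) are assumptions chosen for illustration. The graph is stored as two dictionaries, one per node type, so looking up the documents adjacent to an arriving word is a hash-table access. Note that the total document count also changes on every arrival; this sketch handles that by computing weights lazily at query time.

```python
import math
from collections import Counter, defaultdict


class BipartiteTfidfIndex:
    """Hypothetical sketch: documents and words as the two sides of a bipartite graph."""

    def __init__(self):
        self.doc_terms = {}                # document node -> Counter of its words
        self.term_docs = defaultdict(set)  # word node -> set of adjacent documents

    def add_document(self, doc_id, tokens):
        """Insert a document and return the documents affected by its words."""
        counts = Counter(tokens)
        self.doc_terms[doc_id] = counts
        affected = {doc_id}
        for term in counts:
            # Neighbors of the word in the graph: their document frequency changes.
            affected |= self.term_docs[term]
            self.term_docs[term].add(doc_id)
        return affected

    def tfidf(self, doc_id):
        """Sparse TF-IDF vector of one document, computed from the current graph."""
        counts = self.doc_terms[doc_id]
        total = sum(counts.values())
        n_docs = len(self.doc_terms)
        return {
            term: (count / total) * math.log(n_docs / len(self.term_docs[term]))
            for term, count in counts.items()
        }

    def cosine(self, doc_a, doc_b):
        """Cosine similarity of two sparse vectors (recomputed, not updated in place)."""
        va, vb = self.tfidf(doc_a), self.tfidf(doc_b)
        dot = sum(weight * vb[term] for term, weight in va.items() if term in vb)
        norm_a = math.sqrt(sum(w * w for w in va.values()))
        norm_b = math.sqrt(sum(w * w for w in vb.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)


index = BipartiteTfidfIndex()
first = index.add_document("d1", ["graph", "stream"])
second = index.add_document("d2", ["graph", "text"])
third = index.add_document("d3", ["text"])
```

When "d3" arrives, only "d2" shares a word with it, so the affected set is "d2" and "d3" and any stored similarities involving "d1" alone need not be revisited for that word. A fully incremental version would update stored weights and similarities for the affected set instead of recomputing them on demand.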