Mario A. Nascimento

CL
h-index: 32
Papers: 6
Citations: 72
Novelty: 43%
AI Score: 47

6 Papers

LG | May 22
Assessing Predictive Models for Fairness Based on Movement Patterns

Francesco Lettich, Mario A. Nascimento, Chiara Pugliese et al.

Assessing the spatial fairness of predictive models involves establishing whether they are statistically penalizing (favoring) individuals associated with certain geographical locations. Literature on this topic makes the fundamental assumption that each individual is assigned to a single geographical location (e.g., place of residence). However, fairness with respect to the set of locations where one has been, i.e., their movement patterns over different regions, also matters when fairness is considered. Consequently, we argue that it is necessary to generalize the notion of spatial fairness to also include movement patterns, leading to the novel problem of assessing predictive models for fairness relative to the movements of individuals. To deal with this problem, we propose an approach that first associates the movements of individuals to certain geographic regions, considering multiple spatial partitions with different resolutions and alignments, and then employs a suitable spatial scan statistic to assess whether a predictive model is fair based on movement patterns. In the experimental evaluation, we study the performance of our approach over thousands of synthetic unfair datasets, showing that it is effective at detecting this new type of unfairness and at retrieving the set of objects treated unfairly, while localization performance exhibits a consistent multi-resolution trade-off.
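The scan step described above can be illustrated with a minimal sketch. It assumes a Bernoulli log-likelihood ratio as the scan statistic and a single partition whose regions are plain cell labels; the abstract does not name the statistic or the partitioning scheme, so both are assumptions, as are the names `bernoulli_llr` and `most_anomalous_cell` and the toy data.

```python
import math

def _xlogy(x, y):
    # Convention 0 * log(0) = 0, so empty counts do not raise.
    return 0.0 if x == 0 else x * math.log(y)

def _loglik(pos, total):
    # Maximized Bernoulli log-likelihood for `pos` positives out of `total`.
    if total == 0:
        return 0.0
    p = pos / total
    return _xlogy(pos, p) + _xlogy(total - pos, 1 - p)

def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio: separate rates inside/outside vs. one global rate.

    c of n individuals inside the region are positive; C of N overall.
    """
    return _loglik(c, n) + _loglik(C - c, N - n) - _loglik(C, N)

def most_anomalous_cell(individuals):
    """individuals: list of (visited_cells, outcome) pairs (assumed schema)."""
    N = len(individuals)
    C = sum(outcome for _, outcome in individuals)
    cells = set().union(*(visited for visited, _ in individuals))
    scores = {}
    for cell in cells:
        inside = [outcome for visited, outcome in individuals if cell in visited]
        scores[cell] = bernoulli_llr(sum(inside), len(inside), C, N)
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy data: everyone who passed through cell "a" received a positive outcome.
individuals = [
    ({"a", "b"}, 1), ({"a"}, 1), ({"b", "c"}, 0),
    ({"c"}, 0), ({"c"}, 0), ({"a", "c"}, 1),
]
best_cell, best_score = most_anomalous_cell(individuals)
```

Note that an individual counts toward every cell they visited, which is what distinguishes this setting from the single-location case. In practice the significance of the top score would still need to be assessed, for example by recomputing it over randomly permuted outcomes.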

SE | Oct 28, 2013 | Code
Mining the Temporal Evolution of the Android Bug Reporting Community via Sliding Windows

Feng Jiang, Jiemin Wang, Abram Hindle et al.

The open source development community consists of both paid and volunteer developers as well as new and experienced users. Previous work has applied social network analysis (SNA) to open source communities and has demonstrated value in expertise discovery and triaging. One problem with applying SNA directly to the data of the entire project lifetime is that the impact of local activities will be drowned out. In this paper we provide a method for aggregating, analyzing, and visualizing local (small time period) interactions of bug reporting participants by using SNA to measure the betweenness centrality of these participants. In particular, we mined the Android bug repository by producing social networks from overlapping 30-day windows of bug reports, each sliding over by one day. We define three patterns of participant behaviour based on their local centrality. We propose a method of analyzing the centrality of bug report participants both locally and globally, then we conduct a thorough case study of bug reporters' activity within the Android bug repository. Furthermore, we validate the conclusions of our method by mining the Android version control system and inspecting the Android release history. We found that windowed SNA elicited local behaviours that were invisible during global analysis.
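The windowing step can be sketched as follows. This is only an illustration, assuming interactions arrive as `(day, participant_a, participant_b)` tuples and that each window advances by one day; the schema, the function name `sliding_windows`, and the sample events are hypothetical, and the per-window centrality computation (e.g. betweenness via a graph library) is left out.

```python
from datetime import date, timedelta

def sliding_windows(events, window_days=30, step_days=1):
    """Yield (window_start, edges) for overlapping windows over dated interactions.

    events: list of (day, participant_a, participant_b) tuples (assumed schema).
    Each window covers [window_start, window_start + window_days).
    """
    if not events:
        return
    first = min(day for day, _, _ in events)
    last = max(day for day, _, _ in events)
    start = first
    while start <= last:
        end = start + timedelta(days=window_days)  # exclusive upper bound
        edges = {
            frozenset((a, b))
            for day, a, b in events
            if start <= day < end and a != b
        }
        yield start, edges
        start += timedelta(days=step_days)

# Hypothetical interactions between bug report participants.
events = [
    (date(2012, 1, 1), "alice", "bob"),
    (date(2012, 1, 20), "bob", "carol"),
    (date(2012, 3, 1), "alice", "carol"),
]
windows = list(sliding_windows(events))
```

Each yielded edge set is one local social network. Comparing a participant's centrality across consecutive windows, rather than in one graph built from all events, is what keeps short-lived activity from being drowned out.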

CL | Feb 4
Exploiting contextual information to improve stance detection in informal political discourse with LLMs

Arman Engin Sucu, Yixiang Zhou, Mario A. Nascimento et al.

This study investigates the use of Large Language Models (LLMs) for political stance detection in informal online discourse, where language is often sarcastic, ambiguous, and context-dependent. We explore whether providing contextual information, specifically user profile summaries derived from historical posts, can improve classification accuracy. Using a real-world political forum dataset, we generate structured profiles that summarize users' ideological leaning, recurring topics, and linguistic patterns. We evaluate seven state-of-the-art LLMs across baseline and context-enriched setups through a comprehensive cross-model evaluation. Our findings show that contextual prompts significantly boost accuracy, with improvements ranging from +17.5% to +38.5%, achieving up to 74% accuracy that surpasses previous approaches. We also analyze how profile size and post selection strategies affect performance, showing that strategically chosen political content yields better results than larger, randomly selected contexts. These findings underscore the value of incorporating user-level context to enhance LLM performance in nuanced political classification tasks.

CL | Nov 21, 2024
An Experimental Study on Data Augmentation Techniques for Named Entity Recognition on Low-Resource Domains

Arthur Elwing Torres, Edleno Silva de Moura, Altigran Soares da Silva et al.

Named Entity Recognition (NER) is a machine learning task that traditionally relies on supervised learning and annotated data. Acquiring such data is often a challenge, particularly in specialized fields such as the medical, legal, and financial sectors. These are commonly referred to as low-resource domains, which comprise long-tail entities, due to the scarcity of available data. To address this, data augmentation techniques are increasingly being employed to generate additional training instances from the original dataset. In this study, we evaluate the effectiveness of two prominent text augmentation techniques, Mention Replacement and Contextual Word Replacement, on two widely used NER models, Bi-LSTM+CRF and BERT. We conduct experiments on four datasets from low-resource domains, and we explore the impact of various combinations of training subset sizes and number of augmented examples. We not only confirm that data augmentation is particularly beneficial for smaller datasets, but we also demonstrate that there is no universally optimal number of augmented examples, i.e., NER practitioners must experiment with different quantities in order to fine-tune their projects.
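As an illustration of Mention Replacement, the sketch below swaps each entity mention in a BIO-tagged sentence for another mention of the same type drawn from a pool. The BIO tagging scheme, the function name, the mention pool, and the choice to replace every mention (rather than with some probability) are assumptions made for the example, not details taken from the study.

```python
import random

def mention_replacement(tokens, tags, mentions_by_type, rng):
    """Replace every entity mention with a random mention of the same type.

    tokens/tags: parallel lists using BIO tags (assumed tagging scheme).
    mentions_by_type: entity type -> list of token lists, e.g. harvested
    from the training set (hypothetical pool).
    """
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            # Find the end of the current mention.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            new_mention = rng.choice(mentions_by_type[etype])
            out_tokens.extend(new_mention)
            # Re-tag the replacement so labels stay aligned with tokens.
            out_tags.extend(["B-" + etype] + ["I-" + etype] * (len(new_mention) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags

tokens = ["Patient", "took", "aspirin", "daily"]
tags = ["O", "O", "B-DRUG", "O"]
pool = {"DRUG": [["acetylsalicylic", "acid"]]}
new_tokens, new_tags = mention_replacement(tokens, tags, pool, random.Random(0))
```

Calling the function several times per sentence with different random draws gives a simple way to vary the number of augmented examples, which is the quantity the study finds must be tuned per project.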

DB | Sep 25, 2020
Towards A Personal Shopper's Dilemma: Time vs Cost

Samiul Anwar, Francesco Lettich, Mario A. Nascimento

Consider a customer who needs to fulfill a shopping list, and also a personal shopper who is willing to buy and resell to customers the goods in their shopping lists. It is in the personal shopper's best interest to find (shopping) routes that (i) minimize the time serving a customer, in order to be able to serve more customers, and (ii) minimize the price paid for the goods, in order to maximize his/her potential profit when reselling them. Those are typically competing criteria leading to what we refer to as the Personal Shopper's Dilemma query, i.e., to determine where to buy each of the required goods while attempting to optimize both criteria at the same time. Given the query's NP-hardness we propose a heuristic approach to determine a subset of the sub-optimal routes under any linear combination of the aforementioned criteria, i.e., the query's approximate linear skyline set. In order to measure the effectiveness of our approach we also introduce two new metrics, optimality and coverage gaps w.r.t. an optimal, but computationally expensive, baseline solution. Our experiments, using realistic city-scale datasets, show that our proposed approach is two orders of magnitude faster than the baseline and yields low values for the optimality and coverage gaps.
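To make the notion of a linear skyline concrete, the sketch below takes a handful of already-enumerated candidate routes and keeps those that are best under at least one weighting of time against cost. It is a brute-force weight sweep for illustration only, not the heuristic proposed in the paper, and the route names and values are invented.

```python
def linear_skyline_by_sweep(routes, steps=100):
    """Return routes that minimize w*time + (1-w)*cost for some sampled w in [0, 1].

    routes: list of (name, time, cost) tuples (hypothetical candidates).
    A finite sweep approximates the linear skyline; it can miss routes that
    win only on a very narrow range of weights.
    """
    winners = []
    for k in range(steps + 1):
        w = k / steps
        best = min(routes, key=lambda r: w * r[1] + (1 - w) * r[2])
        if best not in winners:
            winners.append(best)
    return winners

routes = [
    ("A", 10, 50),  # fastest, most expensive
    ("B", 20, 30),
    ("C", 40, 10),  # slowest, cheapest
    ("D", 30, 35),  # worse than B on both criteria
    ("E", 26, 26),  # not dominated, but never best under any linear weighting
]
skyline_names = sorted(name for name, _, _ in linear_skyline_by_sweep(routes))
```

Route E shows why the linear skyline is a subset of the conventional skyline: no other route beats it on both criteria, yet some other route always scores better for every weighting. The hard part of the actual query is producing good candidate routes in the first place, which this sketch takes as given.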

CV | Mar 31, 2020
UniformAugment: A Search-free Probabilistic Data Augmentation Approach

Tom Ching LingChen, Ava Khonsari, Amirreza Lashkari et al.

Augmenting training datasets has been shown to improve the learning effectiveness for several computer vision tasks. A good augmentation produces an augmented dataset that adds variability while retaining the statistical properties of the original dataset. Some techniques, such as AutoAugment and Fast AutoAugment, have introduced a search phase to find a set of suitable augmentation policies for a given model and dataset. This comes at the cost of great computational overhead, adding up to several thousand GPU hours. More recently, RandAugment was proposed to substantially speed up the search phase by approximating the search space with a couple of hyperparameters, but it still incurs a non-negligible cost for tuning those. In this paper we show that, under the assumption that the augmentation space is approximately distribution invariant, a uniform sampling over the continuous space of augmentation transformations is sufficient to train highly effective models. Based on that result we propose UniformAugment, an automated data augmentation approach that completely avoids a search phase. In addition to discussing the theoretical underpinning supporting our approach, we also use the standard datasets, as well as established models for image classification, to show that UniformAugment's effectiveness is comparable to the aforementioned methods, while still being highly efficient by virtue of not requiring any search.
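The idea of search-free uniform sampling can be sketched as below. The sketch assumes that, per input, a fixed number of operations is drawn at random and that each operation's application probability and magnitude are drawn uniformly from [0, 1]; these specifics, the toy operations on a list of pixel intensities, and all names are assumptions for illustration and are not taken from the abstract.

```python
import random

def brighten(x, m):
    # Shift intensities up by at most 0.5, clipped to [0, 1].
    return [min(1.0, v + 0.5 * m) for v in x]

def contrast(x, m):
    # Stretch intensities around the midpoint, clipped to [0, 1].
    return [min(1.0, max(0.0, 0.5 + (v - 0.5) * (1 + m))) for v in x]

def flip(x, m):
    # Magnitude is ignored for operations without a strength parameter.
    return list(reversed(x))

TOY_OPS = [brighten, contrast, flip]

def uniform_augment(x, ops, rng, num_ops=2):
    """Apply randomly chosen ops with uniformly sampled probability and magnitude.

    Nothing here is tuned or searched: every quantity is drawn uniformly.
    """
    for op in rng.sample(ops, num_ops):
        prob = rng.random()       # chance of applying this op
        magnitude = rng.random()  # strength of this op
        if rng.random() < prob:
            x = op(x, magnitude)
    return x

augmented = uniform_augment([0.1, 0.5, 0.9], TOY_OPS, random.Random(0))
```

Because there is no policy to learn, the augmentation can be applied on the fly to each training sample; the only fixed choice in this sketch is `num_ops`.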