30.1CRApr 30Code
Characterizing and Modeling the GitHub Security Advisories Review PipelineClaudio Segal, Paulo Segal, Carlos Eduardo Banjar et al.
GitHub Security Advisories (GHSA) have become a central component of open-source vulnerability disclosure and are widely used by developers and security tools. A distinctive feature of GHSA is that only a fraction of advisories are reviewed by GitHub, while the mechanisms associated with this review process remain poorly understood. In this paper, we conduct a large-scale empirical study of the GHSA review processes, analyzing over 288,000 advisories spanning 2019-2025. We characterize which advisories are more likely to be reviewed, quantify review delays, and identify two distinct review-latency regimes: a fast path dominated by GitHub Repository Advisories (GRAs) and a slow path dominated by NVD-first advisories. We further develop a queueing model that accounts for this dichotomy based on the structure of the advisory processing pipeline.
CRAug 3, 2023
Cream Skimming the Underground: Identifying Relevant Information Points from Online ForumsFelipe Moreno-Vera, Mateus Nogueira, Cainã Figueiredo et al.
This paper proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild by monitoring underground hacking forums. The increasing volume of posts discussing exploitation in the wild calls for an automatic approach to process threads and posts that will eventually trigger alarms depending on their content. To illustrate the proposed system, we use the CrimeBB dataset, which contains data scraped from multiple underground forums, and develop a supervised machine learning model that can filter threads citing CVEs and label them as Proof-of-Concept, Weaponization, or Exploitation. Leveraging random forests, we indicate that accuracy, precision and recall above 0.99 are attainable for the classification task. Additionally, we provide insights into the difference in nature between weaponization and exploitation, e.g., interpreting the output of a decision tree, and analyze the profits and other aspects related to the hacking communities. Overall, our work sheds insight into the exploitation of vulnerabilities in the wild and can be used to provide additional ground truth to models such as EPSS and Expected Exploitability.
AIMar 9
Visualizing Coalition Formation: From Hedonic Games to Image SegmentationPedro Henrique de Paula França, Lucas Lopes Felipe, Daniel Sadoc Menasché
We propose image segmentation as a visual diagnostic testbed for coalition formation in hedonic games. Modeling pixels as agents on a graph, we study how a granularization parameter shapes equilibrium fragmentation and boundary structure. On the Weizmann single-object benchmark, we relate multi-coalition equilibria to binary protocols by measuring whether the converged coalitions overlap with a foreground ground-truth. We observe transitions from cohesive to fragmented yet recoverable equilibria, and finally to intrinsic failure under excessive fragmentation. Our core contribution links multi-agent systems with image segmentation by quantifying the impact of mechanism design parameters on equilibrium structures.
SYJun 20, 2024
Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto ScalingS. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson et al.
Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Processes (MDP), solvable via a linear program (LP). However, as the number of control variables of such LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.