Kyller Gorgônio

3 Papers

1.4 LG Feb 12

On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy

Luiz Pereira, Mirko Perkusich, Dalton Valadares et al.

Federated Neuromorphic Learning (FNL) enables energy-efficient and privacy-preserving learning on devices without centralizing data. However, real-world deployments require additional privacy mechanisms that can significantly alter training signals. This paper analyzes how Differential Privacy (DP) mechanisms, specifically gradient clipping and noise injection, perturb firing-rate statistics in Spiking Neural Networks (SNNs) and how these perturbations are propagated to rate-based FNL coordination. On a speech recognition task under non-IID settings, ablations across privacy budgets and clipping bounds reveal systematic rate shifts, attenuated aggregation, and ranking instability during client selection. Moreover, we relate these shifts to sparsity and memory indicators. Our findings provide actionable guidance for privacy-preserving FNL, specifically regarding the balance between privacy strength and rate-dependent coordination.
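The two DP mechanisms named in the abstract, gradient clipping and noise injection, can be illustrated with a minimal sketch. This is a generic clip-then-add-Gaussian-noise step in the style of DP-SGD, not the authors' implementation; the function names, the `noise_multiplier` parameter, and the toy gradient are assumptions made for illustration only.

```python
import math
import random


def clip_by_l2_norm(grad, clip_bound):
    """Scale grad so its L2 norm does not exceed clip_bound (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the bound are left unchanged.
    scale = min(1.0, clip_bound / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, clip_bound, noise_multiplier, rng):
    """Clip the gradient, then add Gaussian noise with std = noise_multiplier * clip_bound."""
    clipped = clip_by_l2_norm(grad, clip_bound)
    sigma = noise_multiplier * clip_bound
    return [g + rng.gauss(0.0, sigma) for g in clipped]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
grad = [3.0, 4.0]             # toy gradient with L2 norm 5
clipped = clip_by_l2_norm(grad, clip_bound=1.0)
noisy = privatize(grad, clip_bound=1.0, noise_multiplier=0.5, rng=rng)
```

A tighter clipping bound shrinks large updates and a larger noise multiplier adds more variance, which is one plausible way the training signal, and hence the firing-rate statistics the abstract discusses, could shift under stronger privacy settings.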

5.0 SE Jun 29

Prompting GPT-5 on Scrum Certification Questions: An Empirical Accuracy Study

Mirko Perkusich, Danyllo Albuquerque, João Paiva et al.

Large Language Models (LLMs) are increasingly used in Agile Software Development for documentation, coaching, and training. As practitioners adopt these tools to prepare for certifications such as Professional Scrum Master (PSM), a key question is whether LLMs can reliably reason about Scrum, a framework with normative, well-defined rules described in the Scrum Guide (2020). This paper examines how different prompt techniques affect the factual accuracy of LLM responses to Scrum certification-style questions. A dataset of 993 validated PSM-aligned questions was answered by GPT-5 using three techniques: zero-shot, chain-of-thought, and with-source citation. All prompts achieved certification-level accuracy above 85%, with the citation-based variant performing best (89.1%) and yielding the lowest error rate. Correct answers concentrated in well-defined topics, such as Definition of Done, Events, and Product Backlog Management, and in single-answer multiple-choice items, while multi-select questions and more interpretive areas, such as Scrum Team and Product Value, were less stable. Among questions where at least one prompt failed (16.2%), errors clustered into misalignment with the Scrum Guide (28%), content outside its scope (34%), and outdated or biased interpretations (38%). Overall, prompt techniques produced modest but consistent improvements, particularly in reducing misinterpretation and version drift, supporting more reliable use of LLMs in Agile learning and certification preparation.
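To put the reported percentages in absolute terms, the short calculation below converts them into approximate question counts. The counts are reconstructions from the rounded percentages in the abstract, not figures reported by the authors, and the dictionary keys are shorthand labels chosen for this sketch.

```python
total_questions = 993      # dataset size stated in the abstract
failed_share = 0.162       # share of questions where at least one prompt failed

# Error-category shares among the failed questions, as given in the abstract.
category_shares = {
    "misaligned_with_scrum_guide": 0.28,
    "outside_guide_scope": 0.34,
    "outdated_or_biased": 0.38,
}

# Approximate number of questions with at least one failing prompt.
failed_questions = round(total_questions * failed_share)

# Approximate per-category counts; rounding means these are estimates only.
category_counts = {
    name: round(failed_questions * share)
    for name, share in category_shares.items()
}

print(failed_questions)
print(category_counts)
```

Under these assumptions, roughly 161 of the 993 questions had at least one failing prompt, split into about 45, 55, and 61 questions across the three error categories.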

5.4 SE Jun 29

Comparing Large Language Models on Scrum Certification-Style Questions: Accuracy, Stability, and Error Patterns

Robson Alves Vilar, Emanuel Dantas Filho, Ademar França de Sousa Neto et al.

Large Language Models (LLMs) are increasingly used in exam- and certification-style question answering tasks, where their ability to retrieve, interpret, and apply domain-specific knowledge can be systematically assessed. In Software Engineering, such settings are particularly relevant when questions depend on strict adherence to normative definitions, roles, artifacts, and rules. This paper evaluates the performance of three contemporary LLMs, GPT-5 mini, Gemini 3 Flash, and DeepSeek Chat 3.2, in answering 993 Scrum certification-style questions aligned with the Professional Scrum Master I (PSM I) assessment format. We evaluated the models under three prompting strategies (zero-shot, chain-of-thought, and source-grounded), with repeated executions to assess intra-model stability. We also analyzed performance across Scrum topics and question formats, complemented by a qualitative analysis of recurring error patterns in incorrect answers. Results revealed clear differences among models, with Gemini 3 Flash achieving the highest accuracy, followed by GPT-5 mini and DeepSeek Chat 3.2, while intra-model variability remained low across all conditions. By question format, the models achieved the highest accuracy on single-answer multiple-choice items, whereas multi-select and True/False questions were more error-prone. By topic, performance was more consistent in normatively explicit areas such as Artifacts, Empiricism, and Product Value, but more fragile in Scrum Values, Self-Managing Teams, and Stakeholders & Customers. The qualitative analysis showed that errors were systematic rather than random, involving overgeneralization, restrictive wording, compound distractors, and conflicts between common market interpretations and strict Scrum definitions.