Piotr Przymus

SE
3papers
2citations
Novelty35%
AI Score43

3 Papers

SEJan 29Code
Linux Kernel Recency Matters, CVE Severity Doesn't, and History Fades

Piotr Przymus, Witold Weiner, Krzysztof Rykaczewski et al.

In 2024, the Linux kernel became its own Common Vulnerabilities and Exposures (CVE) Numbering Authority (CNA), formalizing how kernel vulnerabilities are identified and tracked. We analyze the anatomy and dynamics of kernel CVEs using metadata, associated commits, and patch latency to understand what drives patching. Results show that severity and Common Vulnerability Scoring System (CVSS) metrics have a negligible association with patch latency, whereas kernel recency is a reasonable predictor in survival models. Kernel developers fix newer kernels sooner, while older ones retain unresolved CVEs. Commits introducing vulnerabilities are typically broader and more complex than their fixes, though often only approximate reconstructions of development history. The Linux kernel remains a unique open-source project -- its CVE process is no exception.

SEJan 26
Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair

Piotr Przymus, Andreas Happe, Jürgen Cito

Large Language Model (LLM) - based Automated Program Repair (APR) systems are increasingly integrated into modern software development workflows, offering automated patches in response to natural language bug reports. However, this reliance on untrusted user input introduces a novel and underexplored attack surface. In this paper, we investigate the security risks posed by adversarial bug reports -- realistic-looking issue submissions crafted to mislead APR systems into producing insecure or harmful code changes. We develop a comprehensive threat model and conduct an empirical study to evaluate the vulnerability of APR systems to such attacks. Our demonstration comprises 51 adversarial bug reports generated across a spectrum of strategies, ranging from manual curation to fully automated pipelines. We test these against a leading LLM-based APR system and assess both pre-repair defenses (e.g., LlamaGuard variants, PromptGuard variants, Granite-Guardian, and custom LLM filters) and post-repair detectors (GitHub Copilot, CodeQL). Our findings show that current defenses are insufficient: 90% of crafted bug reports triggered attacker-aligned patches. The best pre-repair filter blocked only 47%, while post-repair analysis -- often requiring human oversight -- was effective in just 58% of cases. To support scalable security testing, we introduce a prototype framework for automating the generation of adversarial bug reports. Our analysis exposes a structural asymmetry: generating adversarial inputs is inexpensive, while detecting or mitigating them remains costly and error-prone. We conclude with recommendations for improving the robustness of APR systems against adversarial misuse and highlight directions for future work on secure APR.

SEMay 4
These Aren't the Reviews You're Looking For How Humans Review AI-Generated Pull Requests

Kacper Duma, Patryk Wróblewski, Jagoda Bobińska et al.

We analyze code review interactions for AI-generated pull requests (PRs) on GitHub using the AIDev dataset and compare them to human-authored PRs within the same repositories. We find that most AI-generated PRs receive no review and, when reviewed, are largely dominated by AI agents rather than humans. Human-authored PRs are more likely to receive human-only review and to attract direct human feedback. In contrast, reviews of AI-generated PRs more often take the form of automation-mediated interaction, with human involvement frequently expressed through agent steering rather than standalone evaluation. These results indicate systematic differences in how review activity is structured in agentic workflows and raise challenges for interpreting review metrics as indicators of human oversight in large-scale mining studies.