Nalin Arachchilage

AI
h-index30
4papers
121citations
Novelty23%
AI Score33

4 Papers

AIFeb 15, 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

Timothy R. McIntosh, Teo Susnjak, Nalin Arachchilage et al.

The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their own LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of benchmark functionality and integrity. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.

51.6SEApr 9
Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux et al.

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.

CRNov 1, 2024
Examining Attacks on Consensus and Incentive Systems in Proof-of-Work Blockchains: A Systematic Literature Review

Dinitha Wijewardhana, Sugandima Vidanagamachchi, Nalin Arachchilage

Cryptocurrencies have gained popularity due to their transparency, security, and accessibility compared to traditional financial systems, with Bitcoin, introduced in 2009, leading the market. Bitcoin's security relies on blockchain technology - a decentralized ledger consisting of a consensus and an incentive mechanism. The consensus mechanism, Proof of Work (PoW), requires miners to solve difficult cryptographic puzzles to add new blocks, while the incentive mechanism rewards them with newly minted bitcoins. However, as Bitcoin's acceptance grows, it faces increasing threats from attacks targeting these mechanisms, such as selfish mining, double-spending, and block withholding. These attacks compromise security, efficiency, and reward distribution. Recent research shows that these attacks can be combined with each other or with either malicious strategies, such as network-layer attacks, or non-malicious strategies, like honest mining. These combinations lead to more sophisticated attacks, increasing the attacker's success rates and profitability. Therefore, understanding and evaluating these attacks is essential for developing effective countermeasures and ensuring long-term security. This paper begins by examining individual attacks executed in isolation and their profitability. It then explores how combining these attacks with each other or with other malicious and non-malicious strategies can enhance their overall effectiveness and profitability. The analysis further explores how the deployment of attacks such as selfish mining and block withholding by multiple competing mining pools against each other impacts their economic returns. Lastly, a set of design guidelines is provided, outlining areas future work should focus on to prevent or mitigate the identified threats.

CRFeb 16
When Security Meets Usability: An Empirical Investigation of Post-Quantum Cryptography APIs

Marthin Toruan, R. D. N. Shakya, Samuel Tseitkin et al.

Advances in quantum computing increasingly threaten the security and privacy of data protected by current cryptosystems, particularly those relying on public-key cryptography. In response, the international cybersecurity community has prioritized the implementation of Post-Quantum Cryptography (PQC), a new cryptographic standard designed to resist quantum attacks while operating on classical computers. The National Institute of Standards and Technology (NIST) has already standardized several PQC algorithms and plans to deprecate classical asymmetric schemes, such as RSA and ECDSA, by 2035. Despite this urgency, PQC adoption remains slow, often due to limited developer expertise. Application Programming Interfaces (APIs) are intended to bridge this gap, yet prior research on classical security APIs demonstrates that poor usability of cryptographic APIs can lead developers to introduce vulnerabilities during implementation of the applications, a risk amplified by the novelty and complexity of PQC. To date, the usability of PQC APIs has not been systematically studied. This research presents an empirical evaluation of the usability of the PQC APIs, observing how developers interact with APIs and documentation during software development tasks. The study identifies cognitive factors that influence the developer's performance when working with PQC primitives with minimal onboarding. The findings highlight opportunities across the PQC ecosystem to improve developer-facing guidance, terminology alignment, and workflow examples to better support non-specialists.