Akhil Vyas

CR
h-index10
3papers
75citations
Novelty38%
AI Score28

3 Papers

CRMay 21, 2025
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Andy K. Zhang, Joey Ji, Celeste Menders et al.

AI agents have the potential to significantly alter the cybersecurity landscape. Here, we introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. Instantiating this framework with BountyBench, we set up 25 systems with complex, real-world codebases. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability). For Detect, we construct a new success indicator, which is general across vulnerability types and provides localized evaluation. We manually set up the environment for each system, including installing packages, setting up server(s), and hydrating database(s). We add 40 bug bounties, which are vulnerabilities with monetary awards of \$10-\$30,485, covering 9 of the OWASP Top 10 Risks. To modulate task difficulty, we devise a new strategy based on information to guide detection, interpolating from identifying a zero day to exploiting a specific vulnerability. We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1. Given up to three attempts, the top-performing agents are OpenAI Codex CLI: o3-high (12.5% on Detect, mapping to \$3,720; 90% on Patch, mapping to \$14,152), Custom Agent with Claude 3.7 Sonnet Thinking (67.5% on Exploit), and OpenAI Codex CLI: o4-mini (90% on Patch, mapping to \$14,422). OpenAI Codex CLI: o3-high, OpenAI Codex CLI: o4-mini, and Claude Code are more capable at defense, achieving higher Patch scores of 90%, 90%, and 87.5%, compared to Exploit scores of 47.5%, 32.5%, and 57.5% respectively; while the custom agents are relatively balanced between offense and defense, achieving Exploit scores of 37.5-67.5% and Patch scores of 35-60%.

CVDec 5, 2024
Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer

Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado

Transforming mathematical expressions into LaTeX poses a significant challenge. In this paper, we examine the application of advanced transformer-based architectures to address the task of converting handwritten or digital mathematical expression images into corresponding LaTeX code. As a baseline, we utilize the current state-of-the-art CNN encoder and LSTM decoder. Additionally, we explore enhancements to the CNN-RNN architecture by replacing the CNN encoder with the pretrained ResNet50 model with modification to suite the grey scale input. Further, we experiment with vision transformer model and compare with Baseline and CNN-LSTM model. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential for further improvement through fine-tuning of model parameters. To encourage open research, we also provide the model implementation, enabling reproduction of our results and facilitating further research in this domain.

HCNov 29, 2018
Security, Privacy and Safety Risk Assessment for Virtual Reality Learning Environment Applications

Aniket Gulhane, Akhil Vyas, Reshmi Mitra et al.

Social Virtual Reality based Learning Environments (VRLEs) such as vSocial render instructional content in a three-dimensional immersive computer experience for training youth with learning impediments. There are limited prior works that explored attack vulnerability in VR technology, and hence there is a need for systematic frameworks to quantify risks corresponding to security, privacy, and safety (SPS) threats. The SPS threats can adversely impact the educational user experience and hinder delivery of VRLE content. In this paper, we propose a novel risk assessment framework that utilizes attack trees to calculate a risk score for varied VRLE threats with rate and duration of threats as inputs. We compare the impact of a well-constructed attack tree with an adhoc attack tree to study the trade-offs between overheads in managing attack trees, and the cost of risk mitigation when vulnerabilities are identified. We use a vSocial VRLE testbed in a case study to showcase the effectiveness of our framework and demonstrate how a suitable attack tree formalism can result in a more safer, privacy-preserving and secure VRLE system.