4.4SEApr 2
YASA: Scalable Multi-Language Taint Analysis on the Unified AST at Ant GroupYayi Wang, Shenao Wang, Jian Zhao et al.
Modern enterprises increasingly adopt diverse technology stacks with various programming languages, posing significant challenges for static application security testing (SAST). Existing taint analysis tools are predominantly designed for single languages, requiring substantial engineering effort that scales with language diversity. While multi-language tools like CodeQL, Joern, and WALA attempt to address these challenges, they face limitations in intermediate representation design, analysis precision, and extensibility, which make them difficult to scale effectively for large-scale industrial applications at Ant Group. To bridge this gap, we present YASA (Yet Another Static Analyzer), a unified multi-language static taint analysis framework designed for industrial-scale deployment. Specifically, YASA introduces the Unified Abstract Syntax Tree (UAST) that provides a unified abstraction for compatibility across diverse programming languages. Building on the UAST, YASA performs point-to analysis and taint propagation, leveraging a unified semantic model to manage language-agnostic constructs, while incorporating language-specific semantic models to handle other unique language features. When compared to 6 single- and 2 multi-language static analyzers on an industry-standard benchmark, YASA consistently outperformed all baselines across Java, JavaScript, Python, and Go. In real-world deployment within Ant Group, YASA analyzed over 100 million lines of code across 7.3K internal applications. It identified 314 previously unknown taint paths, with 92 of them confirmed as 0-day vulnerabilities. All vulnerabilities were responsibly reported, with 76 already patched by internal development teams, demonstrating YASA's practical effectiveness for securing large-scale industrial software systems.
16.5CRMar 28
"Elementary, My Dear Watson." Detecting Malicious Skills via Neuro-Symbolic Reasoning across Heterogeneous ArtifactsShenao Wang, Junjie He, Yanjie Zhao et al.
Skills are increasingly used to extend LLM agents by packaging prompts, code, and configurations into reusable modules. As public registries and marketplaces expand, they form an emerging agentic supply chain, but also introduce a new attack surface for malicious skills. Detecting malicious skills is challenging because relevant evidence is often distributed across heterogeneous artifacts and must be reasoned in context. Existing static, LLM-based, and dynamic approaches each capture only part of this problem, making them insufficient for robust real-world detection. In this paper, we present MalSkills, a neuro-symbolic framework for malicious skills detection. MalSkills first extracts security-sensitive operations from heterogeneous artifacts through a combination of symbolic parsing and LLM-assisted semantic analysis. It then constructs the skill dependency graph that links artifacts, operations, operands, and value flows across the skill. On top of this graph, MalSkills performs neuro-symbolic reasoning to infer malicious patterns or previously unseen suspicious workflows. We evaluate MalSkills on a benchmark of 200 real-world skills against 5 state-of-the-art baselines. MalSkills achieves 93% F1, outperforming the baselines by 5~87 percentage points. We further apply MalSkills to analyze 150,108 skills collected from 7 public registries, revealing 620 malicious skills. As for now, we have finished reviewing 100 of them and identified 76 previously unknown malicious skills, all of which were responsibly reported and are currently awaiting confirmation from the platforms and maintainers. These results demonstrate the potential of MalSkills in securing the agentic supply chain.