CRPL Oct 30, 2021

Trojan Source: Invisible Vulnerabilities

arXiv:2111.00169v2, 25 citations, Has Code
Originality: Highly original
AI Analysis

Trojan Source attacks pose an immediate threat to software security across the industry, in both first-party code and supply-chain dependencies, because they allow code manipulation that human reviewers cannot see.

The authors introduce 'Trojan Source' attacks, which exploit Unicode encoding so that source code appears different to compilers than to human reviewers, creating vulnerabilities that are invisible in review. They demonstrate working examples in multiple programming languages and propose compiler-level defenses.

We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers. 'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and of supply-chain compromise across the industry. We present working examples of Trojan Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack. We document an industry-wide coordinated disclosure for these vulnerabilities; as they affect most compilers, editors, and repositories, the exercise teaches how different firms, open-source communities, and other stakeholders respond to vulnerability disclosure.
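One of the mitigating controls mentioned above, a check that can run in a repository hook or build pipeline, can be sketched as a scan for Unicode bidirectional control characters, which are what allow the logical token order to differ from the displayed order. This is a minimal illustration, not the authors' tooling: the function name `find_bidi_controls` and the sample string are hypothetical, and the character set covers only the explicit embedding, override, and isolate controls.

```python
from __future__ import annotations

# Explicit bidirectional control characters defined by Unicode.
# They are written as escapes so this file contains none of them itself.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for each bidi control character, 1-indexed."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            name = BIDI_CONTROLS.get(ch)
            if name is not None:
                findings.append((line_no, col_no, name))
    return findings


if __name__ == "__main__":
    # Hypothetical sample: a comment containing override and isolate controls.
    sample = 'level = "user"\n# check \u202e\u2066 admin \u2069\u2066 only'
    for line_no, col_no, name in find_bidi_controls(sample):
        print(f"line {line_no}, col {col_no}: {name}")
```

A pipeline could fail the build whenever the returned list is non-empty, or allow such characters only in files that legitimately contain right-to-left text. The scan does not address other encoding tricks such as visually similar characters.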

Code Implementations: 3 repos
