AI · Dec 1, 2025

TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

arXiv:2512.02261v1 · 4 citations · h-index: 10 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical safety gap for financial institutions deploying autonomous trading agents in high-risk markets. Its contribution is incremental, since it focuses on evaluation rather than on new agent designs.

The paper tackles the problem of evaluating the reliability of LLM-based trading agents under adversarial conditions, showing that small perturbations can cause extreme portfolio concentration, runaway exposure, and large drawdowns in real US equity market backtests.

LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, despite operating in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution, and evaluates their robustness under controlled system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real US equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.
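The closed-loop idea in the abstract can be illustrated with a toy sketch. This is not TradeTrap's code or API: the rule-based `toy_agent`, the `inflate_b` perturbation, the four-step synthetic price series, and every name below are hypothetical stand-ins chosen for illustration, and the numbers are not results from the paper. The sketch assumes an attack that tampers only with the prices the agent observes (the market-intelligence component), while returns are realized on the true prices, and it compares concentration and drawdown with and without that tampering from identical initial conditions.

```python
from typing import Callable, Dict, List, Tuple

Prices = Dict[str, float]
Perturbation = Callable[[int, Prices], Prices]


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def concentration(weights: Dict[str, float]) -> float:
    """Herfindahl index of weights: 1/n when equal-weighted, 1.0 when all-in."""
    return sum(w * w for w in weights.values())


def toy_agent(prev_seen: Prices, seen: Prices) -> Dict[str, float]:
    """Hypothetical stand-in for a trading agent (not an LLM): weight each
    asset by its positive observed return, equal-weight if none is positive."""
    scores = {s: max(seen[s] / prev_seen[s] - 1.0, 0.0) for s in seen}
    total = sum(scores.values())
    if total == 0.0:
        return {s: 1.0 / len(seen) for s in seen}
    return {s: score / total for s, score in scores.items()}


def no_attack(t: int, prices: Prices) -> Prices:
    return prices


def inflate_b(t: int, prices: Prices) -> Prices:
    """Assumed perturbation: overstate asset B's observed price by 5% at t=2."""
    if t == 2:
        return {**prices, "B": prices["B"] * 1.05}
    return prices


def backtest(true_prices: List[Prices],
             perturb: Perturbation = no_attack) -> Tuple[List[float], float]:
    """Closed loop: the agent decides on perturbed observations, but the
    portfolio return is realized on the true prices."""
    equity, peak_conc = [1.0], 0.0
    prev_seen = perturb(0, true_prices[0])
    for t in range(1, len(true_prices) - 1):
        seen = perturb(t, true_prices[t])
        weights = toy_agent(prev_seen, seen)
        peak_conc = max(peak_conc, concentration(weights))
        growth = sum(w * true_prices[t + 1][s] / true_prices[t][s]
                     for s, w in weights.items())
        equity.append(equity[-1] * growth)
        prev_seen = seen
    return equity, peak_conc


# Synthetic prices (illustrative only): B falls sharply on the last step.
TRUE_PRICES = [
    {"A": 100.0, "B": 100.0},
    {"A": 101.0, "B": 101.0},
    {"A": 102.0, "B": 102.0},
    {"A": 103.0, "B": 80.0},
]

clean_equity, clean_conc = backtest(TRUE_PRICES)
attacked_equity, attacked_conc = backtest(TRUE_PRICES, inflate_b)

print("clean:    concentration", clean_conc,
      "drawdown", max_drawdown(clean_equity))
print("attacked: concentration", attacked_conc,
      "drawdown", max_drawdown(attacked_equity))
```

In this toy setting the single tampered observation pushes the agent to overweight B just before its fall, so both peak concentration and maximum drawdown come out higher in the attacked run than in the clean one. A real evaluation would replace `toy_agent` with the agent under test and apply analogous perturbations to the other components the abstract lists.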

