Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research
For financial professionals and AI developers, this benchmark provides a standardized evaluation tool and shows that current deep research agents are inadequate for professional financial analysis.
Deep FinResearch Bench evaluates AI agents on the quality of their financial investment reports across three dimensions: qualitative rigor, quantitative accuracy, and claim verifiability. AI-generated reports underperform those written by professional analysts, highlighting the need for domain-specialized agents.
We introduce Deep FinResearch Bench, a practical and comprehensive evaluation framework for deep research (DR) agents in financial investment research. The benchmark assesses three dimensions of report quality: qualitative rigor, quantitative forecasting and valuation accuracy, and claim credibility and verifiability. Specifically, we define qualitative and quantitative evaluation metrics for each dimension and implement an automated scoring procedure to enable scalable assessment. Applying the benchmark to financial reports from frontier DR agents and comparing them with reports authored by financial professionals, we find that AI-generated reports still fall short across these dimensions. These findings underscore the need for domain-specialized DR agents tailored to finance, and we hope the work establishes a foundation for standardized benchmarking of DR agents in financial research.
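
To make the three-dimension comparison concrete, the following is a minimal sketch of how per-dimension scores for an AI-generated report and a professional report might be compared. It is not the benchmark's actual scoring procedure: the `ReportScores` class, the `dimension_gaps` helper, the 0-to-1 scale, the unweighted mean, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# The three dimensions named in the abstract; the 0-1 scale is an assumption.
DIMENSIONS = ("qualitative_rigor", "quantitative_accuracy", "claim_verifiability")


@dataclass(frozen=True)
class ReportScores:
    """Hypothetical container for one report's per-dimension scores."""
    author: str
    qualitative_rigor: float
    quantitative_accuracy: float
    claim_verifiability: float

    def overall(self) -> float:
        # Unweighted mean: an illustrative choice, not the paper's aggregation.
        return mean(getattr(self, d) for d in DIMENSIONS)


def dimension_gaps(ai: ReportScores, human: ReportScores) -> dict[str, float]:
    """Per-dimension gap (human minus AI); positive means the AI report trails."""
    return {d: getattr(human, d) - getattr(ai, d) for d in DIMENSIONS}


# Placeholder numbers for illustration only; not results from the benchmark.
ai_report = ReportScores("dr_agent", 0.5, 0.25, 0.5)
analyst_report = ReportScores("analyst", 0.75, 0.5, 1.0)
gaps = dimension_gaps(ai_report, analyst_report)
```

Keeping the gaps per dimension, instead of collapsing everything into one number, mirrors the abstract's framing that reports are judged separately on rigor, accuracy, and verifiability.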