AI | Oct 15, 2025

A Methodology for Assessing the Risk of Metric Failure in LLMs Within the Financial Domain

William Flanagan, Mukunda Das, Rajitha Ramanayake, Swanuja Maslekar, Meghana Mangipudi, Joong Ho Choi, Shruti Nair, Shambhavi Bhusan, Sanjana Dulam, Mouni Pendharkar, Nidhi Singh, Vashisth Doshi

arXiv:2510.13524v25.81 citations | h-index: 7

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of metric failure for financial institutions adopting generative AI, but its contribution is incremental because it builds on existing evaluation methods.

The paper tackles the problem of measuring model performance for generative AI in financial services, where traditional metrics and benchmarks often fail. It proposes a Risk Assessment Framework to improve how subject matter expert (SME) and machine learning metrics are applied.

As Generative Artificial Intelligence (GenAI) is adopted across the financial services industry, a significant barrier to adoption and usage is measuring model performance. Historical machine learning metrics often fail to generalize to GenAI workloads and are frequently supplemented with Subject Matter Expert (SME) evaluation. Even with this combination, many projects fail to account for the unique risks involved in choosing specific metrics. Additionally, many widespread benchmarks created by foundational research labs and educational institutions fail to generalize to industrial use. This paper explains these challenges and provides a Risk Assessment Framework to allow for better application of SME and machine learning metrics.
