Xinbo Liu

CLAug 29, 2025

Evaluating Large Language Models for Financial Reasoning: A CFA-Based Benchmark Study

Xuan Yao, Qianteng Wang, Xinbo Liu et al.

The rapid advancement of large language models presents significant opportunities for financial applications, yet systematic evaluation in specialized financial contexts remains limited. This study presents the first comprehensive evaluation of state-of-the-art LLMs using 1,560 multiple-choice questions from official mock exams across Levels I-III of CFA, most rigorous professional certifications globally that mirror real-world financial analysis complexity. We compare models distinguished by core design priorities: multi-modal and computationally powerful, reasoning-specialized and highly accurate, and lightweight efficiency-optimized. We assess models under zero-shot prompting and through a novel Retrieval-Augmented Generation pipeline that integrates official CFA curriculum content. The RAG system achieves precise domain-specific knowledge retrieval through hierarchical knowledge organization and structured query generation, significantly enhancing reasoning accuracy in professional financial certification evaluation. Results reveal that reasoning-oriented models consistently outperform others in zero-shot settings, while the RAG pipeline provides substantial improvements particularly for complex scenarios. Comprehensive error analysis identifies knowledge gaps as the primary failure mode, with minimal impact from text readability. These findings provide actionable insights for LLM deployment in finance, offering practitioners evidence-based guidance for model selection and cost-performance optimization.

CRAug 5, 2018

ATMPA: Attacking Machine Learning-based Malware Visualization Detection Methods via Adversarial Examples

Xinbo Liu, Jiliang Zhang, Yaping Lin et al.

Since the threat of malicious software (malware) has become increasingly serious, automatic malware detection techniques have received increasing attention, where machine learning (ML)-based visualization detection methods become more and more popular. In this paper, we demonstrate that the state-of-the-art ML-based visualization detection methods are vulnerable to Adversarial Example (AE) attacks. We develop a novel Adversarial Texture Malware Perturbation Attack (ATMPA) method based on the gradient descent and L-norm optimization method, where attackers can introduce some tiny perturbations on the transformed dataset such that ML-based malware detection methods will completely fail. The experimental results on the MS BIG malware dataset show that a small interference can reduce the accuracy rate down to 0% for several ML-based detection methods, and the rate of transferability is 74.1% on average.

Xinbo Liu

2 Papers