Kaiwen He

AI
h-index: 16
6 papers
355 citations
Novelty: 50%
AI Score: 45

6 Papers

AI · Dec 9, 2025
Reasoning Models Ace the CFA Exams

Jaisal Patel, Yunzhe Chen, Kaiwen He et al.

Previous research has reported that large language models (LLMs) perform poorly on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate state-of-the-art reasoning models on a set of mock CFA exams consisting of 980 questions across three Level I exams, two Level II exams, and three Level III exams. Using the pass/fail criteria from prior studies, we find that most models clear all three levels. The models that pass, ordered by overall performance, are Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1. Specifically, Gemini 3.0 Pro achieves a record score of 97.6% on Level I. Performance is also strong on Level II, led by GPT-5 at 94.3%. On Level III, Gemini 2.5 Pro attains the highest score on multiple-choice questions (86.4%), while Gemini 3.0 Pro achieves 92.0% on constructed-response questions.

LG · Oct 14, 2025
Structure-Aware Spectral Sparsification via Uniform Edge Sampling

Kaiwen He, Petros Drineas, Rajiv Khanna

Spectral clustering is a fundamental method for graph partitioning, but its reliance on eigenvector computation limits scalability to massive graphs. Classical sparsification methods preserve spectral properties by sampling edges proportionally to their effective resistances, but they require expensive preprocessing to estimate these resistances. We study whether uniform edge sampling, a simple and structure-agnostic strategy, can suffice for spectral clustering. Our main result shows that for graphs admitting a well-separated $k$-clustering, characterized by a large structure ratio $\Upsilon(k) = \lambda_{k+1} / \rho_G(k)$, uniform sampling preserves the spectral subspace used for clustering. Specifically, we prove that uniformly sampling $O(\gamma^2 n \log n / \varepsilon^2)$ edges, where $\gamma$ is the Laplacian condition number, yields a sparsifier whose top $(n-k)$-dimensional eigenspace is approximately orthogonal to the cluster indicators. This ensures that the spectral embedding remains faithful and that clustering quality is preserved. Our analysis introduces new resistance bounds for intra-cluster edges, a rank-$(n-k)$ effective resistance formulation, and a matrix Chernoff bound adapted to the dominant eigenspace. These tools allow us to bypass importance sampling entirely. Conceptually, our result connects recent coreset-based clustering theory to spectral sparsification, showing that under strong clusterability, even uniform sampling is structure-aware. This provides the first provable guarantee that uniform edge sampling suffices for structure-preserving spectral clustering.
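A minimal NumPy sketch of the uniform sampling idea, under illustrative assumptions: each of `num_samples` draws picks an edge uniformly at random (with replacement) and is reweighted by m / num_samples, so the sparsified Laplacian equals the original one in expectation. Saying the top $(n-k)$ eigenspace is orthogonal to the cluster indicators is the same as saying the bottom-$k$ eigenspace contains them, so the sketch scores that containment with a hypothetical `subspace_alignment` metric. The two-clique toy graph, the 600-draw budget, and all function names are assumptions for illustration, not the paper's experimental setup or its sample bound.

```python
import itertools

import numpy as np


def laplacian(n, edges, weights):
    """Weighted graph Laplacian L = D - W built from an edge list."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def uniform_sparsify(edges, num_samples, rng):
    """Draw num_samples edges uniformly with replacement and reweight each draw by
    m / num_samples, so the sparsifier's Laplacian equals the original in expectation."""
    m = len(edges)
    idx = rng.integers(0, m, size=num_samples)
    counts = np.bincount(idx, minlength=m)  # multiplicity of each edge among the draws
    kept = np.nonzero(counts)[0]
    return [edges[i] for i in kept], counts[kept] * (m / num_samples)


def subspace_alignment(L, indicators):
    """Mean squared cosine between the bottom-k eigenspace of L and the span of the
    orthonormal cluster indicators (1.0 means the indicators lie in that eigenspace)."""
    k = indicators.shape[1]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = vecs[:, :k]              # bottom-k eigenvectors: the spectral embedding
    return np.linalg.norm(U.T @ indicators, "fro") ** 2 / k


# Toy graph: two 20-node cliques joined by three edges (a well-separated 2-clustering).
n, k = 40, 2
clusters = [range(0, 20), range(20, 40)]
edges = [e for c in clusters for e in itertools.combinations(c, 2)]
edges += [(0, 20), (5, 25), (10, 30)]

# Orthonormal cluster indicators: column j is 1/sqrt(|C_j|) on cluster j, 0 elsewhere.
H = np.zeros((n, k))
for j, c in enumerate(clusters):
    H[list(c), j] = 1 / np.sqrt(len(c))

rng = np.random.default_rng(0)
sparse_edges, sparse_w = uniform_sparsify(edges, num_samples=600, rng=rng)
L_full = laplacian(n, edges, np.ones(len(edges)))
L_sparse = laplacian(n, sparse_edges, sparse_w)

print(len(edges), len(sparse_edges))
print(subspace_alignment(L_full, H), subspace_alignment(L_sparse, H))
```

The sparsifier keeps only the distinct sampled edges, with their multiplicities folded into the weights; on a strongly clusterable graph like this one, the alignment for the sparsified Laplacian should stay close to that of the full graph.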

AI · Sep 25, 2025
Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution

Kaiwen He, Zhiwei Wang, Chenyi Zhuang et al.

In recent years, multimodal models have made remarkable strides and paved the way for intelligent browser-use agents. However, when solving tasks on real-world webpages over multi-turn, long-horizon trajectories, current agents still suffer from disordered action sequencing and excessive trial and error during execution. This paper introduces Recon-Act, a self-evolving multi-agent framework grounded in a Reconnaissance-Action behavioral paradigm. The system comprises a Reconnaissance Team and an Action Team: the former conducts comparative analysis and tool generation, while the latter handles intent decomposition, tool orchestration, and execution. By contrasting erroneous trajectories with successful ones, the Reconnaissance Team infers remedies, abstracts them into a unified notion of generalized tools (expressed either as hints or as rule-based code), and registers them to the tool archive in real time. The Action Team then re-runs inference on the task with these targeted tools, establishing a closed-loop training pipeline of data, tools, action, and feedback. Following the six-level implementation roadmap proposed in this work, we have currently reached Level 3 (with limited human-in-the-loop intervention). Leveraging generalized tools obtained through reconnaissance, Recon-Act substantially improves adaptability to unseen websites and solvability of long-horizon tasks, and achieves state-of-the-art performance on the challenging VisualWebArena dataset.
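The abstract does not specify how the tool archive is implemented, so the following is only a hypothetical sketch of the loop it describes: a reconnaissance step compares a failed trajectory with a successful one, turns the first divergence into a hint-type tool, registers it in an archive keyed by website, and an action step prepends the registered hints to the next attempt. Every name here (`GeneralizedTool`, `ToolArchive`, `infer_remedy`, `build_context`, the `shop.example` site) is an illustrative assumption, and the real system uses multimodal models rather than string comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Literal, Optional


@dataclass
class GeneralizedTool:
    """Hypothetical record for one generalized tool: a hint or a rule-based function."""
    name: str
    site: str                                    # website the remedy was derived on
    kind: Literal["hint", "rule"]
    hint: Optional[str] = None                   # natural-language guidance (kind == "hint")
    rule: Optional[Callable[[str], str]] = None  # executable helper (kind == "rule")


class ToolArchive:
    """In-memory archive; a tool is visible to lookups as soon as it is registered."""

    def __init__(self) -> None:
        self._by_site: Dict[str, List[GeneralizedTool]] = {}

    def register(self, tool: GeneralizedTool) -> None:
        if tool.kind == "hint" and tool.hint is None:
            raise ValueError("hint tools need hint text")
        if tool.kind == "rule" and tool.rule is None:
            raise ValueError("rule tools need a callable")
        self._by_site.setdefault(tool.site, []).append(tool)

    def lookup(self, site: str) -> List[GeneralizedTool]:
        return list(self._by_site.get(site, []))


def infer_remedy(site, failed, succeeded):
    """Hypothetical reconnaissance step: find the first action where a failed
    trajectory diverges from a successful one and turn it into a hint tool."""
    for step, (bad, good) in enumerate(zip(failed, succeeded)):
        if bad != good:
            return GeneralizedTool(
                name=f"{site}-step{step}",
                site=site,
                kind="hint",
                hint=f"At step {step}, do '{good}' instead of '{bad}'.",
            )
    return None


def build_context(archive, site, task):
    """Hypothetical action step: prepend every registered hint for the site to the task."""
    hints = [t.hint for t in archive.lookup(site) if t.kind == "hint"]
    return "\n".join([f"Task: {task}", *hints])


archive = ToolArchive()
remedy = infer_remedy(
    "shop.example",
    failed=["open home", "search 'lamp'", "click sponsored banner"],
    succeeded=["open home", "search 'lamp'", "click first result"],
)
archive.register(remedy)
archive.register(GeneralizedTool(
    name="parse-price", site="shop.example", kind="rule",
    rule=lambda text: text.replace("$", "").strip(),  # strip currency symbol and spaces
))
print(build_context(archive, "shop.example", "buy the cheapest lamp"))
```

Keying tools by site keeps lookups cheap and lets a newly registered tool take effect on the very next attempt, which mirrors the "register in real time" behaviour described above.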

CL · May 27, 2025
FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information

Yan Wang, Yang Ren, Lingfei Qian et al.

Accurately understanding numbers in financial reports is fundamental to how markets, regulators, algorithms, and the general public read the economy and the world. XBRL (eXtensible Business Reporting Language) was designed to tag every figure with a standardized accounting concept, yet mapping thousands of facts to more than 10,000 US-GAAP concepts remains costly, inconsistent, and error-prone. Existing benchmarks define tagging as flat, single-step, extreme classification over small subsets of US-GAAP concepts, overlooking both the taxonomy's hierarchical semantics and the structured nature of real tagging, where each fact must be represented as a contextualized multi-field output. These simplifications prevent fair evaluation of large language models (LLMs) under realistic reporting conditions. To address these gaps, we introduce FinTagging, the first comprehensive benchmark for structure-aware and full-scope XBRL tagging, designed to evaluate LLMs' ability to extract and align financial facts through numerical reasoning and taxonomy alignment across text and tables. We define two subtasks: FinNI for numeric identification, which extracts numerical entities and their types from XBRL reports, and FinCL for concept linking, which maps each extracted entity to the corresponding concept in the full US-GAAP taxonomy. Together, these subtasks produce a structured representation of each financial fact. We evaluate diverse LLMs in zero-shot settings and analyze their performance on both subtasks and on overall tagging accuracy. Results show that LLMs generalize well in numeric identification but struggle with fine-grained concept linking, revealing current limitations in structure-aware reasoning for accurate financial disclosure. All code and datasets are available on GitHub and Hugging Face.
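To make the two-subtask structure concrete, here is a hedged sketch of one possible fact representation and scoring scheme. The field names, the choice of F1 for identification, accuracy for linking over correctly identified facts, and exact-match F1 for the full fact are all assumptions; the benchmark's actual schema and metrics may differ. The XBRL item types and US-GAAP concept names are real taxonomy identifiers, but the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so facts can be placed in sets
class Fact:
    value: str      # numeric mention as it appears in the report (FinNI)
    fact_type: str  # XBRL item type, e.g. "monetaryItemType" (FinNI)
    concept: str    # US-GAAP concept, e.g. "us-gaap:Revenues" (FinCL)


def f1(pred, gold):
    """Set-based F1 between predicted and gold items."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def _key(fact):
    """Identity of a fact for the identification subtask: value plus type."""
    return (fact.value, fact.fact_type)


def evaluate(pred, gold):
    gold_concept = {_key(f): f.concept for f in gold}
    matched = [f for f in pred if _key(f) in gold_concept]  # correctly identified facts
    cl_correct = sum(f.concept == gold_concept[_key(f)] for f in matched)
    return {
        "FinNI_F1": f1({_key(f) for f in pred}, {_key(f) for f in gold}),
        "FinCL_acc": cl_correct / len(matched) if matched else 0.0,
        "overall_F1": f1(set(pred), set(gold)),  # every field must match
    }


gold = [
    Fact("1,200", "monetaryItemType", "us-gaap:Revenues"),
    Fact("350", "monetaryItemType", "us-gaap:NetIncomeLoss"),
    Fact("48", "sharesItemType", "us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding"),
]
pred = [
    Fact("1,200", "monetaryItemType", "us-gaap:Revenues"),           # fully correct
    Fact("350", "monetaryItemType", "us-gaap:OperatingIncomeLoss"),  # wrong concept
    Fact("2025", "monetaryItemType", "us-gaap:Revenues"),            # spurious number
]
scores = evaluate(pred, gold)
print(scores)
```

In this toy case identification is partly right while linking halves the score again, which is the kind of gap between the two subtasks the abstract reports.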

SI · Mar 9, 2020
DeepCP: Deep Learning Driven Cascade Prediction Based Autonomous Content Placement in Closed Social Network

Qiong Wu, Muhong Wu, Xu Chen et al.

Online social networks (OSNs) are emerging as the most popular mainstream platforms for content cascade diffusion. To provide satisfactory quality of experience (QoE) for users in OSNs, much research has been devoted to proactive content placement that exploits propagation patterns, users' personal profiles, and social relationships in open social network scenarios (e.g., Twitter and Weibo). In this paper, we take a new direction: popularity-aware content placement in a closed social network (e.g., WeChat Moment), where users' privacy is much more strongly protected. We propose DeepCP, a novel data-driven holistic deep learning framework for joint diffusion-aware cascade prediction and autonomous content placement that does not use users' personal or social information. We first devise a time-window LSTM model for content popularity prediction and cascade geo-distribution estimation. We then propose CP-GAN, a novel autonomous content placement mechanism that uses a generative adversarial network (GAN) for agile placement decision-making, reducing content access latency and enhancing users' QoE. We conduct extensive experiments using cascade diffusion traces from WeChat Moment (WM). Evaluation results confirm that DeepCP predicts content popularity with high accuracy, generates efficient placement decisions in real time, and significantly reduces content access latency compared with existing schemes.
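The abstract names a time-window LSTM but gives neither the window length nor the input features, so this sketch covers only a plausible preprocessing step, under stated assumptions: a cascade arrives as hypothetical `(timestamp, region)` events, is bucketed into fixed-length windows, summarized as a per-region share, and cut into sliding `(history -> next window)` pairs in the `(samples, time steps, features)` shape a recurrent model would consume. The LSTM and CP-GAN themselves are omitted, and the window size, regions, and function names are illustrative.

```python
from collections import Counter

import numpy as np


def window_counts(timestamps, window, num_windows):
    """Number of cascade events falling in each window [i*window, (i+1)*window)."""
    counts = np.zeros(num_windows, dtype=int)
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < num_windows:
            counts[i] += 1
    return counts


def geo_distribution(events, regions):
    """Share of events per region: the kind of target a geo-distribution estimator predicts."""
    per_region = Counter(region for _, region in events)
    total = sum(per_region.values())
    if total == 0:
        return np.zeros(len(regions))
    return np.array([per_region[r] / total for r in regions])


def make_sequences(series, history):
    """Sliding (past `history` windows -> next window) pairs for a sequence model:
    X has shape (samples, time steps, 1) and y has shape (samples,)."""
    if len(series) <= history:
        raise ValueError("need more windows than the history length")
    X = np.stack([series[i:i + history] for i in range(len(series) - history)])
    return X[..., None], np.asarray(series[history:])


# Hypothetical cascade: (minutes since posting, region of the user who viewed or reshared).
events = [(1, "A"), (3, "A"), (7, "B"), (12, "A"), (14, "B"), (15, "B"), (22, "C"), (28, "A")]
counts = window_counts([t for t, _ in events], window=10, num_windows=3)
shares = geo_distribution(events, regions=["A", "B", "C"])
X, y = make_sequences(counts, history=2)
print(counts, shares, X.shape, y)
```

Bucketing by time window rather than feeding raw events gives fixed-length sequences, which is what lets a standard LSTM consume cascades of very different sizes.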

NI · Feb 25, 2020
Personalized Federated Learning for Intelligent IoT Applications: A Cloud-Edge based Framework

Qiong Wu, Kaiwen He, Xu Chen

The Internet of Things (IoT) has penetrated many aspects of modern life, and many intelligent IoT services and applications are emerging. Federated learning has recently been proposed to train a globally shared model on the massive amount of user-generated data on IoT devices while preventing data leakage. However, the device, statistical, and model heterogeneity inherent in complex IoT environments poses great challenges to traditional federated learning, making it unsuitable for direct deployment. In this article, we advocate a personalized federated learning framework in a cloud-edge architecture for intelligent IoT applications. To cope with heterogeneity in IoT environments, we investigate emerging personalized federated learning methods that can mitigate its negative effects in different aspects. With the power of edge computing, the fast-processing and low-latency requirements of intelligent IoT applications can also be met. Finally, we provide a case study of IoT-based human activity recognition to demonstrate the effectiveness of personalized federated learning for intelligent IoT applications.
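As a hedged illustration of why personalization helps under statistical heterogeneity, the sketch below uses one simple, well-known baseline: FedAvg (data-size-weighted averaging of locally trained models, assumed to run at the cloud) followed by local fine-tuning of the global model on each client's own data (assumed to run at the edge). It is not necessarily one of the methods the article surveys. The linear-regression task, the three synthetic clients with shifted true weights, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def local_sgd(w, X, y, lr, steps):
    """Full-batch gradient descent on mean squared error for a linear model."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def fedavg(clients, dim, rounds, lr=0.1, local_steps=5):
    """FedAvg: clients train locally, the server averages weighted by data size."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local = [local_sgd(w.copy(), X, y, lr, local_steps) for X, y in clients]
        w = np.average(local, axis=0, weights=sizes)
    return w


def personalize(w_global, clients, lr=0.1, steps=20):
    """Simple personalization: fine-tune the global model on each client's own data."""
    return [local_sgd(w_global.copy(), X, y, lr, steps) for X, y in clients]


def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
dim = 3
shared = np.array([1.0, -2.0, 0.5])
clients = []
for shift in (-1.5, 0.0, 1.5):  # statistical heterogeneity: each client's true weights differ
    w_true = shared + shift * np.array([1.0, 0.0, 0.0])
    X = rng.normal(size=(50, dim))
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w_global = fedavg(clients, dim, rounds=30)
w_personal = personalize(w_global, clients)
for (X, y), w_p in zip(clients, w_personal):
    print(round(mse(w_global, X, y), 3), round(mse(w_p, X, y), 3))
```

The single global model sits near the average of the clients' true weights, so the shifted clients fit it poorly; a few local steps recover most of the client-specific error while still starting from the shared knowledge in the global model.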