AI · Feb 10
Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning
Jinsong Liu, Yuhang Jiang, Ramayya Krishnan et al.
Clinical decision support requires not only correct answers but also clinically valid reasoning. We propose Differential Reasoning Learning (DRL), a framework that improves clinical agents by learning from reasoning discrepancies. From reference reasoning rationales (e.g., physician-authored clinical rationale, clinical guidelines, or outputs from more capable models) and the agent's free-form chain-of-thought (CoT), DRL extracts reasoning graphs as directed acyclic graphs (DAGs) and performs a clinically weighted graph edit distance (GED)-based discrepancy analysis. An LLM-as-a-judge aligns semantically equivalent nodes and diagnoses discrepancies between graphs. These graph-level discrepancy diagnostics are converted into natural-language instructions and stored in a Differential Reasoning Knowledge Base (DR-KB). At inference, we retrieve top-$k$ instructions via Retrieval-Augmented Generation (RAG) to augment the agent prompt and patch likely logic gaps. Evaluation on open medical question answering (QA) benchmarks and a Return Visit Admissions (RVA) prediction task from internal clinical data demonstrates gains over baselines, improving both final-answer accuracy and reasoning fidelity. Ablation studies confirm gains from infusing reference reasoning rationales and the top-$k$ retrieval strategy. Clinicians' review of the output provides further assurance of the approach. Together, results suggest that DRL supports more reliable clinical decision-making in complex reasoning scenarios and offers a practical mechanism for deployment under limited token budgets.
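The inference-time retrieval step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a bag-of-words cosine similarity in place of whatever retriever DRL actually uses, and the knowledge-base entries, the `retrieve_top_k` helper, and the prompt layout are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, instructions, k=2):
    """Return the k stored instructions most similar to the query."""
    q = bow(query)
    ranked = sorted(instructions, key=lambda s: cosine(q, bow(s)), reverse=True)
    return ranked[:k]

# Hypothetical DR-KB entries, invented for illustration only.
dr_kb = [
    "check renal function before adjusting drug dose",
    "rule out sepsis when fever and hypotension co-occur",
    "confirm chest pain onset time before ordering troponin",
]

query = "patient with fever and hypotension"
top = retrieve_top_k(query, dr_kb, k=1)

# Augment the agent prompt with the retrieved instructions (assumed layout).
prompt = "Instructions:\n" + "\n".join(top) + "\n\nCase: " + query
```

Keeping `k` small is what bounds the extra prompt length, which is the token-budget trade-off the abstract alludes to.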
OC · Jan 28, 2023
Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu, Chenghan Xie, Qi Deng et al.
In this paper, we propose several new stochastic second-order algorithms for policy optimization that require only gradient and Hessian-vector product evaluations in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) that repeatedly solves a projected two-dimensional trust region subproblem. We show that DR-SOPO attains $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition and a certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) that further improves the complexity to $\mathcal{O}(\epsilon^{-3})$ using variance reduction. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods.
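To illustrate how a two-dimensional subproblem can be formed from gradients and Hessian-vector products alone, here is a hedged sketch. It makes several assumptions: the subspace is spanned by the current gradient and one extra direction `d`, the Hessian-vector product uses a central difference of gradients, and the objective is a toy deterministic quadratic rather than a stochastic policy objective. Solving the resulting 2x2 model under a trust region constraint is omitted.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grad(x):
    """Gradient of the toy objective f(x) = x0^2 + 3*x0*x1 + 2*x1^2."""
    return [2 * x[0] + 3 * x[1], 3 * x[0] + 4 * x[1]]

def hvp(grad_fn, x, v, eps=1e-5):
    """Central-difference Hessian-vector product:
    H v ~ (g(x + eps*v) - g(x - eps*v)) / (2*eps). No Hessian is formed."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    gp, gm = grad_fn(xp), grad_fn(xm)
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def reduced_model(grad_fn, x, d):
    """Coefficients (c, Q) of the model m(a) = c.a + 0.5 * a.Q.a for the
    step a[0]*g + a[1]*d, where g is the gradient at x."""
    g = grad_fn(x)
    basis = [g, d]
    hv = [hvp(grad_fn, x, b) for b in basis]  # two Hessian-vector products
    c = [dot(g, b) for b in basis]
    Q = [[dot(bi, hvj) for hvj in hv] for bi in basis]
    return c, Q

x = [1.0, -1.0]
d = [1.0, 0.0]  # hypothetical second direction, e.g. a previous step
c, Q = reduced_model(grad, x, d)
```

The full problem dimension enters only through inner products, so each iteration costs two Hessian-vector products plus a small 2x2 solve.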
AI · Oct 17, 2025
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold
Yi Wan, Jiuqi Wang, Liam Li et al.
Tool-augmented large language models (LLMs) are emerging as deep research agents, systems that decompose complex queries, retrieve external evidence, and synthesize grounded responses. Yet current agents remain limited by shallow retrieval, weak alignment metrics, and brittle tool-use behavior. We introduce PokeeResearch-7B, a 7B-parameter deep research agent built under a unified reinforcement learning framework for robustness, alignment, and scalability. PokeeResearch-7B is trained with an annotation-free Reinforcement Learning from AI Feedback (RLAIF) framework that optimizes policies using LLM-based reward signals capturing factual accuracy, citation faithfulness, and instruction adherence. A chain-of-thought-driven multi-call reasoning scaffold further enhances robustness through self-verification and adaptive recovery from tool failures. Across 10 popular deep research benchmarks, PokeeResearch-7B achieves state-of-the-art performance among 7B-scale deep research agents. This highlights that careful reinforcement learning and reasoning design can produce efficient, resilient, and research-grade AI agents. The model and inference code are open-sourced under the Apache 2.0 license at https://github.com/Pokee-AI/PokeeResearchOSS.
LG · Feb 7, 2025
A Deep Learning Framework Integrating CNN and BiLSTM for Financial Systemic Risk Analysis and Prediction
Yu Cheng, Zhen Xu, Yuan Chen et al.
This study proposes a deep learning model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM) for discriminant analysis of financial systemic risk. The model first uses the CNN to extract local patterns from multidimensional financial market features, and then models bidirectional dependencies in the time series with the BiLSTM, characterizing how systemic risk changes in both spatial features and temporal dynamics. Experiments use real financial datasets. The results show that the model is significantly superior to traditional single models (such as BiLSTM, CNN, Transformer, and TCN) in terms of accuracy, recall, and F1 score. Its F1 score reaches 0.88, indicating strong discriminant ability. These results suggest that combining CNN and BiLSTM can both capture the complex patterns of market data and handle the long-term dependency problem in time series data. In addition, this study explores the robustness of the model to data noise and high-dimensional data, providing support for intelligent financial risk management. Future work will further optimize the model structure, introduce methods such as reinforcement learning and multimodal data analysis, and improve the efficiency and generalization ability of the model to cope with a more complex financial environment.
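For reference, the metrics named above (accuracy, recall, F1 score) follow from a binary confusion matrix as in the sketch below. The counts are hypothetical and are not taken from the study; the function name is likewise an illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a high-risk vs. normal classification task.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
```

Because F1 is a harmonic mean, it lies between precision and recall and sits closer to the lower of the two.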
RM · Dec 24, 2024
Leveraging Convolutional Neural Network-Transformer Synergy for Predictive Modeling in Risk-Based Applications
Yuhan Wang, Zhen Xu, Yue Yao et al.
With the development of the financial industry, credit default prediction, an important task in financial risk management, has received increasing attention. Traditional credit default prediction methods mostly rely on machine learning models such as decision trees and random forests, but these methods are limited in processing complex data and capturing latent risk patterns. To this end, this paper proposes a deep learning model that combines convolutional neural networks (CNN) and Transformer for credit default prediction. The model combines the strength of CNN in local feature extraction with the ability of Transformer to model global dependencies, improving the accuracy and robustness of credit default prediction. In experiments on public credit default datasets, the CNN+Transformer model outperforms traditional machine learning models, such as random forests and XGBoost, on multiple evaluation metrics including accuracy, AUC, and KS value, demonstrating its capability in modeling complex financial data. Further experimental analysis shows that appropriate optimizer selection and learning rate adjustment play a vital role in improving model performance. In addition, an ablation experiment verifies the advantage of combining CNN and Transformer and shows that the two are complementary in credit default prediction. This study offers a new approach to credit default prediction and supports risk assessment and intelligent decision-making in the financial field. Future research can further improve prediction performance and generalization ability by introducing more unstructured data and improving the model architecture.
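The KS value mentioned above is commonly computed in credit scoring as the largest gap between the cumulative capture rates of defaulters and non-defaulters as the score threshold is swept. The sketch below assumes that definition; the scores and labels are invented for illustration and the function name is hypothetical.

```python
def ks_statistic(scores, labels):
    """Max |TPR - FPR| over all score thresholds.
    labels: 1 = default, 0 = no default; higher score = riskier."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume all samples tied at this score before measuring the gap.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best

# Hypothetical model scores and ground-truth default labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
ks = ks_statistic(scores, labels)
```

A KS value of 1.0 would mean some threshold separates the two groups perfectly, while 0.0 means the score carries no ranking information.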
CV · Feb 28, 2024
Rapid hyperspectral photothermal mid-infrared spectroscopic imaging from sparse data for gynecologic cancer tissue subtyping
Reza Reihanisaransari, Chalapathi Charan Gajjela, Xinyu Wu et al.
Ovarian cancer detection has traditionally relied on a multi-step process that includes biopsy, tissue staining, and morphological analysis by experienced pathologists. While widely practiced, this conventional approach suffers from several drawbacks: it is qualitative, time-intensive, and heavily dependent on the quality of staining. Mid-infrared (MIR) hyperspectral photothermal imaging is a label-free, biochemically quantitative technology that, when combined with machine learning algorithms, can eliminate the need for staining and provide quantitative results comparable to traditional histology. However, this technology is slow. This work presents a novel approach to MIR photothermal imaging that enhances its speed by an order of magnitude. Our method accelerates data collection by capturing a combination of high-resolution and interleaved, lower-resolution infrared band images and applying computational techniques for data interpolation. We minimize data collection requirements by leveraging sparse data acquisition and employing curvelet-based reconstruction algorithms. This method enables the reconstruction of high-quality, high-resolution images from undersampled datasets and achieves a 10× improvement in data acquisition time. We assessed the performance of our sparse imaging methodology using a variety of quantitative metrics, including mean squared error (MSE), structural similarity index (SSIM), and tissue subtype classification accuracies, employing both random forest and convolutional neural network (CNN) models, accompanied by ROC curves. Our analysis, based on data from 100 ovarian cancer patient samples and over 65 million data points, demonstrates the method's capability to produce superior image quality and accurately distinguish between different gynecological tissue types with segmentation accuracy exceeding 95%.
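As a toy illustration of reconstructing from sparsely sampled bands and scoring the result with MSE, the sketch below fills unmeasured bands of a single pixel's spectrum by linear interpolation. Linear interpolation is swapped in here purely for brevity; it is not the curvelet-based reconstruction the paper describes, and the spectrum values are invented.

```python
def mse(reference, estimate):
    """Mean squared error between two equal-length sequences."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def interpolate_bands(sparse):
    """Fill None entries by linear interpolation between the nearest
    measured bands. Assumes the first and last bands were measured."""
    measured = [i for i, v in enumerate(sparse) if v is not None]
    out = list(sparse)
    for lo, hi in zip(measured, measured[1:]):
        for j in range(lo + 1, hi):
            t = (j - lo) / (hi - lo)
            out[j] = (1 - t) * sparse[lo] + t * sparse[hi]
    return out

# Hypothetical fully sampled spectrum for one pixel (ground truth).
full = [0.0, 1.0, 4.0, 9.0, 16.0]

# Sparse acquisition: keep every other band, skip the rest.
sparse = [v if i % 2 == 0 else None for i, v in enumerate(full)]

recon = interpolate_bands(sparse)
error = mse(full, recon)
```

Comparing `recon` against `full` in this way is only possible when a fully sampled reference exists, which is why such metrics are typically reported on held-out, densely acquired data.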
AI · Jun 18, 2025
Deep Reinforcement Learning Xiangqi Player with Monte Carlo Tree Search
Berk Yilmaz, Junyu Hu, Jinsong Liu
This paper presents a Deep Reinforcement Learning (DRL) system for Xiangqi (Chinese Chess) that integrates neural networks with Monte Carlo Tree Search (MCTS) to enable strategic self-play and self-improvement. Addressing the underexplored complexity of Xiangqi, including its unique board layout, piece movement constraints, and victory conditions, our approach combines policy-value networks with MCTS to simulate move consequences and refine decision-making. By overcoming challenges such as Xiangqi's high branching factor and asymmetrical piece dynamics, our work advances AI capabilities in culturally significant strategy games while providing insights for adapting DRL-MCTS frameworks to domain-specific rule systems.
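One common way to combine a policy-value network with MCTS is the PUCT selection rule used in AlphaZero-style systems. The sketch below assumes that rule; the abstract does not state which selection formula the authors use, and the move labels, statistics, and `c_puct` value are hypothetical.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, where
    Q = W / N is the mean value from search so far and
    U = c_puct * P * sqrt(sum of sibling visits) / (1 + N)
    is an exploration bonus scaled by the network's prior P."""
    total_visits = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return q + u

    return max(children, key=score)

# Hypothetical statistics for three candidate moves at one node.
children = [
    {"move": "a", "P": 0.6, "N": 10, "W": 5.0},
    {"move": "b", "P": 0.3, "N": 1, "W": 0.2},
    {"move": "c", "P": 0.1, "N": 0, "W": 0.0},
]
chosen = puct_select(children)
```

Here the lightly visited move wins selection despite a lower mean value, because the exploration bonus shrinks as a move's visit count grows.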
CV · Mar 21, 2018
Patch-based Fake Fingerprint Detection Using a Fully Convolutional Neural Network with a Small Number of Parameters and an Optimal Threshold
Eunsoo Park, Xuenan Cui, Weonjin Kim et al.
Fingerprint authentication is widely used in biometrics due to its simple process, but it is vulnerable to fake fingerprints. To address this problem, this study proposes a patch-based fake fingerprint detection method using a fully convolutional neural network with a small number of parameters and an optimal threshold. Unlike existing methods that classify a fingerprint as live or fake, the proposed method classifies fingerprints as fake, live, or background, so preprocessing steps such as segmentation are not needed. The proposed convolutional neural network (CNN) uses the Fire module of SqueezeNet, and its small number of parameters requires only 2.0 MB of memory. The trained network is applied to the training data in a fully convolutional manner to determine the optimal threshold for distinguishing fake fingerprints, which is then used in the final test. In experiments, the proposed method achieved an average classification error of 1.35%, demonstrating high-performance fake fingerprint detection with a CNN that has a small number of parameters.
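To show why a Fire module keeps the parameter count low, the sketch below counts weights and biases for one module (a 1x1 squeeze convolution feeding parallel 1x1 and 3x3 expand convolutions) and compares it with a plain 3x3 convolution of the same input and output width. The channel sizes are illustrative and are not taken from this paper's network.

```python
def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights plus biases in a SqueezeNet-style Fire module."""
    squeeze_layer = c_in * squeeze + squeeze               # 1x1 conv
    expand_1 = squeeze * expand1x1 + expand1x1             # 1x1 conv
    expand_3 = squeeze * expand3x3 * 3 * 3 + expand3x3     # 3x3 conv
    return squeeze_layer + expand_1 + expand_3

def conv3x3_params(c_in, c_out):
    """Weights plus biases in a plain 3x3 convolution."""
    return c_in * c_out * 3 * 3 + c_out

# Illustrative sizes: 96 input channels, 128 output channels in total.
fire = fire_params(c_in=96, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv3x3_params(c_in=96, c_out=64 + 64)

# Storage at 4 bytes per 32-bit float parameter.
fire_bytes = fire * 4
```

Most of the saving comes from the squeeze layer, which cuts the number of channels seen by the expensive 3x3 filters.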