Jiexi Xu

AI
h-index5
7papers
21citations
Novelty52%
AI Score47

7 Papers

CVOct 1, 2025
Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis

Ran Tong, Jiaqi Liu, Tong Wang et al.

The accurate interpretation of chest radiographs using automated methods is a critical task in medical imaging. This paper presents a comparative analysis between a supervised lightweight Convolutional Neural Network (CNN) and a state-of-the-art, zero-shot medical Vision-Language Model (VLM), BiomedCLIP, across two distinct diagnostic tasks: pneumonia detection on the PneumoniaMNIST benchmark and tuberculosis detection on the Shenzhen TB dataset. Our experiments show that supervised CNNs serve as highly competitive baselines in both cases. While the default zero-shot performance of the VLM is lower, we demonstrate that its potential can be unlocked via a simple yet crucial remedy: decision threshold calibration. By optimizing the classification threshold on a validation set, the performance of BiomedCLIP is significantly boosted across both datasets. For pneumonia detection, calibration enables the zero-shot VLM to achieve a superior F1-score of 0.8841, surpassing the supervised CNN's 0.8803. For tuberculosis detection, calibration dramatically improves the F1-score from 0.4812 to 0.7684, bringing it close to the supervised baseline's 0.7834. This work highlights a key insight: proper calibration is essential for leveraging the full diagnostic power of zero-shot VLMs, enabling them to match or even outperform efficient, task-specific supervised models.

CLOct 11, 2025
Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default

Jiaqi Liu, Tong Wang, Su Liu et al.

The research evaluates lightweight medical abstract classification methods to establish their maximum performance capabilities under financial budget restrictions. On the public medical abstracts corpus, we finetune BERT base and Distil BERT with three objectives cross entropy (CE), class weighted CE, and focal loss under identical tokenization, sequence length, optimizer, and schedule. DistilBERT with plain CE gives the strongest raw argmax trade off, while a post hoc operating point selection (validation calibrated, classwise thresholds) sub stantially improves deployed performance; under this tuned regime, focal benefits most. We report Accuracy, Macro F1, and WeightedF1, release evaluation artifacts, and include confusion analyses to clarify error structure. The practical takeaway is to start with a compact encoder and CE, then add lightweight calibration or thresholding when deployment requires higher macro balance.

AISep 29, 2025
Toward Causal-Visual Programming: Enhancing Agentic Reasoning in Low-Code Environments

Jiexi Xu, Jiaqi Liu, Lanruo Wang et al.

Large language model (LLM) agents are increasingly capable of orchestrating complex tasks in low-code environments. However, these agents often exhibit hallucinations and logical inconsistencies because their inherent reasoning mechanisms rely on probabilistic associations rather than genuine causal understanding. This paper introduces a new programming paradigm: Causal-Visual Programming (CVP), designed to address this fundamental issue by explicitly introducing causal structures into the workflow design. CVP allows users to define a simple "world model" for workflow modules through an intuitive low-code interface, effectively creating a Directed Acyclic Graph (DAG) that explicitly defines the causal relationships between modules. This causal graph acts as a crucial constraint during the agent's reasoning process, anchoring its decisions to a user-defined causal structure and significantly reducing logical errors and hallucinations by preventing reliance on spurious correlations. To validate the effectiveness of CVP, we designed a synthetic experiment that simulates a common real-world problem: a distribution shift between the training and test environments. Our results show that a causally anchored model maintained stable accuracy in the face of this shift, whereas a purely associative baseline model that relied on probabilistic correlations experienced a significant performance drop. The primary contributions of this study are: a formal definition of causal structures for workflow modules; the proposal and implementation of a CVP framework that anchors agent reasoning to a user-defined causal graph; and empirical evidence demonstrating the framework's effectiveness in enhancing agent robustness and reducing errors caused by causal confusion in dynamic environments. CVP offers a viable path toward building more interpretable, reliable, and trustworthy AI agents.

LGOct 25, 2025
Dynamic Graph Neural Network for Data-Driven Physiologically Based Pharmacokinetic Modeling

Su Liu, Xin Hu, Shurong Wen et al.

Physiologically Based Pharmacokinetic (PBPK) modeling plays a critical role in drug development by predicting drug concentration dynamics across organs. Traditional approaches rely on ordinary differential equations with strong simplifying assumptions, which limit their adaptability to nonlinear physiological interactions. In this study, we explore data-driven alternatives for PBPK prediction using deep learning. Two baseline architectures - a multilayer perceptron (MLP) and a long short-term memory (LSTM) network - are implemented to capture molecular and temporal dependencies, respectively. To incorporate inter-organ interactions, we propose a Dynamic Graph Neural Network (Dynamic GNN) that models physiological connections as recurrent message-passing processes between organs. Experimental results demonstrate that the proposed Dynamic GNN achieves the highest predictive performance among all models, with an R^2 of 0.9342, an RMSE of 0.0159, and an MAE of 0.0116. In comparison, the MLP baseline obtains an R^2 of 0.8705 and the LSTM achieves 0.8059. These results highlight that explicitly modeling the spatial and temporal dependencies of organ interactions enables more accurate and generalizable drug concentration prediction. The Dynamic GNN provides a scalable, equation-free alternative to traditional PBPK formulations and demonstrates strong potential for data-driven pharmacokinetic modeling in preclinical and clinical research.

LGSep 28, 2025
Dynamic Policy Induction for Adaptive Prompt Optimization: Bridging the Efficiency-Accuracy Gap via Lightweight Reinforcement Learning

Jiexi Xu

The performance of Large Language Models (LLMs) depends heavily on the chosen prompting strategy, yet static approaches such as Zero-Shot, Few-Shot, or Chain-of-Thought (CoT) impose a rigid efficiency-accuracy trade-off. Highly accurate strategies like Self-Consistency (SC) incur substantial computational waste on simple tasks, while lightweight methods often fail on complex inputs. This paper introduces the Prompt Policy Network (PPN), a lightweight reinforcement learning framework that formalizes adaptive strategy selection as a single-step Markov Decision Process (MDP). The PPN, trained with Proximal Policy Optimization (PPO) and guided by a resource-explicit reward function, learns to allocate costly reasoning strategies only when necessary. Experiments on arithmetic reasoning benchmarks demonstrate that PPN achieves superior performance on the efficiency-accuracy Pareto front, delivering up to 61.5% token cost reduction compared to Self-Consistency while maintaining competitive accuracy. This work contributes a systematic, adaptive framework for cost-efficient LLM deployment, advancing the design of lightweight optimization techniques for scalable and sustainable language model applications.

AISep 27, 2025
Memory Management and Contextual Consistency for Long-Running Low-Code Agents

Jiexi Xu

The rise of AI-native Low-Code/No-Code (LCNC) platforms enables autonomous agents capable of executing complex, long-duration business processes. However, a fundamental challenge remains: memory management. As agents operate over extended periods, they face "memory inflation" and "contextual degradation" issues, leading to inconsistent behavior, error accumulation, and increased computational cost. This paper proposes a novel hybrid memory system designed specifically for LCNC agents. Inspired by cognitive science, our architecture combines episodic and semantic memory components with a proactive "Intelligent Decay" mechanism. This mechanism intelligently prunes or consolidates memories based on a composite score factoring in recency, relevance, and user-specified utility. A key innovation is a user-centric visualization interface, aligned with the LCNC paradigm, which allows non-technical users to manage the agent's memory directly, for instance, by visually tagging which facts should be retained or forgotten. Through simulated long-running task experiments, we demonstrate that our system significantly outperforms traditional approaches like sliding windows and basic RAG, yielding superior task completion rates, contextual consistency, and long-term token cost efficiency. Our findings establish a new framework for building reliable, transparent AI agents capable of effective long-term learning and adaptation.

AISep 24, 2025
Agentic Metacognition: Designing a "Self-Aware" Low-Code Agent for Failure Prediction and Human Handoff

Jiexi Xu

The inherent non-deterministic nature of autonomous agents, particularly within low-code/no-code (LCNC) environments, presents significant reliability challenges. Agents can become trapped in unforeseen loops, generate inaccurate outputs, or encounter unrecoverable failures, leading to user frustration and a breakdown of trust. This report proposes a novel architectural pattern to address these issues: the integration of a secondary, "metacognitive" layer that actively monitors the primary LCNC agent. Inspired by human introspection, this layer is designed to predict impending task failures based on a defined set of triggers, such as excessive latency or repetitive actions. Upon predicting a failure, the metacognitive agent proactively initiates a human handoff, providing the user with a clear summary of the agent's "thought process" and a detailed explanation of why it could not proceed. An empirical analysis of a prototype system demonstrates that this approach significantly increases the overall task success rate. However, this performance gain comes with a notable increase in computational overhead. The findings reframe human handoffs not as an admission of defeat but as a core design feature that enhances system resilience, improves user experience, and builds trust by providing transparency into the agent's internal state. The report discusses the practical and ethical implications of this approach and identifies key directions for future research.