Jiashuo Zhang

LG
h-index: 20
9 papers
28 citations
Novelty: 44%
AI Score: 53

9 Papers

SE | Jun 1
When Large Language Models Meet UAV Projects: An Empirical Study from Developers' Perspective

Yihua Chen, Xingle Que, Jiashuo Zhang et al.

In recent years, unmanned aerial vehicles (UAVs) have become increasingly popular in our daily lives and have attracted significant research interest in software engineering. At the same time, large language models (LLMs) have made notable advancements in language understanding, reasoning, and generation, making LLM applications in UAVs a promising research direction. However, existing studies have largely remained in preliminary exploration with a limited understanding of real-world practice, which causes an academia-industry gap and hinders the application of LLMs in UAVs. To address this, we conducted the first empirical study to investigate how LLMs support UAVs. To characterize common tasks and application scenarios of real-world UAV-LLM practices, we conducted a large-scale empirical study involving 997 research papers and 1,509 GitHub projects. The results classified nine common tasks (e.g., Natural Language Command Parsing) in four UAV workflows (e.g., Information Input) undertaken by LLMs in real-world UAV projects and revealed a large difference in the task distribution of research efforts and industry practices. To gain deeper insight into these differences and understand developers' perspectives on the application of LLMs in UAVs, we conducted a survey of practitioners, receiving 52 valid responses from 15 countries. The results revealed that while 40.4% of developers have attempted to apply LLMs to UAV tasks, 59.6% still face challenges integrating their UAV projects with advanced LLM capabilities. Their feedback attributes these challenges to five factors, including technological maturity, performance, safety, cost, and others, and provides practical implications for researchers and developers in conducting UAV-LLM practices.

CL | Feb 8, 2025 | Code
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.

Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running the proposed ATLAS framework for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator by fine-tuning Llama3.1-8B-Instruct with LoRA. This model establishes a new state of the art, demonstrating statistically significant improvements over both the Herald Translator and the Kimina-Autoformalizer across all benchmarks (p<0.05, two-sided t-test). Furthermore, we demonstrate that the full-parameter fine-tuning of a stronger base model on the ATLAS dataset leads to superior performance. The datasets, model, and code are available at https://github.com/XiaoyangLiu-sjtu/ATLAS.

CV | Nov 10, 2025
Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from LDCT

Yifei Zhang, Jiashuo Zhang, Mojtaba Safari et al.

Low-dose chest computed tomography (LDCT) inherently captures both pulmonary and cardiac structures, offering a unique opportunity for joint assessment of lung and cardiovascular health. However, most existing approaches treat these domains as independent tasks, overlooking their physiological interplay and shared imaging biomarkers. We propose an Explainable Cross-Disease Reasoning Framework that enables interpretable cardiopulmonary risk assessment from a single LDCT scan. The framework introduces an agentic reasoning process that emulates clinical diagnostic thinking: first perceiving pulmonary findings, then reasoning through established medical knowledge, and finally deriving a cardiovascular judgment with explanatory rationale. It integrates three synergistic components: a pulmonary perception module that summarizes lung abnormalities, a knowledge-guided reasoning module that infers their cardiovascular implications, and a cardiac representation module that encodes structural biomarkers. Their outputs are fused to produce a holistic cardiovascular risk prediction that is both accurate and physiologically grounded. Experiments on the NLST cohort demonstrate that the proposed framework achieves state-of-the-art performance for CVD screening and mortality prediction, outperforming single-disease and purely image-based baselines. Beyond quantitative gains, the framework provides human-verifiable reasoning that aligns with cardiological understanding, revealing coherent links between pulmonary abnormalities and cardiac stress mechanisms. Overall, this work establishes a unified and explainable paradigm for cardiovascular analysis from LDCT, bridging the gap between image-based prediction and mechanism-based medical interpretation.

LG | Jul 10, 2025 | Code
Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

Yuntian Liu, Tao Zhu, Xiaoyang Liu et al.

Statement autoformalization, the automated translation of statements from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose GTED (Generalized Tree Edit Distance), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. Across the miniF2F and ProofNet benchmarks, GTED consistently ranks as a top-performing metric, achieving the highest accuracy and Kappa on miniF2F and the joint-highest accuracy on ProofNet. This strong overall performance provides the community with a computationally lightweight and more faithful metric for automated evaluation. The code and experimental results are available at https://github.com/XiaoyangLiu-sjtu/GTED.

LG | Sep 24, 2025
Revisiting Performance Claims for Chest X-Ray Models Using Clinical Context

Andrew Wang, Jiashuo Zhang, Michael Oberst

Public healthcare datasets of Chest X-Rays (CXRs) have long been a popular benchmark for developing computer vision models in healthcare. However, strong average-case performance of machine learning (ML) models on these datasets is insufficient to certify their clinical utility. In this paper, we use clinical context, as captured by prior discharge summaries, to provide a more holistic evaluation of current "state-of-the-art" models for the task of CXR diagnosis. Using discharge summaries recorded prior to each CXR, we derive a "prior" or "pre-test" probability of each CXR label, as a proxy for existing contextual knowledge available to clinicians when interpreting CXRs. Using this measure, we demonstrate two key findings: First, for several diagnostic labels, CXR models tend to perform best on cases where the pre-test probability is very low, and substantially worse on cases where the pre-test probability is higher. Second, we use pre-test probability to assess whether strong average-case performance reflects true diagnostic signal, rather than an ability to infer the pre-test probability as a shortcut. We find that performance drops sharply on a balanced test set where this shortcut does not exist, which may indicate that much of the apparent diagnostic power derives from inferring this clinical context. We argue that this style of analysis, using context derived from clinical notes, is a promising direction for more rigorous and fine-grained evaluation of clinical vision models.

CY | Aug 4, 2025
Web3 x AI Agents: Landscape, Integrations, and Foundational Challenges

Yiming Shen, Jiashuo Zhang, Zhenzhe Shao et al.

The convergence of Web3 technologies and AI agents represents a rapidly evolving frontier poised to reshape decentralized ecosystems. This paper presents the first and most comprehensive analysis of the intersection between Web3 and AI agents, examining five critical dimensions: landscape, economics, governance, security, and trust mechanisms. Through an analysis of 133 existing projects, we first develop a taxonomy and systematically map the current market landscape (RQ1), identifying distinct patterns in project distribution and capitalization. Building upon these findings, we further investigate four key integrations: (1) the role of AI agents in participating in and optimizing decentralized finance (RQ2); (2) their contribution to enhancing Web3 governance mechanisms (RQ3); (3) their capacity to strengthen Web3 security via intelligent vulnerability detection and automated smart contract auditing (RQ4); and (4) the establishment of robust reliability frameworks for AI agent operations leveraging Web3's inherent trust infrastructure (RQ5). By synthesizing these dimensions, we identify key integration patterns, highlight foundational challenges related to scalability, security, and ethics, and outline critical considerations for future research toward building robust, intelligent, and trustworthy decentralized systems with effective AI agent interactions.

IR | Jul 31, 2025
KLAN: Kuaishou Landing-page Adaptive Navigator

Fan Li, Chang Meng, Jiaqi Fu et al.

Modern online platforms configure multiple pages to accommodate diverse user needs. This multi-page architecture inherently establishes a two-stage interaction paradigm between the user and the platform: (1) Stage I: page navigation, where the platform navigates users to a specific page; and (2) Stage II: in-page interaction, where users engage with customized content within the specific page. While the majority of research has focused on the sequential recommendation task that improves users' feedback in Stage II, there has been little investigation into how to achieve better page navigation in Stage I. To fill this gap, we formally introduce the task of Personalized Landing Page Modeling (PLPM) to the field of recommender systems: Given a user upon app entry, the goal of PLPM is to proactively select the most suitable landing page from a set of candidates (e.g., functional tabs, content channels, or aggregation pages) to optimize the short-term PDR metric and the long-term user engagement and satisfaction metrics, while adhering to industrial constraints. Additionally, we propose KLAN (Kuaishou Landing-page Adaptive Navigator), a hierarchical solution framework designed to provide personalized landing pages under the formulation of PLPM. KLAN comprises three key components: (1) KLAN-ISP captures inter-day static page preference; (2) KLAN-IIT captures intra-day dynamic interest transitions; and (3) KLAN-AM adaptively integrates both components for optimal navigation decisions. Extensive online experiments conducted on the Kuaishou platform demonstrate the effectiveness of KLAN, obtaining +0.205% and +0.192% improvements in Daily Active Users (DAU) and user Lifetime (LT), respectively. KLAN is ultimately deployed on the online platform at full traffic, serving hundreds of millions of users. To promote further research in this important area, we will release our dataset and code upon paper acceptance.

CR | Mar 23, 2021
TrustCross: Enabling Confidential Interoperability across Blockchains Using Trusted Hardware

Ying Lan, Jianbo Gao, Ke Wang et al.

With the rapid development of blockchain technology, different types of blockchains have been adopted, and interoperability across blockchains has received widespread attention. Many cross-chain solutions have been proposed in recent years, including notary schemes, sidechains, and relay chains. However, most existing platforms do not take confidentiality into account, although privacy has become an important concern for blockchains. In this paper, we present TrustCross, a privacy-preserving cross-chain platform that enables confidential interoperability across blockchains. The key insight behind TrustCross is to encrypt cross-chain communication data on the relay chain with the assistance of a trusted execution environment and to employ fine-grained access control to protect user privacy. Our experimental results show that TrustCross achieves reasonable latency and high scalability on contract calls across heterogeneous blockchains.

SE | Jun 2, 2020
Kaya: A Testing Framework for Blockchain-based Decentralized Applications

Zhenhao Wu, Jiashuo Zhang, Jianbo Gao et al.

In recent years, many blockchain-based decentralized applications (DApps) have been developed. However, due to inadequate testing, DApps are easily exposed to serious vulnerabilities. We find three main challenges for DApp testing, i.e., the inherent complexity of DApps, inconvenient pre-state setting, and not-so-readable logs. In this paper, we propose a testing framework named Kaya to address these challenges. Kaya has three main functions. Firstly, Kaya provides a DApp behavior description language (DBDL) to make writing test cases easier. Test cases written in DBDL can also be automatically executed by Kaya. Secondly, Kaya offers a flexible and convenient way for test engineers to set blockchain pre-states. Thirdly, Kaya transforms incomprehensible addresses into readable variables. With these functions, Kaya can help test engineers test DApps more easily. In addition, to fit various application environments, we provide two ways for test engineers to use Kaya, i.e., a UI and a command-line interface. Our experimental case demonstrates the potential of Kaya in helping test engineers test DApps more easily.