AI | Nov 14, 2025 | Code
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
Wenhao Zhou, Hao Zheng, Rong Zhao
Large Vision-Language Models (LVLMs) typically align visual features from an encoder with a pre-trained Large Language Model (LLM). However, this makes the visual perception module a bottleneck, which constrains the overall capabilities of LVLMs. Conventional evaluation benchmarks, while rich in visual semantics, often contain unavoidable local shortcuts that can lead to an overestimation of models' perceptual abilities. Here, we introduce TopoPerception, a benchmark that leverages topological properties to rigorously evaluate the global visual perception capabilities of LVLMs across various granularities. Since topology depends on the global structure of an image and is invariant to local features, TopoPerception enables a shortcut-free assessment of global perception, fundamentally distinguishing it from semantically rich tasks. We evaluate state-of-the-art models on TopoPerception and find that even at the coarsest perceptual granularity, all models perform no better than random chance, indicating a profound inability to perceive global visual features. Notably, a consistent trend emerges within model families: more powerful models with stronger reasoning capabilities exhibit lower accuracy. This suggests that merely scaling up models is insufficient to address this deficit and may even exacerbate it. Progress may require new training paradigms or architectures. TopoPerception not only exposes a critical bottleneck in current LVLMs but also offers a lens and direction for improving their global visual perception. The data and code are publicly available at: https://github.com/Wenhao-Zhou/TopoPerception.
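As a toy illustration of why a topological property is global (this is not the benchmark's data generator, which the abstract does not describe), the sketch below counts the 4-connected components of foreground pixels in a binary grid. The function name `count_components` and the example grids are hypothetical. The count can only be settled by traversing the whole image: the two grids below differ in a single pixel, and every other local neighbourhood is identical, yet their component counts differ.

```python
from collections import deque

def count_components(grid):
    """Count 4-connected components of foreground (1) cells in a binary grid."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # New unvisited foreground cell: flood-fill its whole component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Two 3x5 images that differ only in the centre pixel of the middle row.
connected = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
split = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(count_components(connected), count_components(split))  # 1 2
```

A model that only inspects small patches sees nearly the same evidence in both images, which is the kind of local shortcut a topology-based task is meant to rule out.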
CL | Jul 1, 2025 | Code
Causal Prompting for Implicit Sentiment Analysis with Large Language Models
Jing Ren, Wenhao Zhou, Bowen Li et al.
Implicit Sentiment Analysis (ISA) aims to infer sentiment that is implied rather than explicitly stated, requiring models to perform deeper reasoning over subtle contextual cues. While recent prompting-based methods using Large Language Models (LLMs) have shown promise in ISA, they often rely on majority voting over chain-of-thought (CoT) reasoning paths without evaluating their causal validity, making them susceptible to internal biases and spurious correlations. To address this challenge, we propose CAPITAL, a causal prompting framework that incorporates front-door adjustment into CoT reasoning. CAPITAL decomposes the overall causal effect into two components: the influence of the input prompt on the reasoning chains, and the impact of those chains on the final output. These components are estimated using encoder-based clustering and the normalized weighted geometric mean (NWGM) approximation, respectively, with a contrastive learning objective used to better align the encoder's representation with the LLM's reasoning space. Experiments on benchmark ISA datasets with three LLMs demonstrate that CAPITAL consistently outperforms strong prompting baselines in both accuracy and robustness, particularly under adversarial conditions. This work offers a principled approach to integrating causal inference into LLM prompting and highlights its benefits for bias-aware sentiment reasoning. The source code and case study are available at: https://github.com/whZ62/CAPITAL.
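For reference, the textbook front-door adjustment that CAPITAL builds on can be written as follows, with $X$ the input prompt, $R$ a reasoning chain acting as the mediator, and $A$ the final answer (these symbol names are ours, not necessarily the paper's). The factor $P(r \mid x)$ is the prompt-to-chain component and the inner sum is the chain-to-answer component; the formula is valid when $R$ carries the entire effect of $X$ on $A$ and the confounding between $X$ and $A$ does not act directly on $R$. How CAPITAL estimates each factor (encoder-based clustering and the NWGM approximation) is not reproduced here.

```latex
P\big(A \mid \mathrm{do}(X = x)\big)
  = \sum_{r} \underbrace{P(r \mid x)}_{\text{prompt} \to \text{chain}}
    \; \underbrace{\sum_{x'} P(A \mid r, x')\, P(x')}_{\text{chain} \to \text{answer}}
```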
CL | Nov 18, 2025
A Specialized Large Language Model for Clinical Reasoning and Diagnosis in Rare Diseases
Tao Yang, Dandan Huang, Yunting Lin et al.
Rare diseases affect hundreds of millions of people worldwide, yet diagnosis often takes years. Conventional pipelines decouple noisy evidence extraction from downstream inferential diagnosis, and general and medical large language models (LLMs) face scarce real-world electronic health records (EHRs), stale domain knowledge, and hallucinations. We assemble a large, domain-specialized clinical corpus and a clinician-validated reasoning set, and develop RareSeek R1 via staged instruction tuning, chain-of-thought learning, and graph-grounded retrieval. Across multicenter EHR narratives and public benchmarks, RareSeek R1 attains state-of-the-art accuracy, robust generalization, and stability under noisy or overlapping phenotypes. Augmented retrieval yields the largest gains when narratives are paired with prioritized variants, resolving ambiguity and aligning candidates to mechanisms. Human studies show performance on par with experienced physicians and consistent gains in assistive use. Notably, transparent reasoning highlights decisive non-phenotypic evidence (median 23.1%; e.g., imaging, interventions, and functional tests) underpinning many correct diagnoses. This work advances a narrative-first, knowledge-integrated reasoning paradigm that shortens the diagnostic odyssey and enables auditable, clinically translatable decision support.
CY | Oct 27, 2025
MFiSP: A Multimodal Fire Spread Prediction Framework
Alec Sathiyamoorthy, Wenhao Zhou, Xiangmin Zhou et al.
The 2019-2020 Black Summer bushfires in Australia devastated 19 million hectares, destroyed 3,000 homes, and lasted seven months, underscoring the escalating scale and urgency of wildfire threats and the need for better forecasting to support effective response. Traditional fire modeling relies on manual interpretation by Fire Behaviour Analysts (FBAns) and static environmental data, often leading to inaccuracies and operational limitations. Emerging data sources, such as NASA's FIRMS satellite imagery and Volunteered Geographic Information, offer potential improvements by enabling dynamic fire spread prediction. This study proposes a Multimodal Fire Spread Prediction Framework (MFiSP) that integrates social media data and remote sensing observations to enhance forecast accuracy. By adapting fuel map manipulation strategies between assimilation cycles, the framework dynamically adjusts fire behavior predictions to align with the observed rate of spread. We evaluate the efficacy of MFiSP using synthetically generated fire event polygons across multiple scenarios, analyzing the individual and combined impacts of each data source on forecast perimeters. Results suggest that MFiSP, by integrating multimodal data, can improve fire spread prediction beyond conventional methods that rely on FBAn expertise and static inputs.
CR | Sep 28, 2020
STR: Secure Computation on Additive Shares Using the Share-Transform-Reveal Strategy
Zhihua Xia, Qi Gu, Wenhao Zhou et al.
The rapid development of cloud computing has benefited a great many users. However, the privacy risks posed by untrustworthy cloud servers have drawn growing attention from the public and legislators. Over the last two decades, many works have sought to outsource various specific tasks while ensuring the security of private data. The tasks to be outsourced are countless; however, the computations involved are similar. In this paper, we construct a series of novel protocols that support the secure computation of various functions on numbers (e.g., the basic elementary functions) and matrices (e.g., the calculation of eigenvectors and eigenvalues) across an arbitrary number of $n\geq 2$ servers. All protocols require only a constant number of interaction rounds and achieve low computational complexity. Moreover, the proposed $n$-party protocols ensure the security of private data even if up to $n-1$ servers collude. Convolutional neural network models are used as case studies to verify the protocols. The theoretical analysis and experimental results demonstrate the correctness, efficiency, and security of the proposed protocols.
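The protocols above operate on additive shares. As a minimal sketch of that underlying primitive only (the Share-Transform-Reveal protocols for non-linear functions are not reproduced), the code splits a value into $n$ shares modulo an assumed ring size `MOD`; the helper names are hypothetical. Any $n-1$ shares are jointly uniform and independent of the secret, which is why $n-1$ colluding servers learn nothing, and linear operations need no interaction at all.

```python
import secrets

MOD = 2 ** 64  # assumed ring Z_{2^64}; the paper's ring and number encoding may differ

def share(x, n):
    """Split integer x into n additive shares that sum to x modulo MOD."""
    if n < 2:
        raise ValueError("need at least two servers")
    # The first n-1 shares are uniform and independent of x, so any n-1 of
    # the n shares together reveal nothing about x.
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reveal(shares):
    """Reconstruct the secret by summing all n shares."""
    return sum(shares) % MOD

def add_shared(a, b):
    """Secure addition: each server adds its own two shares locally."""
    return [(ai + bi) % MOD for ai, bi in zip(a, b)]

def scale_shared(a, k):
    """Multiplication by a public constant k is also local."""
    return [(ai * k) % MOD for ai in a]

x_sh, y_sh = share(10, 3), share(32, 3)
print(reveal(add_shared(x_sh, y_sh)))        # 42
print(reveal(scale_shared(share(7, 5), 6)))  # 42
```

Non-linear functions (comparison, elementary functions, eigen-decomposition) are where interaction becomes unavoidable, and those are the cases the paper's constant-round protocols address.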
CR | Sep 15, 2020
Privacy-Preserving Image Retrieval Based on Additive Secret Sharing
Zhihua Xia, Qi Gu, Lizhi Xiong et al.
The rapid growth of digital images motivates individuals and organizations to upload their images to cloud servers. To preserve privacy, image owners would prefer to encrypt the images before uploading, but doing so severely limits the efficient use of the images. Many existing schemes for privacy-preserving Content-Based Image Retrieval (PPCBIR) seek a balance between security and retrieval ability. However, compared with advanced CBIR technologies such as Convolutional Neural Networks (CNNs), existing PPCBIR schemes fall far short in both accuracy and efficiency. As the number of cloud service providers grows, collaborative secure image retrieval services provided by multiple cloud servers become possible. In this paper, inspired by additive secret sharing, we propose a series of additive secure computing protocols on numbers and matrices with better efficiency, and then show their application in PPCBIR. Specifically, we extract CNN features, reduce their dimension, and build the index securely with the help of our protocols, which together cover the full image retrieval process performed in the plaintext domain. The experiments and security analysis demonstrate the efficiency, accuracy, and security of our scheme.
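Secure retrieval on additive shares needs multiplication, which, unlike addition, is not local. As a sketch under stated assumptions, the code uses the standard Beaver-triple technique (a swapped-in textbook method, not necessarily the paper's protocol) with a hypothetical trusted dealer, and applies it to a squared Euclidean distance between a shared query feature and a shared index entry. `MOD`, all helper names, and the small integer features are assumptions; real CNN features would need a fixed-point encoding.

```python
import secrets

MOD = 2 ** 64  # assumed ring; real CNN features would need a fixed-point encoding

def share2(x):
    """Two-server additive sharing: x = s0 + s1 (mod MOD)."""
    s0 = secrets.randbelow(MOD)
    return (s0, (x - s0) % MOD)

def reveal2(s):
    return (s[0] + s[1]) % MOD

def beaver_triple():
    """Hypothetical trusted dealer: shares of random a, b and c = a*b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share2(a), share2(b), share2(a * b % MOD)

def mul_shared(x, y):
    """Beaver-triple multiplication of two shared values."""
    a, b, c = beaver_triple()
    # The servers open e = x - a and f = y - b; the random a and b mask x and y.
    e = reveal2(((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD))
    f = reveal2(((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD))
    # x*y = c + e*b + f*a + e*f; only server 0 adds the public term e*f.
    z0 = (c[0] + e * b[0] + f * a[0] + e * f) % MOD
    z1 = (c[1] + e * b[1] + f * a[1]) % MOD
    return (z0, z1)

def sq_distance_shared(q, d):
    """Shares of the squared Euclidean distance between shared vectors q and d."""
    total = (0, 0)
    for qi, di in zip(q, d):
        diff = ((qi[0] - di[0]) % MOD, (qi[1] - di[1]) % MOD)         # local
        sq = mul_shared(diff, diff)                                    # interactive
        total = ((total[0] + sq[0]) % MOD, (total[1] + sq[1]) % MOD)  # local
    return total

query = [share2(v) for v in [1, 2, 3]]
entry = [share2(v) for v in [4, 0, 3]]
print(reveal2(sq_distance_shared(query, entry)))  # 13
```

Only the masked differences `e` and `f` are ever opened, so neither server sees the query, the index entry, or the distance unless the result itself is revealed.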
CR | Sep 11, 2020
Efficient Privacy-Preserving Computation Based on Additive Secret Sharing
Lizhi Xiong, Wenhao Zhou, Zhihua Xia et al.
The emergence of cloud computing provides a new computing paradigm for users: massive and complex computing tasks can be outsourced to cloud servers. However, it also raises privacy issues. Fully homomorphic encryption shows great potential for privacy-preserving computation, yet it is not ready for practical use. At present, secure multiparty computation (MPC) remains the main approach to handling sensitive data. In this paper, following the secret-sharing-based MPC paradigm, we propose a secure 2-party computation scheme in which cloud servers can securely evaluate functions with high efficiency. We first propose multiplicative secret sharing (MSS) based on typical additive secret sharing (ASS). Then, we design protocols to switch shared secrets between MSS and ASS, on the basis of which we propose a series of protocols for comparison and nearly all of the elementary functions. We prove that all the proposed protocols are Universally Composable secure in the honest-but-curious model. Finally, we show the remarkable progress of our protocols in both communication efficiency and functional completeness.
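To show why multiplicative shares are useful alongside additive ones, here is a minimal toy sketch of two-server multiplicative sharing over an assumed prime field: the secret is the product of the two shares, so multiplication and public powers become local, mirroring how addition is local under ASS. The modulus `P` and the helper names are assumptions, and the paper's actual MSS construction and its ASS/MSS switching protocols are not reproduced; in particular, this toy form cannot share zero.

```python
import secrets

P = 2 ** 61 - 1  # assumed prime modulus (a Mersenne prime)

def mss_share(x):
    """Split nonzero x into two multiplicative shares with m1 * m2 = x (mod P)."""
    x %= P
    if x == 0:
        raise ValueError("this toy multiplicative sharing cannot hide zero")
    m1 = secrets.randbelow(P - 1) + 1  # uniform over the nonzero residues
    m2 = x * pow(m1, -1, P) % P        # modular inverse (Python 3.8+)
    return (m1, m2)

def mss_reveal(s):
    return s[0] * s[1] % P

def mss_mul(a, b):
    """Secure multiplication is local: (a1*b1) * (a2*b2) = (a1*a2) * (b1*b2)."""
    return (a[0] * b[0] % P, a[1] * b[1] % P)

def mss_pow(a, k):
    """Raising to a public non-negative power k is local as well."""
    return (pow(a[0], k, P), pow(a[1], k, P))

print(mss_reveal(mss_mul(mss_share(6), mss_share(7))))  # 42
print(mss_reveal(mss_pow(mss_share(3), 4)))             # 81
```

Each share on its own is uniform over the nonzero residues, so a single server learns nothing about a nonzero secret; the price is that addition is no longer local, which is why switching between MSS and ASS matters.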