Wen Fan

AI
h-index: 11
8 papers
59 citations
Novelty: 33%
AI Score: 48

8 Papers

AI, Jul 31, 2024
Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy

Idowu Paul Okuwobi, Zexuan Ji, Wen Fan et al.

The presence of hyperreflective foci (HFs) is related to retinal disease progression, and their quantity has proven to be a prognostic factor of visual and anatomical outcome in various retinal diseases. However, the lack of efficient quantitative tools for evaluating HFs has prevented ophthalmologists from assessing HF volume. For this reason, we propose an automated quantification algorithm to segment and quantify HFs in spectral domain optical coherence tomography (SD-OCT). The proposed algorithm consists of two parallel processes: region of interest (ROI) generation and HF estimation. To generate the ROI, we use morphological reconstruction to obtain a reconstructed image, and a histogram is constructed for data distributions and clustering. In parallel, we estimate the HFs by extracting the extremal regions from the connected regions obtained from a component tree. Finally, the outputs of the ROI generation and HF estimation processes are merged to obtain the segmented HFs. The proposed algorithm was tested on 40 3D SD-OCT volumes from 40 patients diagnosed with non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and diabetic macular edema (DME). The average Dice similarity coefficient (DSC) and correlation coefficient (r) are 69.70% and 0.99 for NPDR, 70.31% and 0.99 for PDR, and 71.30% and 0.99 for DME, respectively. The proposed algorithm can provide ophthalmologists with quantitative HF information such as volume, size, and location.
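For readers unfamiliar with the Dice similarity coefficient reported above, the following is a minimal sketch of how it is commonly computed for two binary segmentation masks, DSC = 2|A ∩ B| / (|A| + |B|). The function name, the toy masks, and the convention for two empty masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks given as nested lists."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data): 3 foreground pixels each, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = dice_coefficient(pred, truth)
print(f"DSC = {score:.4f}")
```

With these toy masks the overlap is 2 pixels out of 3 + 3, so the score is 4/6. A DSC near 70%, as reported in the abstract, would correspond to this level of overlap measured on real HF segmentations.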

SY, Apr 12
i-Tac: Inverse Design of 3D-Printed Tactile Elastomers with Scalable and Tunable Optical and Mechanical Properties

Wen Fan, Dandan Zhang

Elastomers are central to vision-based tactile sensors (VBTSs), where they transduce external contact into observable deformation. Different VBTS architectures, however, require distinct optical and mechanical properties, particularly transparency and hardness. Conventional elastomer design relies on a forward, trial-and-error optimisation process from material preparation to property evaluation, which is inefficient and offers limited property scalability and target tunability. In this work, we present i-Tac, an inverse design pipeline for tailoring 3D-printed tactile elastomers with target optical and mechanical properties. Inspired by the composite structure of the human dermis, i-Tac exploits multi-material PolyJet additive manufacturing with three complementary resins. A mixture design methodology is employed to characterise the printed elastomers and establish response surface models (ReSMs) that map material compositions to functional properties, thereby defining a scalable property space. Based on user-defined targets, a desirability-function-based multi-objective optimisation is then performed to identify feasible composition regions and derive an optimal operating window for fabrication. This enables elastomers with desired properties to be manufactured in a single iteration, thereby achieving efficient target tunability. Experimental results validate the proposed i-Tac framework in terms of both property scalability and inverse design performance, showing that i-Tac can effectively tailor elastomer transparency and hardness while reducing the iterative burden of conventional forward design. By fabricating physical sensor samples from both commercial and custom designs, the proposed framework further demonstrates the potential of inverse-designed, monolithically manufactured elastomers for customisable VBTS fabrication.
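The desirability-based optimisation step described above can be illustrated with a small sketch. It assumes a two-sided, target-is-best desirability function combined through a geometric mean, which is one common formulation; the abstract does not state which form i-Tac uses. The two linear response models, their coefficients, the property bounds, and the target values are all hypothetical placeholders, not the paper's fitted ReSMs.

```python
import math


def target_desirability(y, low, target, high):
    """Two-sided desirability: 1 at the target, falling linearly to 0 at the bounds."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)


def overall_desirability(ds):
    """Geometric mean of individual desirabilities; any zero makes the result zero."""
    if any(d <= 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


# Hypothetical linear mixture models over three resin fractions (a + b + c = 1).
def transparency(x):
    a, b, c = x
    return 90.0 * a + 40.0 * b + 10.0 * c


def hardness(x):
    a, b, c = x
    return 30.0 * a + 60.0 * b + 90.0 * c


def best_composition(steps=20, t_target=60.0, h_target=50.0):
    """Grid search over the mixture simplex for the most desirable composition."""
    best_d, best_x = 0.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            x = (i / steps, j / steps, (steps - i - j) / steps)
            d = overall_desirability([
                target_desirability(transparency(x), 0.0, t_target, 100.0),
                target_desirability(hardness(x), 0.0, h_target, 100.0),
            ])
            if d > best_d:
                best_d, best_x = d, x
    return best_d, best_x


best_d, best_x = best_composition()
print(best_d, best_x)
```

The geometric mean is a deliberate choice in this kind of formulation: a composition that completely misses one target scores zero overall, however well it meets the other.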

CV, Jan 27, 2025 (Code)
MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining

Ruiqi Wu, Na Su, Chenran Zhang et al.

Vision-language pretraining (VLP) has been investigated to generalize across diverse downstream tasks for fundus image analysis. Although recent methods showcase promising achievements, they rely heavily on large-scale private image-text data and pay less attention to the pretraining manner, which limits their further advancement. In this work, we introduce MM-Retinal V2, a high-quality image-text paired dataset comprising CFP, FFA, and OCT image modalities. Then, we propose a novel fundus vision-language pretraining model, namely KeepFIT V2, which is pretrained by integrating knowledge from the elite data spark into categorical public datasets. Specifically, a preliminary textual pretraining is adopted to equip the text encoder with primarily ophthalmic textual knowledge. Moreover, a hybrid image-text knowledge injection module is designed for knowledge transfer, which is essentially based on a combination of global semantic concepts from contrastive learning and local appearance details from generative learning. Extensive experiments across zero-shot, few-shot, and linear probing settings highlight the generalization and transferability of KeepFIT V2, delivering performance competitive with state-of-the-art fundus VLP models trained on large-scale private image-text datasets. Our dataset and model are publicly available via https://github.com/lxirich/MM-Retinal.

CV, Sep 16, 2025 (Code)
WHU-STree: A Multi-modal Benchmark Dataset for Street Tree Inventory

Ruifei Ding, Zhe Chen, Wen Fan et al.

Street trees are vital to urban livability, providing ecological and social benefits. Establishing a detailed, accurate, and dynamically updated street tree inventory has become essential for optimizing these multifunctional assets within space-constrained urban environments. Given that traditional field surveys are time-consuming and labor-intensive, automated surveys utilizing Mobile Mapping Systems (MMS) offer a more efficient solution. However, existing MMS-acquired tree datasets are limited by small-scale scenes, limited annotations, or a single modality, restricting their utility for comprehensive analysis. To address these limitations, we introduce WHU-STree, a cross-city, richly annotated, and multi-modal urban street tree dataset. Collected across two distinct cities, WHU-STree integrates synchronized point clouds and high-resolution images, encompassing 21,007 annotated tree instances across 50 species and 2 morphological parameters. Leveraging these characteristics, WHU-STree concurrently supports over 10 tasks related to street tree inventory. We benchmark representative baselines for two key tasks: tree species classification and individual tree segmentation. Extensive experiments and in-depth analysis demonstrate the significant potential of multi-modal data fusion and underscore cross-domain applicability as a critical prerequisite for practical algorithm deployment. In particular, we identify key challenges and outline potential future work for fully exploiting WHU-STree, encompassing multi-modal fusion, multi-task collaboration, cross-domain generalization, spatial pattern learning, and Multi-modal Large Language Models for street tree asset management. The WHU-STree dataset is accessible at: https://github.com/WHU-USI3DV/WHU-STree.

AI, Aug 22, 2025 (Code)
Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning

Ruiqi Wu, Yuang Yao, Tengfei Ma et al.

Multimodal large language models (MLLMs) have recently demonstrated remarkable reasoning abilities under the reinforcement learning paradigm. Although several multimodal reasoning models have been explored in the medical domain, most of them focus exclusively on basic reasoning, which refers to shallow inference based on visual feature matching. However, real-world clinical diagnosis extends beyond basic reasoning, demanding reasoning processes that integrate heterogeneous clinical information (such as chief complaints and medical history) with multimodal medical imaging data. To bridge this gap, we introduce MM-Retinal-Reason, the first ophthalmic multimodal dataset with the full spectrum of perception and reasoning. It encompasses both basic reasoning tasks and complex reasoning tasks, aiming to enhance visual-centric fundamental reasoning capabilities and emulate realistic clinical thinking patterns. Building upon MM-Retinal-Reason, we propose OphthaReason, the first ophthalmology-specific multimodal reasoning model with step-by-step reasoning traces. To enable flexible adaptation to both basic and complex reasoning tasks, we specifically design a novel method called Uncertainty-Aware Dynamic Thinking (UADT), which estimates sample-level uncertainty via entropy and dynamically modulates the model's exploration depth using a shaped advantage mechanism. Comprehensive experiments demonstrate that our model achieves state-of-the-art performance on both basic and complex reasoning tasks, outperforming general-purpose MLLMs, medical MLLMs, RL-based medical MLLMs, and ophthalmic MLLMs by at least 24.92%, 15.00%, 21.20%, and 17.66%, respectively. Project page: https://github.com/lxirich/OphthaReason.
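As a rough illustration of what "sample-level uncertainty via entropy" can mean, the sketch below computes the normalised Shannon entropy of the answers sampled for a single question. Treating the empirical distribution of sampled answers as the quantity of interest, and normalising by the log of the sample count, are both assumptions made here; the abstract does not specify how UADT defines its entropy, and the shaped advantage mechanism is not reproduced.

```python
import math
from collections import Counter


def normalized_entropy(answers):
    """Shannon entropy of the empirical answer distribution, scaled to [0, 1].

    0 means every sampled answer agrees; 1 means every sampled answer differs.
    """
    n = len(answers)
    if n == 0:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    if len(counts) == 1:
        return 0.0  # full agreement, no uncertainty
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Maximum entropy for n samples is log(n), reached when all answers differ.
    return entropy / math.log(n)


# Hypothetical sampled answers for three questions of increasing difficulty.
easy = normalized_entropy(["A", "A", "A", "A"])
medium = normalized_entropy(["A", "A", "B", "B"])
hard = normalized_entropy(["A", "B", "C", "D"])
print(easy, medium, hard)
```

A score like this could, in principle, be used to decide how much exploration a sample deserves, with low-entropy samples treated as easy and high-entropy samples as hard.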

SY, Mar 15
Surgi-HDTMR: Closing the Sensorimotor Loop in Bimanual Microsurgery via Haptics, Digital Twin, and Mixed Reality

Songming Ping, Shaoyue Wen, Junhong Chen et al.

Robotic microsurgery demands precise bimanual control, intuitive interaction, and informative force feedback. However, most training platforms for robotic microsurgery lack immersive 3D interaction and high-fidelity haptics. Here, we present Surgi-HDTMR, a mixed-reality (MR) and digital-twin (DT) training system that couples bimanual haptic teleoperation with a benchtop microsurgical robotic platform and 3D-printed phantoms. A metrically co-registered, time-synchronized DT aligns in-situ MR guidance with the physical workspace and drives a depth-adaptive haptic model that renders contact, puncture, and tissue-retraction forces. In a within-subjects study of simulated cortical navigation and tumor resection, Surgi-HDTMR shortened task time, reduced harmful contacts and collisions, and improved perceptual accuracy relative to non-haptic and non-adaptive baselines. These results suggest that tightly coupling MR overlays with a synchronized DT, together with depth-adaptive haptics, can accelerate skill acquisition and improve safety in robot-assisted microsurgery, pointing toward next-generation surgical training.

SE, Nov 4, 2024
Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast

Wen Fan, Marilyn Rego, Xin Hu et al.

Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap-manipulating programs using an ownership logic. LLMs have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can generate specifications based on an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of OpenAI's GPT-4o model in generating specifications for C programs that are verifiable with VeriFast, a separation-logic-based static verifier. Our experiment employs three different types of user inputs as well as basic and Chain-of-Thought (CoT) prompting to assess GPT's capabilities. Our results indicate that the specifications generated by GPT-4o preserve functional behavior but are often not verifiable. When the specifications are verifiable, they contain redundancies. Future directions for improving performance are discussed.

RO, Oct 12, 2021
Fully-simulated Integration of Scamp5d Vision System and Robot Simulator

Wen Fan, Yanan Liu, Yifan Xing

This paper proposes a fully simulated environment that integrates an on-sensor visual computing device, SCAMP, with the CoppeliaSim robot simulator via an interface and remote API. Within this platform, mobile robot obstacle avoidance and target navigation with pre-set barriers are implemented using on-sensor visual computing: images are captured in the robot simulator, transferred, and then processed by an on-sensor processing server. We have made the developed platform and associated algorithms for mobile robot navigation available online.