Jiahang Chen

h-index: 5

2 Papers

IV · Jul 29, 2023
A 3D deep learning classifier and its explainability when assessing coronary artery disease

Wing Keung Cheung, Jeremy Kalindjian, Robert Bell et al.

Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In current clinical practice, CAD is diagnosed by analysing medical images from computed tomography coronary angiography (CTCA). Most current approaches use deep learning but require centerline extraction and multi-planar reconstruction. These indirect methods are not clinician-friendly, and they complicate the interventional procedure. Furthermore, current deep learning methods do not provide exact explainability, which limits their usefulness for deployment in clinical settings. In this study, we first propose a 3D ResNet-50 deep learning model that directly classifies CTCA images as belonging to normal subjects or CAD patients, and then demonstrate that a modified 2D U-Net model can subsequently be employed to segment the coronary arteries. Our proposed approach outperforms state-of-the-art models by 21.43% in classification accuracy. The classification model trained with focal loss produces a better, more focused heat map, and the segmentation model provides better explainability than the classification-only model. The proposed holistic approach not only offers a simpler, clinician-friendly solution but also achieves good classification accuracy and exact explainability for CAD diagnosis.
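The abstract credits focal loss with a more focused heat map but gives no formula or hyperparameters. As a reminder of what the loss does, here is a minimal Python sketch of the standard binary focal loss of Lin et al. (2017), FL(p_t) = −α_t (1 − p_t)^γ log(p_t), applied to scalar probabilities. The function names and the defaults γ = 2, α = 0.25 (the values popularised by the original focal-loss paper) are assumptions for illustration, not settings reported for the 3D ResNet-50 model above.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive class (e.g. "CAD patient").
    y: true label, 1 for positive, 0 for negative.
    gamma, alpha: assumed defaults, not values taken from the CAD paper.
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

Setting `gamma=0` and `alpha=0.5` recovers half the ordinary cross-entropy, which makes the effect of the modulating term easy to see: at γ = 2 a confident correct prediction (p_t = 0.9) keeps only 1% of its cross-entropy before the α_t weighting, while a harder one (p_t = 0.6) keeps 16%, so training concentrates on the difficult cases.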

CL · Apr 22, 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies call for more rigorous assessment methods. To address them, we introduce PHYBench, a benchmark of 500 original physics problems ranging in difficulty from high school to Physics Olympiad level. PHYBench avoids data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench elicits more tokens from models and differentiates reasoning models more strongly than other benchmarks such as AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy, compared with 61.9% for human experts. To further improve evaluation precision, we introduce the Expression Edit Distance (EED) Score for assessing mathematical expressions, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively draws out multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.
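The abstract describes the EED Score only at a high level. The sketch below assumes the general scheme: parse the reference and generated expressions into trees, compute an ordered tree edit distance (the classic forest recurrence behind Zhang–Shasha), normalise by the size of the reference tree to get a relative distance r, and map r to partial credit. The piecewise mapping (100 for an exact match, 60 − 100r for 0 < r < 0.6, otherwise 0) is an assumption to check against the PHYBench paper, and all function names are hypothetical. Unlike the benchmark, this version uses Python's `ast` module instead of a computer algebra system, performs no algebraic simplification, and treats trees as ordered, so `a+b` and `b+a` count as different.

```python
import ast
from functools import lru_cache

# A tree is a (label, children) pair, where children is a tuple of trees.


def to_tree(node):
    """Convert a Python AST expression node into a (label, children) tree."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, (to_tree(node.left), to_tree(node.right)))
    if isinstance(node, ast.UnaryOp):
        return (type(node.op).__name__, (to_tree(node.operand),))
    if isinstance(node, ast.Call):  # e.g. sqrt(...); keyword arguments are ignored
        return (ast.unparse(node.func), tuple(to_tree(a) for a in node.args))
    if isinstance(node, ast.Name):
        return (node.id, ())
    if isinstance(node, ast.Constant):
        return (repr(node.value), ())
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def expr_tree(expr):
    """Parse an expression string such as 'm*g*h' into a tree."""
    return to_tree(ast.parse(expr, mode="eval"))


def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)   # insert everything in g
    if not g:
        return sum(size(t) for t in f)   # delete everything in f
    (lv, cv), (lw, cw) = f[-1], g[-1]    # rightmost roots and their children
    return min(
        forest_dist(f[:-1] + cv, g) + 1,  # delete root lv
        forest_dist(f, g[:-1] + cw) + 1,  # insert root lw
        # match lv with lw (relabel costs 1 if labels differ)
        forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + (lv != lw),
    )


def tree_edit_distance(a, b):
    return forest_dist((a,), (b,))


def eed_score(reference, generated):
    """Partial-credit score; the 60 / 0.6 thresholds are an assumption."""
    t_ref, t_gen = expr_tree(reference), expr_tree(generated)
    r = tree_edit_distance(t_ref, t_gen) / size(t_ref)
    if r == 0:
        return 100.0
    if r < 0.6:
        return 60.0 - 100.0 * r
    return 0.0
```

For example, scoring a reference answer `m*g*h` against a generated `m*g*l` costs one relabel on a five-node tree, so r = 0.2 and the sketch awards 40 points where a binary checker would award 0; this graded signal is what lets a partial-credit score extract more information per problem than pass/fail grading.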