IV | Sep 1, 2024 | Code
DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels
Linda Wei, Shengyi Hua, Shaoting Zhang et al.
Breast cancer is a highly fatal disease among cancers in women, and early detection is crucial for treatment. HER2 status, a valuable diagnostic marker based on immunohistochemistry (IHC) staining, is instrumental in determining breast cancer status. The high cost of IHC staining and the ubiquity of Hematoxylin and Eosin (H&E) staining make the conversion from H&E to IHC staining essential. In this article, we propose a destain-restain framework for converting H&E staining to IHC staining, leveraging the fact that H&E and IHC staining of the same tissue section share the Hematoxylin channel. We further design loss functions specifically for the Hematoxylin and Diaminobenzidine (DAB) channels to generate IHC images by exploiting insights from the separated staining channels. Beyond the benchmark metrics of the BCI contest, we develop semantic information metrics for the HER2 level. The experimental results demonstrate that our method outperforms previous open-source methods in terms of both intrinsic image properties and semantic information.
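The shared Hematoxylin channel can be illustrated with classical color deconvolution. The following is a minimal NumPy sketch, not the DeReStainer implementation: it assumes the commonly used Ruifrok-Johnston stain vectors, and the function names `separate_stains`, `combine_stains`, and `restain_with_shared_hematoxylin` are hypothetical. The DAB concentration map is simply passed in as an array, standing in for whatever a trained generator would produce.

```python
import numpy as np

# Assumed Ruifrok-Johnston stain vectors in optical-density space.
# Rows: Hematoxylin, Eosin, DAB. Columns: R, G, B.
STAINS = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
STAINS = STAINS / np.linalg.norm(STAINS, axis=1, keepdims=True)
STAINS_INV = np.linalg.inv(STAINS)


def separate_stains(rgb):
    """Map an RGB image with values in (0, 1] to (H, E, DAB) concentrations."""
    # Beer-Lambert law: optical density is the negative log of transmitted light.
    od = -np.log(np.clip(rgb, 1e-6, 1.0))
    return od @ STAINS_INV


def combine_stains(conc):
    """Map (H, E, DAB) concentrations back to an RGB image in [0, 1]."""
    od = conc @ STAINS
    return np.clip(np.exp(-od), 0.0, 1.0)


def restain_with_shared_hematoxylin(he_rgb, dab_conc):
    """Destain an H&E image and restain it as an IHC-like image.

    The Hematoxylin channel is kept, Eosin is dropped, and the supplied
    DAB concentration map (hypothetical input) is added.
    """
    conc = separate_stains(he_rgb)
    ihc_conc = np.zeros_like(conc)
    ihc_conc[..., 0] = conc[..., 0]   # shared Hematoxylin channel
    ihc_conc[..., 2] = dab_conc       # new DAB channel
    return combine_stains(ihc_conc)
```

Because the stain matrix is invertible, separating the restained image recovers the original Hematoxylin concentrations, which is the property a channel-specific loss could be built on.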
CV | May 15, 2025 | Code
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang, Junting Pan, Linda Wei et al.
Natural language image-caption datasets, widely used for training Large Multimodal Models (LMMs), mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the advancement of current LMMs in multimodal mathematical reasoning. To this end, we propose leveraging code as supervision for cross-modal alignment, since code inherently encodes all information needed to generate the corresponding figures, establishing a precise connection between the two modalities. Specifically, we co-develop our image-to-code model and dataset with a model-in-the-loop approach, resulting in an image-to-code model, FigCodifier, and the ImgCode-8.6M dataset, the largest image-code dataset to date. Furthermore, we utilize FigCodifier to synthesize novel mathematical figures and then construct MM-MathInstruct-3M, a high-quality multimodal math instruction fine-tuning dataset. Finally, we present MathCoder-VL, trained with ImgCode-8.6M for cross-modal alignment and subsequently fine-tuned on MM-MathInstruct-3M for multimodal math problem solving. Our model achieves a new open-source SOTA across all six metrics. Notably, it surpasses GPT-4o and Claude 3.5 Sonnet in the geometry problem-solving subset of MathVista, achieving improvements of 8.9% and 9.2%. The dataset and models will be released at https://github.com/mathllm/MathCoder.
CL | Mar 3
From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
Weikang Shi, Houxing Ren, Junting Pan et al.
Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comprehensive, multi-turn teaching effectiveness. In this paper, we introduce KMP-Bench, a comprehensive K-8 Mathematical Pedagogical Benchmark designed to assess LLMs from two complementary perspectives. The first module, KMP-Dialogue, evaluates holistic pedagogical capabilities against six core principles (e.g., Challenge, Explanation, Feedback), leveraging a novel multi-turn dialogue dataset constructed by weaving together diverse pedagogical components. The second module, KMP-Skills, provides a granular assessment of foundational tutoring abilities, including multi-turn problem-solving, error detection and correction, and problem generation. Our evaluations on KMP-Bench reveal a key disparity: while leading LLMs excel at tasks with verifiable solutions, they struggle with the nuanced application of pedagogical principles. Additionally, we present KMP-Pile, a large-scale (150K) dialogue dataset. Models fine-tuned on KMP-Pile show substantial improvement on KMP-Bench, underscoring the value of pedagogically rich training data for developing more effective AI math tutors.
IV | Nov 22, 2024 | Code
Learning Modality-Aware Representations: Adaptive Group-wise Interaction Network for Multimodal MRI Synthesis
Tao Song, Yicheng Wu, Minhao Hu et al.
Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple modalities as input channels. However, these approaches often yield sub-optimal results due to the inherent difficulty in achieving precise feature- or semantic-level alignment across modalities. To address these challenges, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explicitly models both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, feature channels are first partitioned into predefined groups, after which an adaptive rolling mechanism is applied to conventional convolutional kernels to better capture feature and semantic correspondences between different modalities. In parallel, a cross-group attention module is introduced to enable effective feature fusion across groups, thereby enhancing the network's representational capacity. We validate the proposed AGI-Net on the publicly available IXI and BraTS2023 datasets. Experimental results demonstrate that AGI-Net achieves state-of-the-art performance in multimodal MR image synthesis tasks, confirming the effectiveness of its modality-aware interaction design. We release the relevant code at: https://github.com/zunzhumu/Adaptive-Group-wise-Interaction-Network-for-Multimodal-MRI-Synthesis.git.
CV | Mar 5
MADCrowner: Margin Aware Dental Crown Design with Template Deformation and Refinement
Linda Wei, Chang Liu, Wenran Zhang et al.
Dental crown restoration is one of the most common treatment modalities for tooth defects, where personalized dental crown design is critical. While computer-aided design (CAD) systems have notably enhanced the efficiency of dental crown design, extensive manual adjustments are still required in the clinical workflow. Recent studies have explored the application of learning-based methods for the automated generation of restorative dental crowns. Nevertheless, these approaches are challenged by inadequate spatial resolution, noisy outputs, and overextension of the reconstructed surface. To address these limitations, we propose MADCrowner, a margin-aware mesh generation framework comprising CrownDeformR and CrownSegger. Inspired by the manual clinical workflow of dental crown design, we designed CrownDeformR to deform an initial template to the target crown based on anatomical context, which is extracted by a multi-scale intraoral scan encoder. Additionally, we introduced CrownSegger, a novel margin segmentation network, to extract the cervical margin of the target tooth. The performance of CrownDeformR improved with the cervical margin as an extra constraint. The margin also served as the boundary condition for a tailored postprocessing method, which removed the overextended area of the reconstructed surface. We constructed a large-scale intraoral scan dataset and performed extensive experiments. The proposed method significantly outperformed existing approaches in both geometric accuracy and clinical feasibility.
CV | Jul 24, 2025
Dissecting the Dental Lung Cancer Axis via Mendelian Randomization and Mediation Analysis
Wenran Zhang, Huihuan Luo, Linda Wei et al.
Periodontitis and dental caries are common oral diseases affecting billions globally. While observational studies suggest links between these conditions and lung cancer, causality remains uncertain. This study used two-sample Mendelian randomization (MR) to explore causal relationships between dental traits (periodontitis, dental caries) and lung cancer subtypes, and to assess mediation by pulmonary function. Genetic instruments were derived from the largest available genome-wide association studies, including data from 487,823 dental caries and 506,594 periodontitis cases, as well as lung cancer data from the Transdisciplinary Research of Cancer in Lung consortium. Inverse variance weighting was the main analytical method; lung function mediation was assessed using the delta method. The results showed a significant positive causal effect of dental caries on overall lung cancer and its subtypes. Specifically, a one standard deviation increase in dental caries incidence was associated with a 188.0% higher risk of squamous cell lung carcinoma (OR = 2.880, 95% CI = 1.236-6.713, p = 0.014), partially mediated by declines in forced vital capacity (FVC) and forced expiratory volume in one second (FEV1), which accounted for 5.124% and 5.890% of the total effect, respectively. No causal effect was found for periodontitis. These findings highlight a causal role of dental caries in lung cancer risk and support integrating dental care and pulmonary function monitoring into cancer prevention strategies.
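The inverse variance weighted estimate and the step from an odds ratio to a percentage risk increase can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW formula, not the study's analysis pipeline; the function names and the per-variant numbers in the usage note are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) causal estimate.

    Equivalent to a weighted regression through the origin of the
    variant-outcome effects on the variant-exposure effects, with
    weights 1 / se_outcome**2. Returns (log odds ratio, standard error).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log odds ratio and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_excess_risk(odds_ratio):
    """Express an odds ratio as a percentage increase over the reference."""
    return (odds_ratio - 1.0) * 100.0
```

For the reported OR of 2.880, `percent_excess_risk(2.880)` gives 188.0 up to floating-point rounding, matching the 188.0% figure in the abstract. With made-up instruments such as `ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.03])`, the estimate is the common ratio 0.5.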
CV | Jul 23, 2025
VBCD: A Voxel-Based Framework for Personalized Dental Crown Design
Linda Wei, Chang Liu, Wenran Zhang et al.
The design of restorative dental crowns from intraoral scans is labor-intensive for dental technicians. To address this challenge, we propose a novel voxel-based framework for automated dental crown design (VBCD). The VBCD framework generates an initial coarse dental crown from voxelized intraoral scans, followed by a fine-grained refiner incorporating distance-aware supervision to improve accuracy and quality. During the training stage, we employ the Curvature and Margin line Penalty Loss (CMPL) to enhance the alignment of the generated crown with the margin line. Additionally, a positional prompt based on the FDI tooth numbering system is introduced to further improve the accuracy of the generated dental crowns. Evaluation on a large-scale dataset of intraoral scans demonstrated that our approach outperforms existing methods, providing a robust solution for personalized dental crown design.
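Two of the inputs mentioned above, a voxelized scan and a tooth-position prompt, can be sketched in NumPy. This is only an illustration under stated assumptions: the abstract does not describe the grid resolution, the voxelization scheme, or how the FDI number is encoded, so `voxelize`, `fdi_prompt`, the default grid size, and the one-hot layout are all hypothetical. The only fact relied on is the FDI two-digit notation for permanent teeth (first digit: quadrant 1 to 4, second digit: position 1 to 8 from the midline).

```python
import numpy as np


def voxelize(points, grid_size=32):
    """Convert an (N, 3) point cloud into a boolean occupancy grid.

    Points are shifted to the origin and scaled isotropically so the
    longest bounding-box side spans the grid (assumed scheme).
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    extent = (points.max(axis=0) - lo).max()
    scale = (grid_size - 1) / extent if extent > 0 else 0.0
    idx = np.floor((points - lo) * scale).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)  # guard against rounding overflow
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


def fdi_prompt(tooth_number):
    """Encode an FDI permanent-tooth number as a 12-dim one-hot pair.

    Entries 0-3 mark the quadrant, entries 4-11 mark the position
    (assumed encoding, not taken from the paper).
    """
    quadrant, position = divmod(tooth_number, 10)
    if not (1 <= quadrant <= 4 and 1 <= position <= 8):
        raise ValueError(f"not a permanent-tooth FDI number: {tooth_number}")
    prompt = np.zeros(12, dtype=np.float32)
    prompt[quadrant - 1] = 1.0
    prompt[4 + position - 1] = 1.0
    return prompt
```

For example, `fdi_prompt(36)` (lower-left first molar) sets the entry for quadrant 3 and the entry for position 6.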
CV | Mar 4, 2024
Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation
Zhongzhen Huang, Linda Wei, Shaoting Zhang et al.
Combining images from multiple modalities helps exploit complementary information in computer vision, especially in the medical domain. As an essential part of clinical diagnosis, multi-modal brain tumor segmentation aims to delineate the malignant entity using multiple modalities. Although existing methods have shown remarkable performance in the task, their information exchange for cross-scale and high-level representation fusion across the spatial and modality dimensions is limited. In this paper, we present a novel Modality-Aware and Shift Mixer (MASM) that integrates intra-modality and inter-modality dependencies of multi-modal images for effective and robust brain tumor segmentation. Specifically, we introduce a Modality-Aware module, informed by neuroimaging studies, for modeling the relationships of specific modality pairs at low levels, and we develop a Modality-Shift module with specific mosaic patterns to explore the complex relationships across modalities at high levels via self-attention. Experimentally, we outperform previous state-of-the-art approaches on the public Brain Tumor Segmentation (BraTS 2021 segmentation) dataset. Further qualitative experiments demonstrate the efficacy and robustness of MASM.
CV | May 30, 2023
Voxel2Hemodynamics: An End-to-end Deep Learning Method for Predicting Coronary Artery Hemodynamics
Ziyu Ni, Linda Wei, Lijian Xu et al.
Local hemodynamic forces play an important role in determining the functional significance of coronary arterial stenosis and understanding the mechanism of coronary disease progression. Computational fluid dynamics (CFD) has been widely used to simulate hemodynamics non-invasively from coronary computed tomography angiography (CCTA) images. However, accurate computational analysis is still limited by the complex construction of patient-specific models and time-consuming computation. In this work, we propose an end-to-end deep learning framework that predicts coronary artery hemodynamics from CCTA images. The model was trained on hemodynamic data obtained from 3D simulations of synthetic and real datasets. Extensive experiments demonstrated that the hemodynamic distributions predicted by our method agreed well with the CFD-derived results. Quantitatively, the proposed method predicts the fractional flow reserve with an average error of 0.5% and 2.5% for the synthetic dataset and real dataset, respectively. In particular, our method achieved much better accuracy on the real dataset than PointNet++ with point cloud input. This study demonstrates the feasibility and great potential of our end-to-end deep learning method as a fast and accurate approach for hemodynamic analysis.
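For reference, fractional flow reserve (FFR) is the ratio of mean distal coronary pressure to mean aortic pressure under hyperemia, so it can be read off a predicted pressure field directly. The sketch below is an assumption-laden illustration: the abstract does not state how the "average error" is computed, so the mean relative error used here is a guess, and both function names are hypothetical.

```python
import numpy as np


def fractional_flow_reserve(p_distal, p_aortic):
    """FFR as the ratio of distal coronary pressure to aortic pressure."""
    return np.asarray(p_distal, dtype=float) / np.asarray(p_aortic, dtype=float)


def mean_relative_error_percent(predicted, reference):
    """Mean relative error in percent (assumed metric, not from the paper)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference) / np.abs(reference)) * 100.0)
```

For instance, a distal pressure of 72 mmHg against an aortic pressure of 90 mmHg gives an FFR of 0.8, and a predicted FFR of 0.82 against a CFD reference of 0.80 corresponds to a 2.5% relative error under this definition.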