Yucheng Song

CV
h-index3
5papers
Novelty51%
AI Score48

5 Papers

18.9CVApr 12
FGML-DG: Feynman-Inspired Cognitive Science Paradigm for Cross-Domain Medical Image Segmentation

Yucheng Song, Chenxi Li, Haokang Ding et al.

In medical image segmentation across multiple modalities (e.g., MRI, CT, etc.) and heterogeneous data sources (e.g., different hospitals and devices), Domain Generalization (DG) remains a critical challenge in AI-driven healthcare. This challenge primarily arises from domain shifts, imaging variations, and patient diversity, which often lead to degraded model performance in unseen domains. To address these limitations, we identify key issues in existing methods, including insufficient simplification of complex style features, inadequate reuse of domain knowledge, and a lack of feedback-driven optimization. To tackle these problems, inspired by Feynman's learning techniques in educational psychology, this paper introduces a cognitive science-inspired meta-learning paradigm for medical image domain generalization segmentation. We propose, for the first time, a cognitive-inspired Feynman-Guided Meta-Learning framework for medical image domain generalization segmentation (FGML-DG), which mimics human cognitive learning processes to enhance model learning and knowledge transfer. Specifically, we first leverage the 'concept understanding' principle from Feynman's learning method to simplify complex features across domains into style information statistics, achieving precise style feature alignment. Second, we design a meta-style memory and recall method (MetaStyle) to emulate the human memory system's utilization of past knowledge. Finally, we incorporate a Feedback-Driven Re-Training strategy (FDRT), which mimics Feynman's emphasis on targeted relearning, enabling the model to dynamically adjust learning focus based on prediction errors. Experimental results demonstrate that our method outperforms other existing domain generalization approaches on two challenging medical image domain generalization tasks.

22.1CVApr 20
Exploring Boundary-Aware Spatial-Frequency Fusion for Camouflaged Object Detection

Song Yu, Yang Hu, Haokang Ding et al.

Camouflaged Object Detection is challenging due to the high degree of similarity between camouflaged objects and their surrounding backgrounds. Current COD methods mainly rely on edge extraction in the spatial domain and local pixel-level information, neglecting the importance of global structural features. Additionally, they fail to effectively leverage the importance of phase spectrum information within frequency domain features. To this end, we propose a COD framework BASFNet based on boundary-aware frequency domain and spatial domain fusion.This method uses dual guided integration of frequency domain and spatial domain features. A phase-spectrum-based frequency-enhanced edge exploration module (FEEM) and a spatial core segmentation module (SCSM) are introduced to jointly capture the boundary and object features of camouflaged objects. These features are then effectively integrated through a spatial-frequency fusion interaction module (SFFIM). Furthermore, the boundary detection is further optimized through an boundary-aware training strategy. BASFNet outperforms existing state-of-the-art methods on three benchmark datasets, validating the effectiveness of the fusion of frequency and spatial domain information in COD tasks.

CVNov 4, 2025
Medical Report Generation: A Hierarchical Task Structure-Based Cross-Modal Causal Intervention Framework

Yucheng Song, Yifan Ge, Junhao Li et al.

Medical Report Generation (MRG) is a key part of modern medical diagnostics, as it automatically generates reports from radiological images to reduce radiologists' burden. However, reliable MRG models for lesion description face three main challenges: insufficient domain knowledge understanding, poor text-visual entity embedding alignment, and spurious correlations from cross-modal biases. Previous work only addresses single challenges, while this paper tackles all three via a novel hierarchical task decomposition approach, proposing the HTSC-CIF framework. HTSC-CIF classifies the three challenges into low-, mid-, and high-level tasks: 1) Low-level: align medical entity features with spatial locations to enhance domain knowledge for visual encoders; 2) Mid-level: use Prefix Language Modeling (text) and Masked Image Modeling (images) to boost cross-modal alignment via mutual guidance; 3) High-level: a cross-modal causal intervention module (via front-door intervention) to reduce confounders and improve interpretability. Extensive experiments confirm HTSC-CIF's effectiveness, significantly outperforming state-of-the-art (SOTA) MRG methods. Code will be made public upon paper acceptance.

CVNov 24, 2025
MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery

Shuyu Cao, Minxin Chen, Yucheng Song et al.

Small object detection in Unmanned Aerial Vehicle (UAV) imagery is a persistent challenge, hindered by low resolution and background clutter. While fusing RGB and infrared (IR) data offers a promising solution, existing methods often struggle with the trade-off between effective cross-modal interaction and computational efficiency. In this letter, we introduce MambaRefine-YOLO. Its core contributions are a Dual-Gated Complementary Mamba fusion module (DGC-MFM) that adaptively balances RGB and IR modalities through illumination-aware and difference-aware gating mechanisms, and a Hierarchical Feature Aggregation Neck (HFAN) that uses a ``refine-then-fuse'' strategy to enhance multi-scale features. Our comprehensive experiments validate this dual-pronged approach. On the dual-modality DroneVehicle dataset, the full model achieves a state-of-the-art mAP of 83.2%, an improvement of 7.9% over the baseline. On the single-modality VisDrone dataset, a variant using only the HFAN also shows significant gains, demonstrating its general applicability. Our work presents a superior balance between accuracy and speed, making it highly suitable for real-world UAV applications.

CVOct 21, 2025
TreeFedDG: Alleviating Global Drift in Federated Domain Generalization for Medical Image Segmentation

Yucheng Song, Chenxi Li, Haokang Ding et al.

In medical image segmentation tasks, Domain Generalization (DG) under the Federated Learning (FL) framework is crucial for addressing challenges related to privacy protection and data heterogeneity. However, traditional federated learning methods fail to account for the imbalance in information aggregation across clients in cross-domain scenarios, leading to the Global Drift (GD) problem and a consequent decline in model generalization performance. This motivates us to delve deeper and define a new critical issue: global drift in federated domain generalization for medical imaging (FedDG-GD). In this paper, we propose a novel tree topology framework called TreeFedDG. First, starting from the distributed characteristics of medical images, we design a hierarchical parameter aggregation method based on a tree-structured topology to suppress deviations in the global model direction. Second, we introduce a parameter difference-based style mixing method (FedStyle), which enforces mixing among clients with maximum parameter differences to enhance robustness against drift. Third, we develop a a progressive personalized fusion strategy during model distribution, ensuring a balance between knowledge transfer and personalized features. Finally, during the inference phase, we use feature similarity to guide the retrieval of the most relevant model chain from the tree structure for ensemble decision-making, thereby fully leveraging the advantages of hierarchical knowledge. We conducted extensive experiments on two publicly available datasets. The results demonstrate that our method outperforms other state-of-the-art domain generalization approaches in these challenging tasks and achieves better balance in cross-domain performance.