Mingyuan Ge

h-index: 13
Papers: 2

2 Papers

CV | Jul 12, 2025 | Code
Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Matching

Junyu Chen, Yihua Gao, Mingyuan Ge et al.

Image-text matching is crucial for bridging the semantic gap between computer vision and natural language processing. However, existing methods still face challenges in handling high-order associations and semantic ambiguities among similar instances. These ambiguities arise from subtle differences between soft positive samples (semantically similar but incorrectly labeled) and soft negative samples (locally matched but globally inconsistent), creating matching uncertainties. Furthermore, current methods fail to fully utilize the neighborhood relationships among semantically similar instances within training batches, limiting the model's ability to learn high-order shared knowledge. This paper proposes the Ambiguity-Aware and High-order Relation learning framework (AAHR) to address these issues. AAHR constructs a unified representation space through dynamic clustering prototype contrastive learning, effectively mitigating the soft positive sample problem. The framework introduces global and local feature extraction mechanisms and an adaptive aggregation network, significantly enhancing full-grained semantic understanding capabilities. Additionally, AAHR employs intra-modal and inter-modal correlation matrices to thoroughly investigate neighborhood relationships among sample instances, and it incorporates a graph neural network (GNN) to enhance semantic interactions between instances. Furthermore, AAHR integrates momentum contrastive learning to expand the negative sample set. These combined strategies significantly improve the model's ability to discriminate between features. Experimental results demonstrate that AAHR outperforms existing state-of-the-art methods on the Flickr30K, MSCOCO, and ECCV Caption datasets, considerably improving the accuracy and efficiency of image-text matching. The code and model checkpoints for this research are available at https://github.com/Image-Text-Matching/AAHR.
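The abstract does not say how momentum contrastive learning enlarges the negative set. One common reading is a MoCo-style setup: a slowly updated key encoder plus a queue of past embeddings that are reused as extra negatives. The sketch below illustrates that reading only. The function names, the queue size, and the temperature are assumptions for illustration and are not taken from the AAHR code.

```python
import math
from collections import deque


def ema_update(key_params, query_params, momentum=0.999):
    """Momentum (EMA) update of key-encoder parameters (hypothetical helper)."""
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query: -log softmax probability of the positive."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[0]


# Assumed queue of past key embeddings; maxlen drops the oldest entries.
queue = deque(maxlen=4)
queue.extend([[0.0, 1.0], [-1.0, 0.0]])

image_emb = [1.0, 0.0]            # query (e.g. an image embedding)
text_emb = [1.0, 0.0]             # its matching caption embedding
in_batch_negatives = [[0.0, -1.0]]

loss_batch_only = info_nce(image_emb, text_emb, in_batch_negatives)
loss_with_queue = info_nce(image_emb, text_emb, in_batch_negatives + list(queue))
```

Each extra negative adds a term to the softmax denominator, so `loss_with_queue` is never smaller than `loss_batch_only`. That is the sense in which a queue gives the model a harder discrimination task without a larger batch.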

MTRL-SCI | Mar 25, 2025
Limited-angle x-ray nano-tomography with machine-learning enabled iterative reconstruction engine

Chonghang Zhao, Mingyuan Ge, Xiaogang Yang et al.

A long-standing challenge in tomography is the 'missing wedge' problem, which arises when the acquisition of projection images within a certain angular range is restricted due to geometrical constraints. This incomplete dataset results in significant artifacts and poor resolution in the reconstructed image. To tackle this challenge, we propose an approach dubbed Perception Fused Iterative Tomography Reconstruction Engine, which integrates a convolutional neural network (CNN) with perceptional knowledge as a smart regularizer into an iterative solving engine. We employ the Alternating Direction Method of Multipliers (ADMM) to optimize the solution in both physics and image domains, thereby achieving a physically coherent and visually enhanced result. We demonstrate the effectiveness of the proposed approach using various experimental datasets obtained with different x-ray microscopy techniques. All show significantly improved reconstruction even with a missing wedge of over 100 degrees, a scenario where conventional methods fail. Notably, it also improves the reconstruction in the case of sparse projections, despite the network not being specifically trained for that. This demonstrates the robustness and generality of our method in addressing commonly occurring challenges in 3D x-ray imaging applications for real-world problems.
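The ADMM structure described in the abstract alternates a data-fidelity step with a regularization step. The toy sketch below shows that alternation under heavy simplifying assumptions: the tomographic projection operator is replaced by a 0/1 sampling mask on a 1D signal (a crude analog of missing angles), and the CNN regularizer is replaced by a fixed moving-average smoother. The names `smooth` and `pnp_admm` and all parameter values are hypothetical. This is not the authors' engine.

```python
def smooth(v):
    """Stand-in for the learned regularizer: a [1/4, 1/2, 1/4] moving
    average with replicated edges (assumption, not a CNN)."""
    n = len(v)
    return [0.25 * v[max(i - 1, 0)] + 0.5 * v[i] + 0.25 * v[min(i + 1, n - 1)]
            for i in range(n)]


def pnp_admm(b, mask, denoise=smooth, rho=1.0, iters=200):
    """Scaled-form ADMM with a plug-in denoiser.

    b    : measurements, zero where data is missing
    mask : 1 where a sample was measured, 0 where it is missing
    """
    n = len(b)
    z = list(b)          # image-domain (regularized) variable
    u = [0.0] * n        # scaled dual variable
    x = list(b)
    for _ in range(iters):
        # Physics-domain step: closed-form minimizer of
        # 0.5 * (mask * x - b)^2 + (rho / 2) * (x - z + u)^2 per element.
        x = [(mask[i] * b[i] + rho * (z[i] - u[i])) / (mask[i] + rho)
             for i in range(n)]
        # Image-domain step: apply the denoiser in place of a proximal map.
        z = denoise([x[i] + u[i] for i in range(n)])
        # Dual update keeps the two variables in agreement.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return x


truth = [1.0] * 10
mask = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]    # three consecutive samples missing
measured = [t * m for t, m in zip(truth, mask)]
reconstruction = pnp_admm(measured, mask)
```

With a real projection operator the physics-domain step would need an iterative linear solve rather than an elementwise formula, and the quality of the result would depend on the trained network rather than on a fixed smoothing kernel.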