Yuan Mei

CV
h-index18
6papers
228citations
Novelty58%
AI Score59

6 Papers

CLFeb 2Code
Kimi K2.5: Visual Agentic Intelligence

Kimi Team, Tongtong Bai, Yifan Bai et al.

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.

CVAug 7, 2025Code
Robust Image Stitching with Optimal Plane

Lang Nie, Yuan Mei, Kang Liao et al.

We present \textit{RopStitch}, an unsupervised deep image stitching framework with both robustness and naturalness. To ensure the robustness of \textit{RopStitch}, we propose to incorporate the universal prior of content perception into the image stitching model by a dual-branch architecture. It separately captures coarse and fine features and integrates them to achieve highly generalizable performance across diverse unseen real-world scenes. Concretely, the dual-branch model consists of a pretrained branch to capture semantically invariant representations and a learnable branch to extract fine-grained discriminative features, which are then merged into a whole by a controllable factor at the correlation level. Besides, considering that content alignment and structural preservation are often contradictory to each other, we propose a concept of virtual optimal planes to relieve this conflict. To this end, we model this problem as a process of estimating homography decomposition coefficients, and design an iterative coefficient predictor and minimal semantic distortion constraint to identify the optimal plane. This scheme is finally incorporated into \textit{RopStitch} by warping both views onto the optimal plane bidirectionally. Extensive experiments across various datasets demonstrate that \textit{RopStitch} significantly outperforms existing methods, particularly in scene robustness and content naturalness. The code is available at {\color{red}https://github.com/MmelodYy/RopStitch}.

LGMay 9
Deterministic Decomposition of Stochastic Generative Dynamics

Xingyu Song, Yuan Mei, Naoya Takeishi

Modern generative models can be understood as probability transport from a simple base distribution to a target data distribution. Deterministic transport models offer tractable velocity-field parameterizations, whereas stochastic generative models capture richer density evolution through drift and diffusion. Yet when stochastic dynamics are described through deterministic velocity fields, the effects of drift and diffusion are often compressed into a single effective field, obscuring the distinct roles of deterministic evolution and stochastic fluctuation. In this work, we show that the deterministic field \(b_t\) of a stochastic generative process admits a natural transport--osmotic decomposition that separates deterministic transport from stochastic, diffusion-induced effects: \(b_t = u_t + d_t\), where \(u_t\) governs marginal probability transport and \(d_t\) captures an osmotic effect induced by diffusion and determined by the marginal score. Based on this decomposition, we propose Bridge Matching, a flow-based framework for learning decomposed generative dynamics through both marginal and conditional formulations. In generative modeling experiments, we recombine the learned components as \(b_t = u_t + λ_d d_t\), showing that the proposed decomposition enables interpretable and controllable sampling by adjusting the osmotic contribution in probability transport.

AIMay 9
M$^3$: Reframing Training Measures for Discretized Physical Simulations

Yuan Mei, Xingyu Song, Xiaowen Song et al.

Neural surrogate models for physical simulations are trained on discretized samples of continuous domains, where the induced empirical measure leads to uneven supervision, biasing optimization and causing spatial inconsistencies in physical fidelity. To mitigate this measure-induced bias, we propose M$^3$ (Multi-scale Morton Measure), a scalable framework that balances training measures by partitioning space according to physical variation and allocating supervision across multiple scales. Applied to three industrial-scale datasets with diverse discretizations, M$^3$ consistently improves predictions in the continuous physical domain, achieving up to 4.7$\times$ lower error in large-scale volumetric cases. These gains persist under aggressive subsampling (160M $\rightarrow$ 16M $\rightarrow$ 1.6M points), where M$^3$-trained models outperform those trained on higher-resolution data, reducing physics-weighted relative $L_2$ error by 3--4$\times$ and the corresponding MSE by up to 13$\times$. These results highlight data distribution as a key factor in operator learning and position M$^3$ as a scalable, data-efficient approach for physically consistent modeling.

CVMar 11
UniStitch: Unifying Semantic and Geometric Features for Image Stitching

Yuan Mei, Lang Nie, Kang Liao et al.

Traditional image stitching methods estimate warps from hand-crafted geometric features, whereas recent learning-based solutions leverage semantic features from neural networks instead. These two lines of research have largely diverged along separate evolution, with virtually no meaningful convergence to date. In this paper, we take a pioneering step to bridge this gap by unifying semantic and geometric features with UniStitch, a unified image stitching framework from multimodal features. To align discrete geometric features (i.e., keypoint) with continuous semantic feature maps, we present a Neural Point Transformer (NPT) module, which transforms unordered, sparse 1D geometric keypoints into ordered, dense 2D semantic maps. Then, to integrate the advantages of both representations, an Adaptive Mixture of Experts (AMoE) module is designed to fuse geometric and semantic representations. It dynamically shifts focus toward more reliable features during the fusion process, allowing the model to handle complex scenes, especially when either modality might be compromised. The fused representation can be adopted into common deep stitching pipelines, delivering significant performance gains over any single feature. Experiments show that UniStitch outperforms existing state-of-the-art methods with a large margin, paving the way for a unified paradigm between traditional and learning-based image stitching.

LGDec 31, 2020
PFL-MoE: Personalized Federated Learning Based on Mixture of Experts

Binbin Guo, Yuan Mei, Danyang Xiao et al.

Federated learning (FL) is an emerging distributed machine learning paradigm that avoids data sharing among training nodes so as to protect data privacy. Under coordination of the FL server, each client conducts model training using its own computing resource and private data set. The global model can be created by aggregating the training results of clients. To cope with highly non-IID data distributions, personalized federated learning (PFL) has been proposed to improve overall performance by allowing each client to learn a personalized model. However, one major drawback of a personalized model is the loss of generalization. To achieve model personalization while maintaining generalization, in this paper, we propose a new approach, named PFL-MoE, which mixes outputs of the personalized model and global model via the MoE architecture. PFL-MoE is a generic approach and can be instantiated by integrating existing PFL algorithms. Particularly, we propose the PFL-MF algorithm which is an instance of PFL-MoE based on the freeze-base PFL algorithm. We further improve PFL-MF by enhancing the decision-making ability of MoE gating network and propose a variant algorithm PFL-MFE. We demonstrate the effectiveness of PFL-MoE by training the LeNet-5 and VGG-16 models on the Fashion-MNIST and CIFAR-10 datasets with non-IID partitions.