Weihong Zhang

AI
h-index: 12
5 papers
149 citations
Novelty: 46%
AI Score: 41

5 Papers

CV | Aug 15, 2025 | Code
Ovis2.5 Technical Report

Shiyin Lu, Yang Li, Yu Xia et al.

We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the degradation from fixed-resolution tiling and preserving both fine detail and global layout -- crucial for visually dense content like complex charts. To strengthen reasoning, we train the model to move beyond linear chain-of-thought and perform reflection -- including self-checking and revision. This advanced capability is exposed as an optional "thinking mode" at inference time, allowing users to trade latency for enhanced accuracy on difficult inputs. The model is trained via a comprehensive five-phase curriculum that progressively builds its skills. The process begins with foundational visual and multimodal pretraining, advances through large-scale instruction tuning, and culminates in alignment and reasoning enhancement using DPO and GRPO. To scale these upgrades efficiently, we employ multimodal data packing and hybrid parallelism, yielding a significant end-to-end speedup. We release two open-source models: Ovis2.5-9B and Ovis2.5-2B. The latter continues the "small model, big performance" philosophy of Ovis2, making it ideal for resource-constrained, on-device scenarios. On the OpenCompass multimodal leaderboard, Ovis2.5-9B averages 78.3, marking a substantial improvement over its predecessor, Ovis2-8B, and achieving state-of-the-art results among open-source MLLMs in the sub-40B parameter range; Ovis2.5-2B scores 73.9, establishing SOTA for its size. Beyond aggregate scores, Ovis2.5 achieves leading results on STEM benchmarks, exhibits strong capabilities on grounding and video tasks, and achieves open-source SOTA at its scale for complex chart analysis.

NA | Oct 11, 2025
Learning Operators through Coefficient Mappings in Fixed Basis Spaces

Chuqi Chen, Yang Xiang, Weihong Zhang

Operator learning has emerged as a powerful paradigm for approximating solution operators of partial differential equations (PDEs) and other functional mappings. Classical approaches typically adopt a pointwise-to-pointwise framework, where input functions are sampled at prescribed locations and mapped directly to solution values. We propose the Fixed-Basis Coefficient to Coefficient Operator Network (FB-C2CNet), which learns operators in the coefficient space induced by prescribed basis functions. In this framework, the input function is projected onto a fixed set of basis functions (e.g., random features or finite element bases), and the neural operator predicts the coefficients of the solution function in the same or another basis. By decoupling basis selection from network training, FB-C2CNet reduces training complexity, enables systematic analysis of how basis choice affects approximation accuracy, and clarifies what properties of coefficient spaces (such as effective rank and coefficient variations) are critical for generalization. Numerical experiments on Darcy flow, Poisson equations in regular, complex, and high-dimensional domains, and elasticity problems demonstrate that FB-C2CNet achieves high accuracy and computational efficiency, showing its strong potential for practical operator learning tasks.
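The coefficient-to-coefficient idea described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1D Poisson problem -u'' = f on [0, 1] with zero boundary values, the sine basis, the function names, and above all the linear least-squares map, which stands in for the neural network that FB-C2CNet would train.

```python
import numpy as np

def sine_basis(x, n_basis):
    """Fixed basis phi_k(x) = sin(k*pi*x), k = 1..n_basis, evaluated at points x."""
    k = np.arange(1, n_basis + 1)
    return np.sin(np.pi * np.outer(x, k))  # shape (len(x), n_basis)

def project(values, basis_matrix):
    """Least-squares coefficients of sampled function values in the fixed basis."""
    coeffs, *_ = np.linalg.lstsq(basis_matrix, values, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
n_basis, n_train = 8, 200
x = np.linspace(0.0, 1.0, 101)
Phi = sine_basis(x, n_basis)
k = np.arange(1, n_basis + 1)

# Synthetic training pairs for -u'' = f with u(0) = u(1) = 0 (assumed toy problem).
# In the sine basis the exact solution coefficients are f_k / (k*pi)^2.
f_coef_true = rng.normal(size=(n_train, n_basis))
F_vals = f_coef_true @ Phi.T                        # sampled inputs f(x)
U_vals = (f_coef_true / (np.pi * k) ** 2) @ Phi.T   # sampled solutions u(x)

# Step 1: project inputs and solutions onto the fixed basis.
A = np.stack([project(f, Phi) for f in F_vals])  # input coefficients
B = np.stack([project(u, Phi) for u in U_vals])  # solution coefficients

# Step 2: learn the coefficient-to-coefficient map.
# A linear least-squares fit is used here as a stand-in for a neural network.
W, *_ = np.linalg.lstsq(A, B, rcond=None)

def predict_solution(f_values):
    """Project f, map its coefficients, and reconstruct u on the grid x."""
    return Phi @ (project(f_values, Phi) @ W)

u_pred = predict_solution(np.sin(np.pi * x))
u_exact = np.sin(np.pi * x) / np.pi ** 2
print(np.max(np.abs(u_pred - u_exact)))
```

Because the basis is fixed before training, the only learned object is the map between coefficient vectors, which is the decoupling the abstract refers to.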

IV | Feb 3, 2020
SuperDTI: Ultrafast diffusion tensor imaging and fiber tractography with deep learning

Hongyu Li, Zifei Liang, Chaoyi Zhang et al.

Purpose: To propose a deep learning-based reconstruction framework for ultrafast and robust diffusion tensor imaging and fiber tractography. Methods: We propose SuperDTI to learn the nonlinear relationship between diffusion-weighted images (DWIs) and the corresponding tensor-derived quantitative maps as well as the fiber tractography. SuperDTI bypasses the tensor fitting procedure, which is well known to be highly susceptible to noise and motion in DWIs. The network is trained and tested using datasets from the Human Connectome Project and patients with ischemic stroke. SuperDTI is compared against the state-of-the-art methods for diffusion map reconstruction and fiber tracking. Results: Using training and testing data both from the same protocol and scanner, SuperDTI is shown to generate fractional anisotropy and mean diffusivity maps, as well as fiber tractography, from as few as six raw DWIs. The method achieves a quantification error of less than 5% in all regions of interest in white matter and gray matter structures. We also demonstrate that the trained neural network is robust to noise and motion in the testing data, and the network trained using healthy volunteer data can be directly applied to stroke patient data without compromising the lesion detectability. Conclusion: This paper demonstrates the feasibility of superfast diffusion tensor imaging and fiber tractography using deep learning with as few as six DWIs directly, bypassing tensor fitting. Such a significant reduction in scan time may allow the inclusion of DTI into the clinical routine for many potential applications.
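For readers unfamiliar with the two maps named in this abstract, the sketch below computes mean diffusivity (MD) and fractional anisotropy (FA) from a single 3x3 diffusion tensor using their standard eigenvalue definitions. This is the conventional tensor-based route that SuperDTI is said to bypass, not the SuperDTI network itself; the function name and the example tensor values are assumptions chosen for illustration.

```python
import numpy as np

def md_and_fa(tensor):
    """Return (MD, FA) for a symmetric 3x3 diffusion tensor.

    MD is the mean of the eigenvalues; FA is
    sqrt(3/2) * ||lambda - MD|| / ||lambda||, which lies in [0, 1].
    """
    eigvals = np.linalg.eigvalsh(tensor)
    md = eigvals.mean()
    denom = np.sqrt(np.sum(eigvals ** 2))
    if denom == 0.0:
        return md, 0.0  # zero tensor: define FA as 0
    fa = np.sqrt(1.5) * np.sqrt(np.sum((eigvals - md) ** 2)) / denom
    return md, fa

# Hypothetical tensor with one dominant diffusion direction (units: mm^2/s).
example_tensor = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
md, fa = md_and_fa(example_tensor)
print(md, fa)
```

An isotropic tensor gives FA = 0 and a tensor with a single nonzero eigenvalue gives FA = 1, which is why FA is used as a measure of directional coherence.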

AI | Feb 6, 2013
Fast Value Iteration for Goal-Directed Markov Decision Processes

Nevin Lianwen Zhang, Weihong Zhang

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to accelerate value iteration, a standard algorithm for solving Markov decision processes. Empirical studies have shown that the techniques can bring about significant speedups.
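As background for this abstract, here is a minimal sketch of plain value iteration on a small goal-directed MDP. It shows only the standard baseline algorithm, not the acceleration techniques the paper proposes; the three-state chain, the action semantics, the step cost of -1, and the discount factor are all assumptions made for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Standard value iteration.

    P: array of shape (A, S, S), P[a, s, t] = Pr(next = t | state = s, action = a).
    R: array of shape (A, S), expected immediate reward for action a in state s.
    Returns the value function and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)       # Bellman backup, shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
    return V, Q.argmax(axis=0)

# Hypothetical 3-state chain; state 2 is an absorbing goal.
# Action 0 stays put; action 1 advances with probability 0.8, else stays.
P = np.zeros((2, 3, 3))
P[0] = np.eye(3)
P[1] = [[0.2, 0.8, 0.0],
        [0.0, 0.2, 0.8],
        [0.0, 0.0, 1.0]]
# Each step outside the goal costs 1; the goal is free.
R = np.array([[-1.0, -1.0, 0.0],
              [-1.0, -1.0, 0.0]])

V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy problem the greedy policy advances toward the goal from both non-goal states, and the goal state has value zero.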

AI | Jan 23, 2013
A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes

Nevin Lianwen Zhang, Stephen S. Lee, Weihong Zhang

We present a technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov decision processes (MDPs). The technique can be easily incorporated into any existing POMDP value iteration algorithm. Experiments have been conducted on several test problems with one POMDP value iteration algorithm called incremental pruning. We find that the technique can make incremental pruning run several orders of magnitude faster.