62.2CVMay 26
FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language ModelingGuixian Xu, Yide Liang, Zeli Su et al.
Vision-language models have progressed rapidly, but Tibetan remains a severely underserved low-resource language due to the lack of reproducible training and evaluation infrastructure. To fill this gap, we introduce FTibSuite, a comprehensive resource suite for Tibetan vision-language research, consisting of FTibData (human-verified multimodal training corpora spanning continual pretraining, image-text alignment, and instruction tuning data), FTibBench (Tibetan adaptations of five mainstream multimodal benchmarks with a hierarchical quality-control workflow to reduce translation noise), and FTibVLM, a reproducible baseline built on Qwen3-VL-8B-Instruct via a three-stage adaptation pipeline. Experiments on FTibBench show FTibVLM delivers consistent performance gains across all tasks, such as improving MMBench accuracy from 42.97 to 67.78 and POPE-random accuracy from 47.53 to 80.56, while retaining the backbone's original Chinese capabilities with minimal degradation, providing the first standardized foundation for Tibetan multimodal research.
IVAug 31, 2022
Practical Operator Sketching Framework for Accelerating Iterative Data-Driven Solutions in Inverse ProblemsJunqi Tang, Guixian Xu, Subhadip Mukherjee et al.
We propose a new operator-sketching paradigm for designing efficient iterative data-driven reconstruction (IDR) schemes, e.g. Plug-and-Play algorithms and deep unrolling networks. These IDR schemes are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially X-ray CT and MRI imaging, these IDR schemes typically become inefficient both in terms of computation, due to the need of computing multiple times the high-dimensional forward and adjoint operators. In this work, we explore and propose a universal dimensionality reduction framework for accelerating IDR schemes in solving imaging inverse problems, based on leveraging the sketching techniques from stochastic optimization. Using this framework, we derive a number of accelerated IDR schemes, such as the plug-and-play multi-stage sketched gradient (PnP-MS2G) and sketching-based primal-dual (LSPD and Sk-LSPD) deep unrolling networks. Meanwhile, for fully accelerating PnP schemes when the denoisers are computationally expensive, we provide novel stochastic lazy denoising schemes (Lazy-PnP and Lazy-PnP-EQ), leveraging the ProxSkip scheme in optimization and equivariant image denoisers, which can massively accelerate the PnP algorithms with improved practicality. We provide theoretical analysis for recovery guarantees of instances of the proposed framework. Our numerical experiments on natural image processing and tomographic image reconstruction demonstrate the remarkable effectiveness of our sketched IDR schemes.
CVApr 16, 2023
Enhancing Electrical Impedance Tomography reconstruction using Learned Half-Quadratic Splitting Networks with Anderson AccelerationGuixian Xu, Huihui Wang, Qingping Zhou
Electrical Impedance Tomography (EIT) is widely applied in medical diagnosis, industrial inspection, and environmental monitoring. Combining the physical principles of the imaging system with the advantages of data-driven deep learning networks, physics-embedded deep unrolling networks have recently emerged as a promising solution in computational imaging. However, the inherent nonlinear and ill-posed properties of EIT image reconstruction still present challenges to existing methods in terms of accuracy and stability. To tackle this challenge, we propose the learned half-quadratic splitting (HQSNet) algorithm for incorporating physics into learning-based EIT imaging. We then apply Anderson acceleration (AA) to the HQSNet algorithm, denoted as AA-HQSNet, which can be interpreted as AA applied to the Gauss-Newton step and the learned proximal gradient descent step of the HQSNet, respectively. AA is a widely-used technique for accelerating the convergence of fixed-point iterative algorithms and has gained significant interest in numerical optimization and machine learning. However, the technique has received little attention in the inverse problems community thus far. Employing AA enhances the convergence rate compared to the standard HQSNet while simultaneously avoiding artifacts in the reconstructions. Lastly, we conduct rigorous numerical and visual experiments to show that the AA module strengthens the HQSNet, leading to robust, accurate, and considerably superior reconstructions compared to state-of-the-art methods. Our Anderson acceleration scheme to enhance HQSNet is generic and can be applied to improve the performance of various physics-embedded deep learning methods.
30.1CLMay 14
Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment TaxZeli Su, Ziyin Zhang, Zhou Liu et al.
Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting in general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions. To address this limitation, we propose a semantic-space alignment paradigm powered by Group Relative Policy Optimization (GRPO), where the model is optimized using embedding-level semantic rewards rather than likelihood maximization. This objective encourages meaning preservation through flexible realizations, enabling controlled updates that reduce destructive interference with pretrained knowledge. We evaluate our approach on Tibetan-Chinese machine translation and Tibetan headline generation. Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT. Despite producing less rigid surface overlap, semantic RL yields higher semantic quality and preference in open-ended generation, and few-shot transfer results indicate that it learns more transferable and robust representations under limited supervision. Overall, our study demonstrates that reinforcement learning with semantic rewards provides a safer and more reliable pathway for inclusive low-resource language expansion.
IVNov 8, 2024
Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse ProblemsGuixian Xu, Jinglai Li, Junqi Tang
Equivariant Imaging (EI) regularization has become the de-facto technique for unsupervised training of deep imaging networks, without any need of ground-truth data. Observing that the EI-based unsupervised training paradigm currently has significant computational redundancy leading to inefficiency in high-dimensional applications, we propose a sketched EI regularization which leverages the randomized sketching techniques for acceleration. We apply our sketched EI regularization to develop an accelerated deep internal learning framework, which can be efficiently applied for test-time network adaptation. Additionally, for network adaptation tasks, we propose a parameter-efficient approach to accelerate both EI and Sketched-EI via optimizing only the normalization layers. Our numerical study on X-ray CT and multicoil magnetic resonance image reconstruction tasks demonstrate that our approach can achieve significant computational acceleration over standard EI counterpart in single-input setting and network adaptation at test time.
IVJul 9, 2025
Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP DenoisersGuixian Xu, Jinglai Li, Junqi Tang
In this work, we propose Fast Equivariant Imaging (FEI), a novel unsupervised learning framework to rapidly and efficiently train deep imaging networks without ground-truth data. From the perspective of reformulating the Equivariant Imaging based optimization problem via the method of Lagrange multipliers and utilizing plug-and-play denoisers, this novel unsupervised scheme shows superior efficiency and performance compared to the vanilla Equivariant Imaging paradigm. In particular, our FEI schemes achieve an order-of-magnitude (10x) acceleration over standard EI on training U-Net for X-ray CT reconstruction and image inpainting, with improved generalization performance.
CLFeb 15, 2025
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource LanguagesZeli Su, Ziyin Zhang, Guixian Xu et al.
While multilingual language models like XLM-R have advanced multilingualism in NLP, they still perform poorly in extremely low-resource languages. This situation is exacerbated by the fact that modern LLMs such as LLaMA and Qwen support far fewer languages than XLM-R, making text generation models non-existent for many languages in the world. To tackle this challenge, we propose a novel framework for adapting multilingual encoders to text generation in extremely low-resource languages. By reusing the weights between the encoder and the decoder, our framework allows the model to leverage the learned semantic space of the encoder, enabling efficient learning and effective generalization in low-resource languages. Applying this framework to four Chinese minority languages, we present XLM-SWCM, and demonstrate its superior performance on various downstream tasks even when compared with much larger models.
LGJan 14
A New Convergence Analysis of Plug-and-Play Proximal Gradient Descent Under Prior MismatchGuixian Xu, Jinglai Li, Junqi Tang
In this work, we provide a new convergence theory for plug-and-play proximal gradient descent (PnP-PGD) under prior mismatch where the denoiser is trained on a different data distribution to the inference task at hand. To the best of our knowledge, this is the first convergence proof of PnP-PGD under prior mismatch. Compared with the existing theoretical results for PnP algorithms, our new results removed the need for several restrictive and unverifiable assumptions.
CLSep 12, 2025
CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in ChinaGuixian Xu, Zeli Su, Ziyin Zhang et al.
Minority languages in China, such as Tibetan, Uyghur, and Traditional Mongolian, face significant challenges due to their unique writing systems, which differ from international standards. This discrepancy has led to a severe lack of relevant corpora, particularly for supervised tasks like headline generation. To address this gap, we introduce a novel dataset, Chinese Minority Headline Generation (CMHG), which includes 100,000 entries for Tibetan, and 50,000 entries each for Uyghur and Mongolian, specifically curated for headline generation tasks. Additionally, we propose a high-quality test set annotated by native speakers, designed to serve as a benchmark for future research in this domain. We hope this dataset will become a valuable resource for advancing headline generation in Chinese minority languages and contribute to the development of related benchmarks.
IVFeb 22, 2022
Fast Gradient Methods for Data-Consistent Local Super-Resolution of Medical ImagesJunqi Tang, Guixian Xu, Jinglai Li
In this work, we propose a new paradigm of iterative model-based reconstruction algorithms for providing real-time solution for zooming-in and refining a region of interest in medical and clinical tomographic images. This algorithmic framework is tailored for a clinical need in medical imaging practice that after a reconstruction of the full tomographic image, the clinician may believe that some critical parts of the image are not clear enough, and may wish to see clearer these regions of interest. A naive approach (which is highly not recommended) would be to perform the global reconstruction of a higher resolution image, which has two major limitations: first, it is computationally inefficient, and second, the image regularization is still applied globally, which may over-smooth some local regions. Furthermore, if one wishes to fine-tune the regularization parameter for local parts, it would be computationally infeasible in practice for the case of using global reconstruction. Our new iterative approaches for such tasks are based on jointly utilizing the measurement information, efficient up-sampling/down-sampling across image spaces, and locally adjusted image prior for efficient and high-quality post-processing. The numerical results in low-dose X-ray CT image local zoom-in demonstrate the effectiveness of our approach.