CL | Dec 22, 2023 | Code
YAYI 2: Multilingual Open-Source Large Language Models
Yin Luo, Qingchao Kong, Nan Xu et al.
Large language models (LLMs), the latest advance in natural language processing, have achieved human-level language understanding and generation abilities in many real-world tasks and have even been regarded as a potential path to artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and achieve performance comparable to proprietary models. However, these models are primarily designed for English scenarios and perform poorly in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus containing 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similarly sized open-source models.
88.2 | CV | Apr 23
S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images
Qingxiao Li, Lifeng Xu, QingLi Wang et al.
We present S1-VL, a multimodal reasoning model for scientific domains that natively supports two complementary reasoning paradigms: Scientific Reasoning, which relies on structured chain-of-thought, and Thinking-with-Images, which enables the model to actively manipulate images through Python code execution during reasoning. In the Thinking-with-Images mode, the model generates and executes image-processing code in a sandbox environment, obtains intermediate visual results, and continues reasoning in a multi-turn iterative manner. This design is particularly effective for challenging scenarios such as high-resolution scientific chart interpretation, microscopic image understanding, and geometry-assisted reasoning. To construct the training data, we collect scientific multimodal datasets spanning six disciplines: mathematics, physics, chemistry, astronomy, geography, and biology. We further develop a six-dimensional quality filtering framework for reasoning trajectories. To mitigate redundant, ineffective, and erroneous visual operations commonly found in existing datasets, we propose a multi-stage filtering pipeline together with an adaptive data routing strategy. This strategy converts samples with low visual information gain into pure Reasoning-mode data, enabling the model to learn when image operations are truly necessary. S1-VL is trained through a four-stage progressive pipeline: scientific multimodal SFT, Thinking-with-Images cold-start SFT, and two stages of reinforcement learning with SAPO. We build S1-VL-32B on top of Qwen3-VL-32B-Thinking and evaluate it on 13 benchmarks. Experimental results show that S1-VL-32B achieves state-of-the-art performance on all five Thinking-with-Images benchmarks, namely HRBench-4K, HRBench-8K, MME-RealWorld-CN, MME-RealWorld-Lite, and V*, and outperforms the compared systems on scientific reasoning benchmarks such as Physics and VRSBench.
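The adaptive data routing idea described above can be illustrated with a minimal sketch. The abstract does not say how visual information gain is measured or what cutoff is used, so the `info_gain` score, the `threshold` value, the `Sample` fields, and the `route_samples` function below are all hypothetical names chosen for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Sample:
    """A training trajectory (hypothetical schema, not from the paper)."""
    question: str
    visual_ops: int      # number of image operations in the trajectory
    info_gain: float     # assumed score in [0, 1]; how it is computed is not specified
    mode: str = "thinking_with_images"


def route_samples(samples: List[Sample], threshold: float = 0.3) -> List[Sample]:
    """Convert low-gain Thinking-with-Images samples into pure reasoning samples.

    The threshold of 0.3 is an arbitrary placeholder.
    """
    routed = []
    for s in samples:
        if s.mode == "thinking_with_images" and s.info_gain < threshold:
            # Drop the image operations and relabel the sample as reasoning-only.
            routed.append(replace(s, mode="reasoning", visual_ops=0))
        else:
            routed.append(s)
    return routed


if __name__ == "__main__":
    data = [
        Sample("Read the peak value from the chart", visual_ops=3, info_gain=0.9),
        Sample("Compute the triangle's area", visual_ops=2, info_gain=0.1),
    ]
    for s in route_samples(data):
        print(s.mode, s.visual_ops, s.question)
```

Under these assumptions, the first sample keeps its image operations while the second is relabeled as reasoning-only, which mirrors the stated goal of teaching the model when image operations are actually needed.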
CL | Dec 12, 2019
Improving Interpretability of Word Embeddings by Generating Definition and Usage
Haitong Zhang, Yongping Du, Jiaxin Sun et al.
Word embeddings are substantially successful in capturing semantic relations among words. However, these lexical semantics are difficult to interpret. Definition modeling provides a more intuitive way to evaluate embeddings by using them to generate natural language definitions of the corresponding words. This task is of great significance for practical applications and for an in-depth understanding of word representations. We propose a novel framework for definition modeling that can generate reasonable and understandable context-dependent definitions. Moreover, we introduce usage modeling and study whether embeddings can be used to generate example sentences for words. Both tasks express an embedding's semantics more directly and explicitly, which improves interpretability. We extend the single-task model to a multi-task setting and investigate several joint multi-task models that combine usage modeling and definition modeling. Experimental results on the existing Oxford dataset and a newly collected Oxford-2019 dataset show that our single-task model achieves state-of-the-art results in definition modeling and that the multi-task learning methods improve performance on both tasks.