CVMay 21, 2025Code
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement LearningHaozhe Wang, Alex Su, Weiming Ren et al.
Chain-of-thought reasoning has significantly improved the performance of Large Language Models (LLMs) across various domains. However, this reasoning process has been confined exclusively to textual space, limiting its effectiveness in visually intensive tasks. To address this limitation, we introduce the concept of reasoning in the pixel-space. Within this novel framework, Vision-Language Models (VLMs) are equipped with a suite of visual reasoning operations, such as zoom-in and select-frame. These operations enable VLMs to directly inspect, interrogate, and infer from visual evidences, thereby enhancing reasoning fidelity for visual tasks. Cultivating such pixel-space reasoning capabilities in VLMs presents notable challenges, including the model's initially imbalanced competence and its reluctance to adopt the newly introduced pixel-space operations. We address these challenges through a two-phase training approach. The first phase employs instruction tuning on synthesized reasoning traces to familiarize the model with the novel visual operations. Following this, a reinforcement learning (RL) phase leverages a curiosity-driven reward scheme to balance exploration between pixel-space reasoning and textual reasoning. With these visual operations, VLMs can interact with complex visual inputs, such as information-rich images or videos to proactively gather necessary information. We demonstrate that this approach significantly improves VLM performance across diverse visual reasoning benchmarks. Our 7B model, \model, achieves 84\% on V* bench, 74\% on TallyQA-Complex, and 84\% on InfographicsVQA, marking the highest accuracy achieved by any open-source model to date. These results highlight the importance of pixel-space reasoning and the effectiveness of our framework.
AISep 1, 2025Code
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool UseDongfu Jiang, Yi Lu, Zhuofeng Li et al. · utoronto
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving near 2$\times$ speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
AINov 30, 2024
Unified Parameter-Efficient Unlearning for LLMsChenlu Ding, Jiancan Wu, Yancheng Yuan et al.
The advent of Large Language Models (LLMs) has revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks. Fine-tuning these models for specific domains, particularly through Parameter-Efficient Fine-Tuning (PEFT) strategies like LoRA, has become a prevalent practice due to its efficiency. However, this raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information. To address these issues, we introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise parameter adjustments using influence functions. Unlike traditional unlearning techniques that are often limited in scope and require extensive retraining, LLMEraser is designed to handle a broad spectrum of unlearning tasks without compromising model performance. Extensive experiments on benchmark datasets demonstrate that LLMEraser excels in efficiently managing various unlearning scenarios while maintaining the overall integrity and efficacy of the models.
CRDec 14, 2018
ARPA WhitepaperDerek Zhang, Alex Su, Felix Xu et al.
We propose a secure computation solution for blockchain networks. The correctness of computation is verifiable even under malicious majority condition using information-theoretic Message Authentication Code (MAC), and the privacy is preserved using Secret-Sharing. With state-of-the-art multiparty computation protocol and a layer2 solution, our privacy-preserving computation guarantees data security on blockchain, cryptographically, while reducing the heavy-lifting computation job to a few nodes. This breakthrough has several implications on the future of decentralized networks. First, secure computation can be used to support Private Smart Contracts, where consensus is reached without exposing the information in the public contract. Second, it enables data to be shared and used in trustless network, without disclosing the raw data during data-at-use, where data ownership and data usage is safely separated. Last but not least, computation and verification processes are separated, which can be perceived as computational sharding, this effectively makes the transaction processing speed linear to the number of participating nodes. Our objective is to deploy our secure computation network as an layer2 solution to any blockchain system. Smart Contracts\cite{smartcontract} will be used as bridge to link the blockchain and computation networks. Additionally, they will be used as verifier to ensure that outsourced computation is completed correctly. In order to achieve this, we first develop a general MPC network with advanced features, such as: 1) Secure Computation, 2) Off-chain Computation, 3) Verifiable Computation, and 4)Support dApps' needs like privacy-preserving data exchange.