cs.AI · Computer Science

Artificial Intelligence

AI systems, knowledge representation, planning

32.8 · CL · Apr 3 · Code

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Aichen Cai, Anmeng Zhang, Anyu Li et al.

This work addresses efficiency challenges for users of mid-scale LLMs. It appears incremental overall, although it introduces new components such as FiberPO alongside architectural optimizations.

35.5 · CV · Apr 22

Image Generators are Generalist Vision Learners

Valentin Gabeur, Shangbang Long, Songyou Peng et al.

This work suggests a potential paradigm shift in computer vision by positioning generative pretraining as a foundational approach for building generalist vision models that unify generation and understanding tasks.

36.7 · CR · Mar 16 · Code

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · eth-zurich

This addresses a critical security threat for users of AI agents in high-stakes settings, revealing fundamental weaknesses in current models.

35.9 · AI · Mar 19

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim et al. · meta-ai

This work addresses the challenge of automated assessment for mathematical reasoning in STEM fields, offering incremental improvements through enhanced training methods.

51.1 · CV · Mar 28 · Code · 11k

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu et al.

For researchers and practitioners in computer vision, SAM 3 provides a more accurate and unified model for concept-driven segmentation and tracking, with a new benchmark and dataset.

29.0 · AI · Mar 11 · Code

Mind the Sim2Real Gap in User Simulation for Agentic Tasks

Xuhui Zhou, Weiwei Sun, Qianou Ma et al. · cmu

This work addresses inaccurate user simulation in NLP agent evaluation, a problem that can mislead development. Its contribution is incremental: empirical validation and a new metric.

33.4 · AI · Mar 16

CUBE: A Standard for Unifying Agent Benchmarks

Alexandre Lacoste, Nicolas Gontier, Oleh Shliazhko et al. · ibm-research

This addresses a critical productivity issue for AI researchers by standardizing benchmark integration to prevent further fragmentation as new benchmarks emerge.

28.5 · CV · Mar 17

Demystifing Video Reasoning

Ruisi Wang, Zhongang Cai, Fanyi Pu et al.

This provides a systematic understanding of how reasoning emerges in video generation models, potentially guiding future research that exploits these dynamics.

29.3 · CR · Mar 12

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al.

This addresses security risks for users and developers of autonomous LLM agents, but it is incremental as it builds on existing threat analysis frameworks.

32.3 · SE · Mar 17 · Code

InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu et al.

This addresses performance gaps in industrial code intelligence for domains like chip design and embedded systems, though it appears incremental as it builds on existing foundation model approaches.

31.9 · LG · Mar 16 · Code · 24

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Shubham Parashar, Shurui Gui, Xiner Li et al.

This addresses the challenge of inefficient reasoning improvement in small LLMs for mathematical and coding tasks, representing an incremental advancement in RL-based training methods.

27.3 · AI · Mar 16 · Code · 765

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Yuwen Du, Rui Ye, Shuo Tang et al.

This work democratizes frontier search agent research for the broader AI community by providing open-source data and models, addressing a bottleneck previously dominated by industrial giants.

28.2 · CV · Mar 13

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Yichen Zhang, Da Peng, Zonghao Guo et al.

This addresses the challenge of mismatched decoding regimes and representations in multimodal AI, offering an efficient solution for unified tasks, though it appears incremental in building on existing UMM approaches.

28.8 · AI · Mar 10 · Code · 1.4k

Logics-Parsing-Omni Technical Report

Xin An, Jingyi Cai, Xiangyang Chen et al.

This work addresses multimodal parsing challenges for AI systems handling documents, images, and audio-visual data, representing a novel method for a known bottleneck.

25.4 · CV · Mar 20 · Code · 56

PEARL: Personalized Streaming Video Understanding Model

Yuanhong Zheng, Ruichuan An, Xiaopeng Lin et al.

This addresses a limitation that matters for future AI assistants: current personalization methods handle only static or offline data. It is incremental, as it builds on existing vision-language models.

29.8 · AI · Mar 19

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

Wenxuan Zhang, Lemeng Wu, Changsheng Zhao et al.

This work addresses the problem of efficient policy optimization for diffusion-based language models, offering incremental improvements in training and generation efficiency for AI researchers and practitioners.

27.6 · CV · Mar 16

MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model

Jinguang Tong, Jinbo Wu, Kaisiyuan Wang et al.

This work addresses a frontier in expressive digital human creation for applications like animation and virtual reality, representing a novel method for a known bottleneck rather than an incremental improvement.

29.2 · SE · Mar 13

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Gangda Deng, Zhaoling Chen, Zhongming Yu et al.

This addresses the need for benchmarks that assess AI agents in dynamic, real-world software environments, though it is incremental as it builds on existing evaluation methods.

29.3 · LG · May 2 · Code

NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

Fang Wu, Haokai Zhao, Da Xing et al.

This work addresses the underexplored problem of noise valuation for diffusion model training, offering a method to improve training efficiency and generation quality.

26.6 · AI · Mar 19 · Code

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

Hao Zhang, Mingjie Liu, Shaokun Zhang et al.

This work addresses the problem of infrastructure inefficiency for researchers and developers training multi-turn LLM agents, though it is incremental as it focuses on improving existing rollout orchestration methods.