cs.CL · Computer Science

Computation & Language

NLP, text generation, language models

32.8 · CL · Apr 3 · Code

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Aichen Cai, Anmeng Zhang, Anyu Li et al.

This work addresses efficiency challenges for users of mid-scale LLMs. It appears incremental overall, though it introduces new components such as FiberPO and architectural optimizations.

24.2 · CL · Mar 16 · Code 3.4k

Attention Residuals

Kimi Team, Guangyu Chen, Yu Zhang et al.

This addresses a key bottleneck in scaling deep neural networks, offering a practical drop-in replacement that improves model stability and performance, though it is incremental in that it builds on the existing residual-connection paradigm.

35.9 · AI · Mar 19

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim et al. · meta-ai

This work addresses the challenge of automated assessment for mathematical reasoning in STEM fields, offering incremental improvements through enhanced training methods.

35.0 · CV · Mar 29 · Code 464

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Meituan LongCat Team, Bin Xiao, Chao Wang et al.

This work provides a unified approach to multimodal understanding and generation for AI researchers, though it is incremental as it builds on existing NTP and tokenization methods.

27.6 · CV · Mar 10 · Code 17

Video-Based Reward Modeling for Computer-Use Agents

Linxin Song, Jieyu Zhang, Huanxin Sheng et al.

This provides a scalable, model-agnostic evaluator for computer-using agents, addressing a key bottleneck in their development and deployment.

31.9 · LG · Mar 16 · Code 24

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Shubham Parashar, Shurui Gui, Xiner Li et al.

This addresses the challenge of inefficient reasoning improvement in small LLMs for mathematical and coding tasks, representing an incremental advancement in RL-based training methods.

27.3 · AI · Mar 16 · Code 765

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Yuwen Du, Rui Ye, Shuo Tang et al.

This work democratizes frontier search agent research for the broader AI community by providing open-source data and models, addressing a bottleneck previously dominated by industrial giants.

35.5 · CV · Mar 29 · Code 431

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Zhiwen Fan, Jian Zhang, Renjie Li et al.

This work addresses the bottleneck of 3D spatial understanding in VLMs for monocular video inputs, offering a scalable solution for embodied AI and time-sensitive applications.

26.1 · LG · Mar 11 · Code 33

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

Teng Xiao, Yige Yuan, Hamish Ivison et al.

This addresses the challenge of inefficient exploration in search agents, offering a domain-specific incremental improvement.

19.2 · CL · Mar 20

DLLM Agent: See Farther, Run Faster

Huiling Zhen, Weizhe Lin, Renxi Liu et al.

This work addresses efficiency improvements for AI agents in planning and tool-use tasks, though it is incremental as it applies an existing method (diffusion) to a new domain (agent frameworks).

31.6 · SE · Apr 16

Scaling Test-Time Compute for Agentic Coding

Joongwon Kim, Wannan Yang, Kelvin Niu et al.

For developers of coding agents, this work addresses the bottleneck of scaling test-time compute for long-horizon tasks by focusing on representation and reuse of prior experience.

18.9 · CL · Mar 15

AI Can Learn Scientific Taste

Jingqi Tong, Mingzhe Li, Hangcheng Li et al.

This work addresses the underexplored challenge of improving AI's scientific taste as a step toward human-level AI scientists. The direction is new, but the contribution itself is incremental.

18.1 · CL · Apr 19 · Code 9

HorizonBench: Long-Horizon Personalization with Evolving Preferences

Shuyue Stella Li, Bhargavi Paranjape, Kerem Oktar et al.

This work provides the first benchmark with ground-truth provenance for preference evolution over long horizons, enabling diagnosis of state-tracking failures in personalization systems.

25.0 · CV · Mar 12 · Code 43

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

Xuanlang Dai, Yujie Zhou, Long Xing et al.

This work addresses limitations of diffusion models on complex spatial reasoning tasks, offering a new framework that improves accuracy, though it appears incremental because it mainly enhances existing methods.

18.0 · CL · Mar 14 · Code

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Arushi Goel, Sreyan Ghosh, Vatsal Agarwal et al.

This addresses the need for better evaluation of multimodal AI models in real-world video understanding, though it is incremental as it focuses on benchmarking rather than proposing new methods.

19.3 · CL · May 2 · Code 50

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

Benjamin Warner, Ratna Sagari Grandhi, Max Kieffer et al.

This work provides a comprehensive, open-source evaluation suite that addresses benchmark saturation and data accessibility issues in medical LLM evaluation.

11.6 · CL · Apr 30 · Code 6

Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future

Sihong Wu, Owen Jiang, Yilun Zhao et al.

For researchers and practitioners building automated peer review systems, this survey provides a structured overview of current methods and challenges.

19.3 · CL · Mar 17

Fanar 2.0: Arabic Generative AI Stack

FANAR TEAM, Ummar Abbas, Mohammad Shahmeer Ahmad et al.

This work addresses the problem of limited AI resources for Arabic language and culture, offering a competitive system for Arabic-speaking users, though it is incremental as it builds on existing models like Gemma-3-27B.

24.9 · AI · Mar 12 · Code 239

XSkill: Continual Learning from Experience and Skills in Multimodal Agents

Guanyu Jiang, Zhaochen Su, Xiaoye Qu et al.

This addresses the challenge of enabling multimodal agents to continually improve without parameter updates, which is incremental as it builds on existing learning-based approaches.

15.2 · CL · Mar 12

Tiny Aya: Bridging Scale and Multilingual Depth

Alejandro R. Salamanca, Diana Abagyan, Daniel D'souza et al. · microsoft-research

This provides an efficient and balanced alternative for multilingual AI deployment, benefiting users in diverse regions by addressing scale and depth issues.