cs.CV · Computer Science

Computer Vision

Image recognition, object detection, visual understanding

90 · CV · May 28, 2025 · Code
Thinking with Generated Images

Ethan Chern, Zhulin Hu, Steffi Chern et al.

This approach lets AI models engage in visual imagination and iterative refinement, with potential benefits for domains such as biochemistry, architecture, forensics, and sports. It represents a new paradigm rather than an incremental improvement.

85 · CV · Dec 12, 2023 · Code
LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Hao Shao, Yuxuan Hu, Letian Wang et al. · tsinghua, utoronto

This work addresses the challenge of improving autonomous driving safety and interaction with humans in complex urban scenarios, and represents a novel approach rather than an incremental improvement.

85 · CV · Mar 3, 2025 · Code
Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang et al. · pku

This work addresses the challenge of data-efficient fine-tuning for large vision-language models in domain-specific tasks, representing a paradigm shift.

85 · CV · Mar 1, 2024 · Code
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Xiangxiang Chu, Jianlin Su, Bo Zhang et al.

VisionLLaMA provides a unified, generic modeling framework for most vision tasks and could serve as a strong new baseline for both vision generation and understanding.