Kenan Li

CV
h-index4
8papers
128citations
Novelty50%
AI Score55

8 Papers

CVJun 15, 2023
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

Yiming Li, Sihang Li, Xinhao Liu et al.

Monocular scene understanding is a foundational component of autonomous systems. Within the spectrum of monocular perception topics, one crucial and useful task for holistic 3D scene understanding is semantic scene completion (SSC), which jointly completes semantic information and geometric details from RGB input. However, progress in SSC, particularly in large-scale street views, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo). SSCBench follows an established setup and format in the community, facilitating the easy exploration of SSC methods in various street views. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap resulting from sensor coverage and modality. Moreover, we have unified semantic labels across diverse datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field.

NANov 21, 2017
Some Extensions of the Crouzeix-Palencia Result

Trevor Caldwell, Anne Greenbaum, Kenan Li

In [{\em The Numerical Range is a $(1 + \sqrt{2})$-Spectral Set}, SIAM J. Matrix Anal. Appl. 38 (2017), pp.~649-655], Crouzeix and Palencia show that the numerical range of a square matrix or linear operator $A$ is a $(1 + \sqrt{2})$-spectral set for $A$; that is, for any function $f$ analytic in the interior of the numerical range $W(A)$ and continuous on its boundary, the inequality $\| f(A) \| \leq (1 + \sqrt{2} ) \| f \|_{W(A)}$ holds, where the norm on the left is the operator 2-norm and $\| f \|_{W(A)}$ on the right denotes the supremum of $| f(z) |$ over $z \in W(A)$. In this paper, we show how the arguments in their paper can be extended to show that other regions in the complex plane that do {\em not} necessarily contain $W(A)$ are $K$-spectral sets for a value of $K$ that may be close to $1 + \sqrt{2}$. We also find some special cases in which the constant $(1 + \sqrt{2})$ for $W(A)$ can be replaced by $2$, which is the value conjectured by Crouzeix.

40.2SEApr 28Code
SWE-Edit: Rethinking Code Editing for Efficient SWE-Agent

Yikai Zhang, Jiaxin Pei, Kenan Li et al.

Large language model agents have achieved remarkable progress on software engineering tasks, yet current approaches suffer from a fundamental context coupling problem: the standard code editing interface conflates code inspection, modification planning, and edit execution within a single context window, forcing agents to interleave exploratory viewing with strictly formatted edit generation. This causes irrelevant information to accumulate and degrades agent performance. To address this, we propose SWE-Edit, which decomposes code editing into two specialized subagents: a Viewer that extracts task-relevant code on demand, and an Editor that executes modifications from high-level plans--allowing the main agent to focus on reasoning while delegating context-intensive operations to clean context windows. We further investigate what makes an effective editing model: observing that the prevalent find-and-replace format is error-prone, we train Qwen3-8B with GRPO to adaptively select editing modes, yielding improved editing efficiency over single-format baselines. On SWE-bench Verified, SWE-Edit improves resolved rate by 2.1% while reducing inference cost by 17.9%. We additionally propose a code editing benchmark that reliably predicts downstream agentic performance, providing practical guidance for editing model selection. Our code is publicly available at https://github.com/microsoft/SWE-Edit.

19.8CVMar 28
Complet4R: Geometric Complete 4D Reconstruction

Weibang Wang, Kenan Li, Zhuoguang Chen et al.

We introduce Complet4R, a novel end-to-end framework for Geometric Complete 4D Reconstruction, which aims to recover temporally coherent and geometrically complete reconstruction for dynamic scenes. Our method formalizes the task of Geometric Complete 4D Reconstruction as a unified framework of reconstruction and completion, by directly accumulating full contexts onto each frame. Unlike previous approaches that rely on pairwise reconstruction or local motion estimation, Complet4R utilizes a decoder-only transformer to operate all context globally directly from sequential video input, reconstructing a complete geometry for every single timestamp, including occluded regions visible in other frames. Our method demonstrates the state-of-the-art performance on our proposed benchmark for Geometric Complete 4D Reconstruction and the 3D Point Tracking task. Code will be released to support future research.

CVMar 11, 2025Code
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking

Zhuoguang Chen, Kenan Li, Xiuyu Yang et al.

Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks like 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a brand-new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input. Furthermore, we propose TrackOcc, a cutting-edge approach that processes image inputs in a streaming, end-to-end manner with 4D panoptic queries to address the proposed task. Leveraging the localization-aware loss, TrackOcc enhances the accuracy of 4D panoptic occupancy tracking without bells and whistles. Experimental results demonstrate that our method achieves state-of-the-art performance on the Waymo dataset. The source code will be released at https://github.com/Tsinghua-MARS-Lab/TrackOcc.

37.3MAApr 9
ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents

Kenan Li, Qirui Jin, Liao Zhu et al.

Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies as well as analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly their ideal contribution when intermediate information is perfectly obtained. To address this gap, we introduce Oracle-SWE, a unified method to isolate and extract oracle information signals from SWE benchmarks and quantify the impact of each signal on agent performance. To further validate the pattern, we evaluate the performance gain of signals extracted by strong LMs when provided to a base agent, approximating real-world task-resolution settings. These evaluations aim to guide research prioritization for autonomous coding systems.

SEMar 5
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform

Kenan Li, Rongzhi Li, Linghao Zhang et al.

Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. To demonstrate its utility, we further propose a fully automated pipeline for SWE dataset creation, where task design is the only human intervention. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs. Notably, several works on agentic benchmarking and training have recently adopted RepoLaunch for automated task generation.

CVSep 21, 2025
SLAM-Former: Putting SLAM into One Transformer

Yijun Yuan, Zhuoguang Chen, Kenan Li et al.

We present SLAM-Former, a novel neural approach that integrates full SLAM capabilities into a single transformer. Similar to traditional SLAM systems, SLAM-Former comprises both a frontend and a backend that operate in tandem. The frontend processes sequential monocular images in real-time for incremental mapping and tracking, while the backend performs global refinement to ensure a geometrically consistent result. This alternating execution allows the frontend and backend to mutually promote one another, enhancing overall system performance. Comprehensive experimental results demonstrate that SLAM-Former achieves superior or highly competitive performance compared to state-of-the-art dense SLAM methods.