ROFeb 4, 2023
TrajMatch: Towards Automatic Spatio-temporal Calibration for Roadside LiDARs through Trajectory MatchingHaojie Ren, Sha Zhang, Sugang Li et al.
Recently, it has become popular to deploy sensors such as LiDARs on the roadside to monitor the passing traffic and assist autonomous vehicle perception. Unlike autonomous vehicle systems, roadside sensors are usually affiliated with different subsystems and lack synchronization both in time and space. Calibration is a key technology which allows the central server to fuse the data generated by different location infrastructures, which can deliver improve the sensing range and detection robustness. Unfortunately, existing calibration algorithms often assume that the LiDARs are significantly overlapped or that the temporal calibration is already achieved. Since these assumptions do not always hold in the real world, the calibration results from the existing algorithms are often unsatisfactory and always need human involvement, which brings high labor costs. In this paper, we propose TrajMatch -- the first system that can automatically calibrate for roadside LiDARs in both time and space. The main idea is to automatically calibrate the sensors based on the result of the detection/tracking task instead of extracting special features. More deeply, we propose a mechanism for evaluating calibration parameters that is consistent with our algorithm, and we demonstrate the effectiveness of this scheme experimentally, which can also be used to guide parameter iterations for multiple calibration. Finally, to evaluate the performance of TrajMatch , we collect two dataset, one simulated dataset LiDARnet-sim 1.0 and a real-world dataset. Experiment results show that TrajMatch can achieve a spatial calibration error of less than 10cm and a temporal calibration error of less than 1.5ms.
85.0ROMay 1
Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot PoliciesYi Wang, Xinchen Li, Pengwei Xie et al.
Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capture. We present Learning While Deploying (LWD), a fleet-scale offline-to-online reinforcement learning framework for continual post-training of generalist Vision-Language-Action (VLA) policies. Starting from a pretrained VLA policy, LWD closes the loop between deployment, shared physical experience, policy improvement, and redeployment by using autonomous rollouts and human interventions collected across a robot fleet. To stabilize learning from heterogeneous, sparse-reward fleet data, LWD combines Distributional Implicit Value Learning (DIVL) for robust value estimation with Q-learning via Adjoint Matching (QAM) for policy extraction in flow-based VLA action generators. We validate LWD on a fleet of 16 dual-arm robots across eight real-world manipulation tasks, including semantic grocery restocking and 3--5 minute long-horizon tasks. A single generalist policy improves as fleet experience accumulates, reaching an average success rate of 95%, with the largest gains on long-horizon tasks.
CLFeb 4
Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language ModelsYu Zhang, Xinchen Li, Jialei Zhou et al.
Block-wise decoding effectively improves the inference speed and quality in diffusion language models (DLMs) by combining inter-block sequential denoising and intra-block parallel unmasking. However, existing block-wise decoding methods typically partition blocks in a rigid and fixed manner, which inevitably fragments complete semantic or syntactic constituents, leading to suboptimal performance. Inspired by the entropy reduction hypothesis (ERH), we recognize that constituent boundaries offer greater opportunities for uncertainty reduction, which motivates us to employ entropy analysis for identifying constituent boundaries. Therefore, we propose Swordsman, an entropy-driven adaptive block-wise decoding framework for DLMs. Swordsman adaptively partitions blocks by identifying entropy shifts between adjacent tokens to better align with semantic or syntactic constituent boundaries. In addition, Swordsman dynamically adjusts unmasking thresholds conditioned on the real-time unmasking status within a block, further improving both efficiency and stability. As a training-free framework, supported by KV Cache, Swordsman demonstrates state-of-the-art performance across extensive evaluations.
RODec 18, 2020Code
Simulation Environment for Safety Assessment of CEAV Deployment in LindenLevent Guvenc, Bilin Aksun-Guvenc, Xinchen Li et al.
This report presents a simulation environment for pre-deployment testing of the autonomous shuttles that will operate in the Linden Residential Area. An autonomous shuttle deployment was already successfully launched and operated in the city of Columbus and ended recently. This report focuses on the second autonomous shuttle deployment planned to start in December, 2019, using a route that will help to solve first-mile / last-mile mobility challenges in the Linden neighborhood of Columbus by providing free rides between St. Stephens Community House, Douglas Community Recreation Center, Rosewind Resident Council and Linden Transit Center. This document presents simulation testing environments in two open source simulators and a commercial simulator for this residential area route and how they can be used for model-in-the-loop and hardware-in-the-loop simulation testing of autonomous shuttle operation before the actual deployment.
65.3AIApr 30
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI ScientistsYujun Wu, Dongxu Zhang, Xinchen Li et al.
Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging automated scientific discovery.
CVMay 25, 2025
Enhancing Text-to-Image Diffusion Transformer via Split-Text ConditioningYu Zhang, Jialei Zhou, Xinchen Li et al.
Current text-to-image diffusion generation typically employs complete-text conditioning. Due to the intricate syntax, diffusion transformers (DiTs) inherently suffer from a comprehension defect of complete-text captions. One-fly complete-text input either overlooks critical semantic details or causes semantic confusion by simultaneously modeling diverse semantic primitive types. To mitigate this defect of DiTs, we propose a novel split-text conditioning framework named DiT-ST. This framework converts a complete-text caption into a split-text caption, a collection of simplified sentences, to explicitly express various semantic primitives and their interconnections. The split-text caption is then injected into different denoising stages of DiT-ST in a hierarchical and incremental manner. Specifically, DiT-ST leverages Large Language Models to parse captions, extracting diverse primitives and hierarchically sorting out and constructing these primitives into a split-text input. Moreover, we partition the diffusion denoising process according to its differential sensitivities to diverse semantic primitive types and determine the appropriate timesteps to incrementally inject tokens of diverse semantic primitive types into input tokens via cross-attention. In this way, DiT-ST enhances the representation learning of specific semantic primitive types across different stages. Extensive experiments validate the effectiveness of our proposed DiT-ST in mitigating the complete-text comprehension defect.