Xiujin Liu

CV
5papers
2citations
Novelty57%
AI Score51

5 Papers

91.3CLMay 23
SEAL: Synergistic Co-Evolution of Agents and Learning Environments

Yihao Hu, Zhihao Wen, Xiujin Liu et al.

Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evolution methods adapt either the policy or the learning environment in isolation. We identify this structural gap as \emph{Agent-Environment Misalignment}: the agent's capability frontier changes during training, while the environment that provides supervision remains static or only weakly coupled to the agent's revealed failures. We propose SEAL, a closed-loop co-evolution framework for interactive tool-use agents. SEAL collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both environment-side adaptation and model-side policy optimization. The environment evolves its training-time learning interface by exposing clearer tool affordance cues, constraint information, and recovery-oriented feedback, while the policy is updated with diagnosis-guided advantage reweighting. Extensive experiments across in-distribution and out-of-distribution multi-turn tool-use evaluations show that SEAL improves low-resource agent learning: with only 400 training samples, it yields +8.25 to +26.25 average-point gains across three backbones and exhibits positive out-of-distribution transfer. These results demonstrate the value of jointly adapting the learner and its training-time learning substrate for robust self-improving LLM agents.

91.7CVMay 18
AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

Pan Wang, Yihao Hu, Xiujin Liu et al.

Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatial decision making: geometric priors are compressed into lossy language, and sparse interaction is often supervised through delayed textual feedback rather than dense visually grounded signals. We argue that reusable experience for VLM agents should remain visually grounded. Based on this insight, we propose \textbf{AtlasVA}, a teacher-free visual skill memory framework that organizes memory into three complementary layers: spatial heatmaps, visual exemplars, and symbolic text skills. AtlasVA further evolves danger and affinity atlases directly from trajectory statistics and lightweight grid heuristics, and reuses these self-evolving atlases as potential-based shaping rewards for reinforcement learning. This unifies perception, memory, and optimization without external LLM supervision. Experiments on \textsc{Sokoban}, \textsc{FrozenLake}, 3D embodied navigation, and 3D robotic manipulation benchmarks show that AtlasVA consistently outperforms text-centric memory baselines and competitive VLM agents, with especially strong gains on spatially intensive tasks. Homepage: https://wangpan-ustc.github.io/AtlasvaWeb

51.5CVApr 20
CFSR: Geometry-Conditioned Shadow Removal via Physical Disentanglement

Pan Wang, Yihao Hu, Xiujin Liu et al.

Traditional shadow removal networks often treat image restoration as an unconstrained mapping, lacking the physical interpretability required to balance localized texture recovery with global illumination consistency. To address this, we propose CFSR, a multi-modal prior-driven framework that reframes shadow removal as a physics-constrained restoration process. By seamlessly integrating 3D geometric cues with large-scale foundation model semantics, CFSR effectively bridges the 2D-3D domain gap. Specifically, we first map observations into a custom HVI color space to suppress shadow-induced noise and robustly fuse RGB data with estimated depth priors. At its core, our Geometric & Semantic Dual Explicit Guided Attention mechanism utilizes DINO features and 3D surface normals to directly modulate the attention affinity matrix, structurally enforcing physical lighting constraints. To recover severely degraded regions, we inject holistic priors via a frozen CLIP encoder. Finally, our Frequency Collaborative Reconstruction Module (FCRM) achieves an optimal synthesis by decoupling the decoding process. Conditioned on geometric priors, FCRM seamlessly harmonizes the reconstruction of sharp high-frequency occlusion boundaries with the restoration of low-frequency global illumination. Extensive experiments demonstrate that CFSR achieves state-of-the-art performance across multiple challenging benchmarks.

36.2CEApr 28
Modular Photobioreactor Facade Systems for Sustainable Architecture -- A case study

Xiujin Liu

This paper proposes an innovative solution to the growing issue of greenhouse gas emissions: a closed photobioreactor (PBR) facade system to mitigate greenhouse gas (GHG) concentrations. With digital fabrication technology, this study explores the transition from traditional, single function building facades to multifunctional, integrated building systems. It introduces a photobioreactor facade system to mitigate greenhouse gas (GHG) concentrations while addressing the challenge of large-scale prefabricated components transportation. This research introduces a novel approach by designing the facade system as modular, user-friendly and transportation-friendly bricks, enabling the creation of a user-customized and self-assembled photobioreactor system. The single module in the system is proposed to be "neutralization bricks", which embedded with algae and equipped with an air circulation system, facilitating the photobioreactor's functionality. A connection system between modules allows for easy assembly by users, while a limited variety of brick styles ensures modularity in manufacturing without sacrificing customization and diversity. The system is also equipped with an advanced microalgae status detection algorithm, which allows users to monitor the condition of the microalgae using monocular camera. This functionality ensures timely alerts and notifications for users to replace the algae, thereby optimizing the operational efficiency and sustainability of the algae cultivation process.

CVDec 6, 2025
GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation

Xiujin Liu

We present GNC-Pose, a fully learning-free monocular 6D object pose estimation pipeline for textured objects that combines rendering-based initialization, geometry-aware correspondence weighting, and robust GNC optimization. Starting from coarse 2D-3D correspondences obtained through feature matching and rendering-based alignment, our method builds upon the Graduated Non-Convexity (GNC) principle and introduces a geometry-aware, cluster-based weighting mechanism that assigns robust per point confidence based on the 3D structural consistency of the model. This geometric prior and weighting strategy significantly stabilizes the optimization under severe outlier contamination. A final LM refinement further improve accuracy. We tested GNC-Pose on The YCB Object and Model Set, despite requiring no learned features, training data, or category-specific priors, GNC-Pose achieves competitive accuracy compared with both learning-based and learning-free methods, and offers a simple, robust, and practical solution for learning-free 6D pose estimation.