Liangxuan Guo

AI
h-index4
6papers
14citations
Novelty53%
AI Score42

6 Papers

LGJun 1, 2023Code
Out-of-distribution forgetting: vulnerability of continual learning to intra-class distribution shift

Liangxuan Guo, Yang Chen, Shan Yu

Continual learning (CL) is an important technique to allow artificial neural networks to work in open environments. CL enables a system to learn new tasks without severe interference to its performance on old tasks, i.e., overcome the problems of catastrophic forgetting. In joint learning, it is well known that the out-of-distribution (OOD) problem caused by intentional attacks or environmental perturbations will severely impair the ability of networks to generalize. In this work, we reported a special form of catastrophic forgetting raised by the OOD problem in continual learning settings, and we named it out-of-distribution forgetting (OODF). In continual image classification tasks, we found that for a given category, introducing an intra-class distribution shift significantly impaired the recognition accuracy of CL methods for that category during subsequent learning. Interestingly, this phenomenon is special for CL as the same level of distribution shift had only negligible effects in the joint learning scenario. We verified that CL methods without dedicating subnetworks for individual tasks are all vulnerable to OODF. Moreover, OODF does not depend on any specific way of shifting the distribution, suggesting it is a risk for CL in a wide range of circumstances. Taken together, our work identified an under-attended risk during CL, highlighting the importance of developing approaches that can overcome OODF. Code available: \url{https://github.com/Hiroid/OODF}

AIApr 13, 2023
Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

Yang Chen, Liangxuan Guo, Shan Yu

The capacity to generate meaningful symbols and effectively employ them for advanced cognitive processes, such as communication, reasoning, and planning, constitutes a fundamental and distinctive aspect of human intelligence. Existing deep neural networks still notably lag human capabilities in terms of generating symbols for higher cognitive functions. Here, we propose a solution (symbol emergence artificial network (SEA-net)) to endow neural networks with the ability to create symbols, understand semantics, and achieve communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that allows the system to acquire new functions purely by symbolic manipulation or communication. In addition, these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We believe that the proposed framework will be instrumental in producing more capable systems that can synergize the strengths of connectionist and symbolic approaches for artificial intelligence (AI).

NCJan 5
A neural network for modeling human concept formation, understanding and communication

Liangxuan Guo, Haoyang Chen, Yang Chen et al.

A remarkable capability of the human brain is to form more abstract conceptual representations from sensorimotor experiences and flexibly apply them independent of direct sensory inputs. However, the computational mechanism underlying this ability remains poorly understood. Here, we present a dual-module neural network framework, the CATS Net, to bridge this gap. Our model consists of a concept-abstraction module that extracts low-dimensional conceptual representations, and a task-solving module that performs visual judgement tasks under the hierarchical gating control of the formed concepts. The system develops transferable semantic structure based on concept representations that enable cross-network knowledge transfer through conceptual communication. Model-brain fitting analyses reveal that these emergent concept spaces align with both neurocognitive semantic model and brain response structures in the human ventral occipitotemporal cortex, while the gating mechanisms mirror that in the semantic control brain network. This work establishes a unified computational framework that can offer mechanistic insights for understanding human conceptual cognition and engineering artificial systems with human-like conceptual intelligence.

AISep 14, 2025
Agentic Lybic: Multi-Agent Execution System with Tiered Reasoning and Orchestration

Liangxuan Guo, Bin Zhu, Qingqian Tao et al.

Autonomous agents for desktop automation struggle with complex multi-step tasks due to poor coordination and inadequate quality control. We introduce Agentic Lybic, a novel multi-agent system where the entire architecture operates as a finite-state machine (FSM). This core innovation enables dynamic orchestration. Our system comprises four components: a Controller, a Manager, three Workers (Technician for code-based operations, Operator for GUI interactions, and Analyst for decision support), and an Evaluator. The critical mechanism is the FSM-based routing between these components, which provides flexibility and generalization by dynamically selecting the optimal execution strategy for each subtask. This principled orchestration, combined with robust quality gating, enables adaptive replanning and error recovery. Evaluated officially on the OSWorld benchmark, Agentic Lybic achieves a state-of-the-art 57.07% success rate in 50 steps, substantially outperforming existing methods. Results demonstrate that principled multi-agent orchestration with continuous quality control provides superior reliability for generalized desktop automation in complex computing environments.

CVFeb 8, 2025
Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

Yuhan Zhang, Guoqing Ma, Guangfu Hao et al.

While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One of the reasons behind this is the visual encoders used are task-dependent, preventing effective feature extraction in different settings. To address this issue, recent studies have tried to pretrain encoders with diverse visual inputs in order to improve their performance. However, they rely on existing pretrained encoders without further exploring the impact of pretraining period. In this work, we propose APE: efficient reinforcement learning through Adaptively Pretrained visual Encoder -- a framework that utilizes adaptive augmentation strategy during the pretraining phase and extracts generalizable features with only a few interactions within the task environments in the policy learning period. Experiments are conducted across various domains, including DeepMind Control Suite, Atari Games and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. In addition, APE significantly improves the sampling efficiency using only visual inputs during learning, approaching the efficiency of state-based method in several control tasks. These findings demonstrate the potential of adaptive pretraining of encoder in enhancing the generalization ability and efficiency of visual RL algorithms.

CVMay 28, 2025
Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language

Guangfu Hao, Haojie Wen, Liangxuan Guo et al.

Flexible tool selection reflects a complex cognitive ability that distinguishes humans from other species, yet computational models that capture this ability remain underdeveloped. We developed a framework using low-dimensional attribute representations to bridge visual tool perception and linguistic task understanding. We constructed a comprehensive dataset (ToolNet) containing 115 common tools labeled with 13 carefully designed attributes spanning physical, functional, and psychological properties, paired with natural language scenarios describing tool usage. Visual encoders (ResNet or ViT) extract attributes from tool images while fine-tuned language models (GPT-2, LLaMA, DeepSeek) derive required attributes from task descriptions. Our approach achieves 74% accuracy in tool selection tasks-significantly outperforming direct tool matching (20%) and smaller multimodal models (21%-58%), while approaching performance of much larger models like GPT-4o (73%) with substantially fewer parameters. Human evaluation studies validate our framework's alignment with human decision-making patterns, and generalization experiments demonstrate effective performance on novel tool categories. Ablation studies revealed that manipulation-related attributes (graspability, elongation, hand-relatedness) consistently prove most critical across modalities. This work provides a parameter-efficient, interpretable solution that mimics human-like tool cognition, advancing both cognitive science understanding and practical applications in tool selection tasks.