Xuemin Yu

CL
h-index: 37
Papers: 7
Citations: 113
Novelty: 39%
AI Score: 45

7 Papers

CL Feb 14, 2024
Long-form evaluation of model editing

Domenic Rosati, Robie Gonzales, Jinkun Chen et al.

Evaluations of model editing currently only use the "next few token" completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier which correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that, while some methods (ROME and MEMIT) perform well in making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.

CL Apr 18, 2024
Latent Concept-based Explanation of NLP Models

Xuemin Yu, Fahim Dalvi, Nadir Durrani et al.

Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and their lack of contextual verbosity. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the training latent space, allowing it to provide latent context-based explanations of the prediction.

LG Jun 24, 2025
Cross-Layer Discrete Concept Discovery for Interpreting Language Models

Ankur Garg, Xuemin Yu, Hassan Sajjad et al.

Uncovering emergent concepts across transformer layers remains a significant challenge because the residual stream linearly mixes and duplicates information, obscuring how features evolve within large language models. Current research efforts primarily inspect neural representations at single layers, thereby overlooking this cross-layer superposition and the redundancy it introduces. These representations are typically either analyzed directly for activation patterns or passed to probing classifiers that map them to a limited set of predefined concepts. To address these limitations, we propose cross-layer VQ-VAE (CLVQ-VAE), a framework that uses vector quantization to map representations across layers and, in the process, collapse duplicated residual-stream features into compact, interpretable concept vectors. Our approach uniquely combines top-k temperature-based sampling during quantization with EMA codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. We further enhance the framework with scaled-spherical k-means++ for codebook initialization, which clusters by directional similarity rather than magnitude, better aligning with semantic structure in word embedding space.
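The abstract names two mechanisms, top-k temperature-based sampling during quantization and EMA codebook updates, without giving their details. The following is a minimal sketch of what such steps could look like, not the authors' implementation. The function names, the use of squared Euclidean distance, the softmax over negative distances, and the simplified EMA rule (blending a code toward the mean of its assigned vectors) are all assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sample_code_top_k(vector: Vector, codebook: Sequence[Vector], k: int = 2,
                      temperature: float = 1.0,
                      rng: Optional[random.Random] = None) -> int:
    """Pick a code index among the k nearest codes (hypothetical helper).

    Assumption: probabilities are a softmax over negative squared distances,
    scaled by the temperature. Low temperature approaches nearest-code lookup.
    """
    rng = rng or random.Random()
    dists = [sum((a - b) ** 2 for a, b in zip(vector, code)) for code in codebook]
    nearest = sorted(range(len(codebook)), key=dists.__getitem__)[:k]
    d_min = dists[nearest[0]]  # subtract the minimum for numerical stability
    weights = [math.exp(-(dists[i] - d_min) / temperature) for i in nearest]
    return rng.choices(nearest, weights=weights, k=1)[0]


def ema_update(code: Vector, assigned: Sequence[Vector],
               decay: float = 0.99) -> List[float]:
    """Move a code toward the mean of its assigned vectors (simplified EMA)."""
    if not assigned:
        return list(code)  # unused codes are left unchanged in this sketch
    dim = len(code)
    mean = [sum(v[j] for v in assigned) / len(assigned) for j in range(dim)]
    return [decay * c + (1.0 - decay) * m for c, m in zip(code, mean)]
```

With k=1 the sampler reduces to deterministic nearest-code assignment; larger k and higher temperature spread assignments over more codes, which is one plausible reading of the "controlled exploration" the abstract describes.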

LG Feb 2
Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Xuemin Yu, Ankur Garg, Samira Ebrahimi Kahou et al.

Deep learning models encode rich semantic information in their hidden representations. However, it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built upon the vector quantized-variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.
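To make the idea of a codebook that maps continuous representations to discrete concept vectors concrete, here is a minimal sketch of the lookup step only. It assumes a fixed, already-learned codebook and nearest-neighbor assignment by squared Euclidean distance; the function names are hypothetical and the learning of the codebook itself is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Return, for each vector, the index of its nearest codebook entry.

    Each index can be read as a discrete "concept" id (an assumption of
    this sketch, following the abstract's description).
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


# Usage: three 2-d token representations mapped onto a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
tokens = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]]
concept_ids = quantize(tokens, codebook)
```

The lookup costs one pass over the codebook per token, so it scales linearly in the number of tokens, in contrast to the pairwise comparisons that hierarchical clustering requires.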

MA Oct 15, 2025
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations

Jinkun Chen, Sher Badshah, Xuemin Yu et al.

What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With LLMs now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend LLMs with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.

RO Jul 21, 2018
Design and Implementation of Global Path Planning System for Unmanned Surface Vehicle among Multiple Task Points

Yanlong Wang, Xuemin Yu, Xu Liang

Global path planning is the key technology in the design of unmanned surface vehicles (USVs). This paper establishes a global environment model based on electronic charts and hexagonal grids, which are shown to be better than square grids in validity, safety, and rapidity. In addition, we introduce the cube coordinate system to simplify hexagonal algorithms. Furthermore, we propose an improved A* algorithm to realize path planning between two points. Based on that, we build a global path planning model for multiple task points and present an improved ant colony optimization to solve it accurately. The simulation results show that the global path planning system can plan an optimal path to tour multiple task points safely and quickly, and that it is superior to traditional methods in safety, rapidity, and path length. Moreover, the planned path can be applied directly to actual USV applications.
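As an illustration of why cube coordinates simplify hexagonal algorithms, the sketch below represents each hexagonal cell as an integer triple (x, y, z) with x + y + z = 0, so that neighbors and distances reduce to simple arithmetic. The search shown is plain A* with unit step costs, not the improved variant the paper proposes; the function names, the blocked-cell set, and the max_radius bound are assumptions made for this example.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cube = Tuple[int, int, int]

# The six unit steps on a hex grid; each keeps x + y + z == 0.
CUBE_DIRECTIONS: List[Cube] = [
    (1, -1, 0), (1, 0, -1), (0, 1, -1),
    (-1, 1, 0), (-1, 0, 1), (0, -1, 1),
]


def cube_neighbors(cell: Cube) -> List[Cube]:
    x, y, z = cell
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in CUBE_DIRECTIONS]


def cube_distance(a: Cube, b: Cube) -> int:
    """Number of hex steps between two cells."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))


def astar_hex(start: Cube, goal: Cube, blocked: Set[Cube],
              max_radius: int = 50) -> Optional[List[Cube]]:
    """Plain A* over free hex cells within max_radius of the origin."""
    open_heap = [(cube_distance(start, goal), 0, start)]
    best_cost: Dict[Cube, int] = {start: 0}
    came_from: Dict[Cube, Cube] = {}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:  # walk back to the start
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost.get(cell, cost):
            continue  # stale heap entry
        for nxt in cube_neighbors(cell):
            if nxt in blocked or cube_distance((0, 0, 0), nxt) > max_radius:
                continue
            new_cost = cost + 1
            if nxt not in best_cost or new_cost < best_cost[nxt]:
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                priority = new_cost + cube_distance(nxt, goal)
                heapq.heappush(open_heap, (priority, new_cost, nxt))
    return None  # goal unreachable within the bounded region


# Usage: route around one blocked cell lying on the direct line to the goal.
path = astar_hex((0, 0, 0), (3, -3, 0), blocked={(1, -1, 0)})
```

Because the hex distance never overestimates the remaining number of steps, it is an admissible heuristic here, so the returned path is a shortest one under unit step costs.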

RO Feb 9, 2018
Research and Implementation of Global Path Planning for Unmanned Surface Vehicle Based on Electronic Chart

Yanlong Wang, Xu Liang, Baoan Li et al.

The Unmanned Surface Vehicle (USV) is a new type of intelligent surface craft, and global path planning is the key technology of USV research, reflecting the USV's level of intelligence. To solve the problem of global path planning for USVs, this paper proposes an improved A* algorithm for sailing cost optimization based on electronic charts. This paper uses the S-57 electronic chart to establish the octree grid environment model, and proposes an improved A* algorithm based on sailing safety weight, pilot quantity and path curve smoothing to ensure the safety of the route, reduce the planning time, and improve path smoothness. The simulation results show that the environmental model construction method and the improved A* algorithm can generate a safe and reasonable global path.