IR | Apr 18, 2022
HRCF: Enhancing Collaborative Filtering via Hyperbolic Geometric Regularization
Menglin Yang, Min Zhou, Jiahong Liu et al.
In large-scale recommender systems, the user-item networks are generally scale-free or expand exponentially. The quality of the latent features (also known as embeddings) used to describe users and items is determined by how well the embedding space fits the data distribution. Hyperbolic space offers ample room to learn embeddings with its negative curvature and metric properties, which fit data with tree-like structures well. Recently, several hyperbolic approaches have been proposed to learn high-quality representations for users and items. However, most of them concentrate on developing hyperbolic counterparts of Euclidean models by designing appropriate projection operations, whereas many advantageous geometric properties of hyperbolic space have not been explicitly explored. For example, one of the most notable properties of hyperbolic space is that its capacity increases exponentially with the radius, which indicates that the area far away from the hyperbolic origin is much more embeddable. Building on the geometric properties of hyperbolic space, we propose Hyperbolic Regularization powered Collaborative Filtering (HRCF) and design a geometry-aware hyperbolic regularizer. Specifically, the proposal boosts the optimization procedure via root alignment and an origin-aware penalty, which is simple yet impressively effective. Through theoretical analysis, we further show that our proposal is able to tackle the over-smoothing problem caused by hyperbolic aggregation and also gives the models better discriminative ability. We conduct extensive empirical analysis, comparing our proposal against a large set of baselines on several public benchmarks. The empirical results show that our approach achieves highly competitive performance and surpasses both the leading Euclidean and hyperbolic baselines by considerable margins.
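The claim that hyperbolic capacity grows exponentially with the radius can be checked with a few lines of arithmetic. The sketch below is not HRCF code; it assumes the two-dimensional hyperbolic plane with curvature -1, where a circle of radius r has circumference 2π·sinh(r), and compares it with the Euclidean 2π·r. The function names are illustrative.

```python
import math


def euclidean_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the flat plane."""
    return 2.0 * math.pi * r


def hyperbolic_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * math.sinh(r)


def capacity_ratio(r: float) -> float:
    """How much more room the hyperbolic circle offers at the same radius."""
    return hyperbolic_circumference(r) / euclidean_circumference(r)


for r in (1.0, 2.0, 4.0, 8.0):
    print(f"r = {r}: hyperbolic / Euclidean circumference = {capacity_ratio(r):.1f}")
```

The ratio stays close to 1 near the origin and grows roughly like e^r / (2r) further out, which is the sense in which regions far from the origin are "more embeddable".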
IR | Jul 19, 2022
HICF: Hyperbolic Informative Collaborative Filtering
Menglin Yang, Zhihao Li, Min Zhou et al.
Considering the prevalence of the power-law distribution in user-item networks, hyperbolic space has attracted considerable attention and achieved impressive performance in the recommender system recently. The advantage of hyperbolic recommendation lies in that its exponentially increasing capacity is well-suited to describe the power-law distributed user-item network whereas the Euclidean equivalent is deficient. Nonetheless, it remains unclear which kinds of items can be effectively recommended by the hyperbolic model and which cannot. To address the above concerns, we take the most basic recommendation technique, collaborative filtering, as a medium, to investigate the behaviors of hyperbolic and Euclidean recommendation models. The results reveal that (1) tail items get more emphasis in hyperbolic space than that in Euclidean space, but there is still ample room for improvement; (2) head items receive modest attention in hyperbolic space, which could be considerably improved; (3) and nonetheless, the hyperbolic models show more competitive performance than Euclidean models. Driven by the above observations, we design a novel learning method, named hyperbolic informative collaborative filtering (HICF), aiming to compensate for the recommendation effectiveness of the head item while at the same time improving the performance of the tail item. The main idea is to adapt the hyperbolic margin ranking learning, making its pull and push procedure geometric-aware, and providing informative guidance for the learning of both head and tail items. Extensive experiments back up the analytic findings and also show the effectiveness of the proposed method. The work is valuable for personalized recommendations since it reveals that the hyperbolic space facilitates modeling the tail item, which often represents user-customized preferences or new products.
97.8 | CL | Apr 18 | Code
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
Jinchang Zhu, Jindong Li, Cheng Zhang et al.
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to capture the associative structure of human memory, wherein related experiences progressively strengthen interconnections through repeated co-activation. Inspired by cognitive neuroscience, we identify three mechanisms central to biological memory: association, consolidation, and spreading activation, which remain largely absent in current research. To bridge this gap, we propose HeLa-Mem, a bio-inspired memory architecture that models memory as a dynamic graph with Hebbian learning dynamics. HeLa-Mem employs a dual-level organization: (1) an episodic memory graph that evolves through co-activation patterns, and (2) a semantic memory store populated via Hebbian Distillation, wherein a Reflective Agent identifies densely connected memory hubs and distills them into structured, reusable semantic knowledge. This dual-path design leverages both semantic similarity and learned associations, mirroring the episodic-semantic distinction in human cognition. Experiments on LoCoMo demonstrate superior performance across four question categories while using significantly fewer context tokens. Code is available on GitHub: https://github.com/ReinerBRO/HeLa-Mem
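The two graph mechanisms named in the abstract, Hebbian strengthening through co-activation and spreading activation, can be illustrated with a toy graph. This is a minimal sketch under stated assumptions, not the HeLa-Mem implementation: the class name, the additive update rule, and the learning rate, decay, and damping constants are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class HebbianMemoryGraph:
    """Toy memory graph; class name, rates, and update rule are illustrative assumptions."""

    def __init__(self, learning_rate=0.1, decay=0.01):
        self.learning_rate = learning_rate
        self.decay = decay
        self.weights = defaultdict(float)  # sorted (node_a, node_b) -> edge weight

    def co_activate(self, nodes):
        """Decay all edges slightly, then strengthen edges among co-activated nodes."""
        for edge in self.weights:
            self.weights[edge] *= 1.0 - self.decay
        for a, b in combinations(sorted(set(nodes)), 2):
            self.weights[(a, b)] += self.learning_rate

    def neighbors(self, node):
        """Return (other_node, weight) pairs for every edge touching `node`."""
        found = []
        for (a, b), w in self.weights.items():
            if a == node:
                found.append((b, w))
            elif b == node:
                found.append((a, w))
        return found

    def spread(self, seeds, hops=2, damping=0.5):
        """Spreading activation: seeds start at 1.0 and pass damped activation along edges."""
        activation = {s: 1.0 for s in seeds}
        frontier = dict(activation)
        for _ in range(hops):
            reached = {}
            for node, act in frontier.items():
                for other, w in self.neighbors(node):
                    reached[other] = max(reached.get(other, 0.0), act * w * damping)
            for node, act in reached.items():
                activation[node] = max(activation.get(node, 0.0), act)
            frontier = reached
        return activation


graph = HebbianMemoryGraph()
graph.co_activate(["trip", "paris"])
graph.co_activate(["trip", "paris"])  # repeated co-activation strengthens the edge
graph.co_activate(["paris", "louvre"])
print(graph.spread(["trip"]))
```

In this toy example a query seeded at "trip" reaches "louvre" through "paris" even though the two were never activated together, which is the kind of associative retrieval that pure embedding similarity does not provide.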
AI | Apr 27, 2022
Discovering Representative Attribute-stars via Minimum Description Length
Jiahong Liu, Min Zhou, Philippe Fournier-Viger et al.
Graphs are a popular data type found in many domains. Numerous techniques have been proposed to find interesting patterns in graphs to help understand the data and support decision-making. However, two limitations generally hinder their practical use: (1) they have multiple parameters that are hard to set but greatly influence results, and (2) they generally focus on identifying complex subgraphs while ignoring relationships between attributes of nodes. To address these problems, we propose a parameter-free algorithm named CSPM (Compressing Star Pattern Miner) which identifies star-shaped patterns that indicate strong correlations among attributes via the concept of conditional entropy and the minimum description length principle. Experiments performed on several benchmark datasets show that CSPM reveals insightful and interpretable patterns and is efficient in runtime. Moreover, quantitative evaluations on two real-world applications show that CSPM has broad applications as it successfully boosts the accuracy of graph attribute completion models by up to 30.68% and uncovers important patterns in telecommunication alarm data.
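Conditional entropy is the quantity the abstract relies on to measure how strongly one attribute predicts another. The sketch below computes H(Y | X) in bits from observed pairs. It does not reproduce CSPM's encoding scheme or its pattern search, and the function name and sample attribute values are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """H(Y | X) in bits, estimated from a list of observed (x, y) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    entropy = 0.0
    for (x, _y), count in joint.items():
        p_xy = count / n                      # joint probability p(x, y)
        p_y_given_x = count / marginal_x[x]   # conditional probability p(y | x)
        entropy -= p_xy * math.log2(p_y_given_x)
    return entropy


# Hypothetical attribute pairs: (attribute of a core node, attribute of a neighbour).
strongly_correlated = [("router", "link_down"), ("router", "link_down"), ("server", "cpu_high")]
uncorrelated = [("router", "link_down"), ("router", "cpu_high")]
print(conditional_entropy(strongly_correlated), conditional_entropy(uncorrelated))
```

A value near zero means the neighbour attribute is almost fully determined by the core attribute, so a pattern linking the two compresses the data well under a description-length criterion.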
LG | Jul 2, 2024 | Code
HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection
Yali Fu, Jindong Li, Jiahong Liu et al.
Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. Most existing methods that rely on traditional GNNs mainly consider pairwise relationships between first-order neighbors, which is insufficient to capture the complex high-order dependencies often associated with anomalies. This limitation underscores the necessity of exploring high-order node interactions in UGAD. In addition, most previous works ignore the underlying properties (e.g., hierarchy and power-law structure) which are common in real-world graph datasets and therefore are indispensable factors in the UGAD task. In this paper, we propose a novel Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection (HC-GLAD in short). To exploit high-order node group information, we construct hypergraphs based on pre-designed gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with the hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraph with node group information and hyperbolic geometry in this field. Extensive experiments on 13 real-world datasets of different fields demonstrate the superiority of HC-GLAD on the UGAD task. The code is available at https://github.com/Yali-F/HC-GLAD.
91.0 | AI | May 19 | Code
LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
Yanyu Chen, Jiyue Jiang, Dianzhi Yu et al.
The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a "correctness illusion" that masks compounding errors; (2) Coarse-Grained Supervision, where sparse global outcomes (e.g., in GRPO) fail to provide granular guidance, treating reasoning chains as monolithic; and (3) Distributional Collapse, where signals fail to generalize without amplifying pre-training biases. To address these, we introduce LC-ERD (Logic-Consistent Endogenous Reward Decomposition), a framework framing self-alignment as latent structure mining. We derive a Variational Logic Potential by aggregating consensus from the model's Latent Logic Expertise (LLE) to denoise the reasoning manifold, and introduce a Multi-Agent Value Decomposition protocol based on the IGM principle to quantify individual step utility. Experiments show LC-ERD delivers a robust self-evolution path, uncovering trade-offs between logic consistency and accuracy while identifying high-value reasoning patterns missed by standard rewards. Our code is available at https://github.com/Reinhardmannn/LC-ERD.
77.6 | AI | May 17 | Code
Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
Jinhu Qi, Muzhi Li, Jiahong Liu et al.
Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and present a case study of real-world security failures in open-source agentic systems. Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.
LG | Jul 1, 2024
Hypformer: Exploring Efficient Transformer Fully in Hyperbolic Space
Menglin Yang, Harshit Verma, Delvin Ce Zhang et al.
Hyperbolic geometry has shown significant potential in modeling complex structured data, particularly data with underlying tree-like and hierarchical structures. Despite the impressive performance of various hyperbolic neural networks across numerous domains, research on adapting the Transformer to hyperbolic space remains limited. Previous attempts have mainly focused on modifying self-attention modules in the Transformer. However, these efforts have fallen short of developing a complete hyperbolic Transformer. This stems primarily from: (i) the absence of well-defined modules in hyperbolic space, including linear transformation layers, LayerNorm layers, activation functions, dropout operations, etc.; and (ii) the quadratic time complexity of the existing hyperbolic self-attention module w.r.t. the number of input tokens, which hinders its scalability. To address these challenges, we propose Hypformer, a novel hyperbolic Transformer based on the Lorentz model of hyperbolic geometry. In Hypformer, we introduce two foundational blocks that define the essential modules of the Transformer in hyperbolic space. Furthermore, we develop a linear self-attention mechanism in hyperbolic space, enabling the hyperbolic Transformer to process billion-scale graph data and long-sequence inputs for the first time. Our experimental results confirm the effectiveness and efficiency of Hypformer across various datasets, demonstrating its potential as an effective and scalable solution for large-scale data representation and large models.
LG | Oct 27, 2023
Understanding and Mitigating Hyperbolic Dimensional Collapse in Graph Contrastive Learning
Yifei Zhang, Hao Zhu, Menglin Yang et al.
Learning generalizable self-supervised graph representations for downstream tasks is challenging. To this end, Contrastive Learning (CL) has emerged as a leading approach. The embeddings of CL are arranged on a hypersphere where similarity is measured by the cosine distance. However, many real-world graphs, especially those of a hierarchical nature, cannot be embedded well in Euclidean space. Although hyperbolic embedding is suitable for hierarchical representation learning, naively applying CL to hyperbolic space may result in so-called dimensional collapse, i.e., features concentrate mostly within a few density regions, leading to poor utilization of the whole feature space. Thus, we propose a novel contrastive learning framework to learn high-quality graph embeddings in hyperbolic space. Specifically, we design an alignment metric that effectively captures the hierarchical data-invariant information, and we propose a substitute for the uniformity metric to prevent dimensional collapse. We show that in hyperbolic space one has to address the leaf- and height-level uniformity related to properties of trees. In the ambient space of the hyperbolic manifold, these notions translate into imposing an isotropic ring density towards the boundary of the Poincaré ball. Our experiments support the efficacy of our method.
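To see why density near the boundary of the Poincaré ball matters, it helps to compute the standard Poincaré distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), for the unit ball with curvature -1. The sketch below only illustrates this metric; it is not the paper's alignment or uniformity loss, and the sample points are arbitrary.

```python
import math


def squared_norm(u):
    """Squared Euclidean norm of a vector given as a sequence of floats."""
    return sum(x * x for x in u)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    diff = squared_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - squared_norm(u)) * (1.0 - squared_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


# Two pairs with the same Euclidean gap (0.1), one near the origin, one near the boundary.
near_origin = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.85, 0.0), (0.95, 0.0))
print(near_origin, near_boundary)
```

The same Euclidean gap corresponds to a much larger hyperbolic distance close to the boundary, so embeddings spread in a ring near the boundary remain well separated, while embeddings crowded near the origin do not.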
CL | Feb 17, 2025 | Code
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Ailin Huang, Boyong Wu, Bruce Wang et al.
Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks like LLaMA Question, it shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
AI | Feb 17, 2025 | Code
A Survey of Personalized Large Language Models: Progress and Future Directions
Jiahong Liu, Zexuan Qiu, Zhongyang Li et al.
Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), finetuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.
CL | Sep 2, 2025 | Code
Implicit Reasoning in Large Language Models: A Comprehensive Survey
Jindong Li, Yali Fu, Li Fan et al.
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Reasoning with LLMs is central to solving multi-step problems and complex decision-making. To support efficient reasoning, recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning, where reasoning occurs silently via latent structures without emitting intermediate textual steps. Implicit reasoning brings advantages such as lower generation cost, faster inference, and better alignment with internal computation. Although prior surveys have discussed latent representations in the context of reasoning, a dedicated and mechanism-level examination of how reasoning unfolds internally within LLMs remains absent. This survey fills that gap by introducing a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies. We organize existing methods into three execution paradigms based on how and where internal computation unfolds: latent optimization, signal-guided control, and layer-recurrent execution. We also review structural, behavioral and representation-based evidence that supports the presence of implicit reasoning in LLMs. We further provide a structured overview of the evaluation metrics and benchmarks used in existing works to assess the effectiveness and reliability of implicit reasoning. We maintain a continuously updated project at: https://github.com/digailab/awesome-llm-implicit-reasoning.
CL | Feb 5
TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents
Yanyu Chen, Jiyue Jiang, Jiahong Liu et al.
The evaluation of Deep Research Agents is a critical challenge, as conventional outcome-based metrics fail to capture the nuances of their complex reasoning. Current evaluation faces two primary challenges: 1) a reliance on singular metrics like Pass@1, creating a "high-score illusion" that ignores the quality, efficiency, and soundness of the reasoning process; and 2) the failure of static benchmarks to quantify crucial attributes like robustness and latent capability. To address these gaps, we introduce TRACE (Trajectory-Aware Comprehensive Evaluation), a framework that holistically assesses the entire problem-solving trajectory. To counter the "high-score illusion", we propose a Hierarchical Trajectory Utility Function that quantifies process efficiency and cognitive quality, including evidence grounding, alongside accuracy. To measure deeper attributes, TRACE introduces a Scaffolded Capability Assessment protocol, quantifying an agent's latent ability by determining the minimum guidance needed for success. Our contributions include the TRACE framework, its novel metrics, and the accompanying DeepResearch-Bench with controllable complexity. Experiments show TRACE delivers a granular ranking that uncovers critical trade-offs between agent accuracy, efficiency, and robustness entirely missed by singular metrics.
IR | Jan 30
SemaCDR: LLM-Powered Transferable Semantics for Cross-Domain Sequential Recommendation
Chunxu Zhang, Shanqiang Huang, Zijian Zhang et al.
Cross-domain recommendation (CDR) addresses the data sparsity and cold-start problems in the target domain by leveraging knowledge from data-rich source domains. However, existing CDR methods often rely on domain-specific features or identifiers that lack transferability across different domains, limiting their ability to capture inter-domain semantic patterns. To overcome this, we propose SemaCDR, a semantics-driven framework for cross-domain sequential recommendation that leverages large language models (LLMs) to construct a unified semantic space. SemaCDR creates multiview item features by integrating LLM-generated domain-agnostic semantics with domain-specific content, aligned by contrastive regularization. SemaCDR systematically creates LLM-generated domain-specific and domain-agnostic semantics, and employs adaptive fusion to generate unified preference representations. Furthermore, it aligns cross-domain behavior sequences with an adaptive fusion mechanism to synthesize interaction sequences from source, target, and mixed domains. Extensive experiments on real-world datasets show that SemaCDR consistently outperforms state-of-the-art baselines, demonstrating its effectiveness in capturing coherent intra-domain patterns while facilitating knowledge transfer across domains.
LG | Feb 2
Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning
Wenhao Yu, Shaohang Wei, Jiahong Liu et al.
Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-dimensional: the ground-truth probability reflects downstream alignment, while token entropy reflects intrinsic uncertainty induced by the pre-training prior. Ignoring entropy can misidentify noisy or easily replaceable tokens as learning-critical, while ignoring probability fails to reflect target-specific alignment. RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens without over-penalizing intrinsically uncertain positions. Experiments on multiple backbones show consistent improvements on mathematical reasoning benchmarks, transfer gains on out-of-distribution reasoning, and improved code generation performance over probability-only or entropy-only reweighting baselines.
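The two ingredients of the Relative Rank Indicator, the rank of the ground-truth token and the expected rank under the prediction distribution, are easy to compute for a toy distribution. The abstract does not state how they are combined, so the ratio below is an assumed comparison chosen for illustration, and the sketch deliberately stops short of the Relative Scale weighting. All function names are hypothetical.

```python
def token_rank(probs, index):
    """1-based rank of probs[index]; rank 1 is the most probable token (ties share the better rank)."""
    return 1 + sum(1 for p in probs if p > probs[index])


def expected_rank(probs):
    """Expected rank of a token sampled from the distribution itself."""
    return sum(p * token_rank(probs, i) for i, p in enumerate(probs))


def rank_ratio(probs, target_index):
    """Assumed comparison: ground-truth rank divided by expected rank (not the paper's formula)."""
    return token_rank(probs, target_index) / expected_rank(probs)


peaked = [0.90, 0.05, 0.03, 0.02]    # low-entropy prediction
diffuse = [0.28, 0.26, 0.24, 0.22]   # high-entropy prediction
target = 1                           # the ground-truth token is ranked second in both cases

print(rank_ratio(peaked, target), rank_ratio(diffuse, target))
```

With the same ground-truth rank, the ratio is larger for the peaked distribution than for the diffuse one. This matches the intuition in the abstract that a miss at a confident position signals an under-learned token, whereas a miss at an intrinsically uncertain position should not be over-penalized.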
CV | Apr 19, 2025 | Code
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li, Yongguang Li, Yali Fu et al.
As machine learning evolves, domain generalization (DG) and domain adaptation (DA) have become crucial for enhancing model robustness across diverse environments. Contrastive Language-Image Pretraining (CLIP) plays a significant role in these tasks, offering powerful zero-shot capabilities that allow models to perform effectively in unseen domains. However, there remains a significant gap in the literature, as no comprehensive survey currently exists that systematically explores the applications of CLIP in DG and DA, highlighting the necessity for this review. This survey presents a comprehensive review of CLIP's applications in DG and DA. In DG, we categorize methods into optimizing prompt learning for task alignment and leveraging CLIP as a backbone for effective feature extraction, both enhancing model adaptability. For DA, we examine both source-available methods utilizing labeled source data and source-free approaches primarily based on target domain data, emphasizing knowledge transfer mechanisms and strategies for improved performance across diverse contexts. Key challenges, including overfitting, domain diversity, and computational efficiency, are addressed, alongside future research opportunities to advance robustness and efficiency in practical applications. By synthesizing existing literature and pinpointing critical gaps, this survey provides valuable insights for researchers and practitioners, proposing directions for effectively leveraging CLIP to enhance methodologies in domain generalization and adaptation. Ultimately, this work aims to foster innovation and collaboration in the quest for more resilient machine learning models that can perform reliably across diverse real-world scenarios. A more up-to-date version of the papers is maintained at: https://github.com/jindongli-Ai/Survey_on_CLIP-Powered_Domain_Generalization_and_Adaptation.
CL | Jul 21, 2025 | Code
Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey
Jindong Li, Yali Fu, Jiahong Liu et al.
The rapid advancement of large language models (LLMs) has intensified the need for effective mechanisms to transform continuous multimodal data into discrete representations suitable for language-based processing. Discrete tokenization, with vector quantization (VQ) as a central approach, offers both computational efficiency and compatibility with LLM architectures. Despite its growing importance, there is a lack of a comprehensive survey that systematically examines VQ techniques in the context of LLM-based systems. This work fills this gap by presenting the first structured taxonomy and analysis of discrete tokenization methods designed for LLMs. We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines. Beyond algorithm-level investigation, we discuss existing research in terms of classical applications without LLMs, LLM-based single-modality systems, and LLM-based multimodal systems, highlighting how quantization strategies influence alignment, reasoning, and generation performance. In addition, we identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints. Finally, we discuss emerging research directions such as dynamic and task-adaptive quantization, unified tokenization frameworks, and biologically inspired codebook learning. This survey bridges the gap between traditional vector quantization and modern LLM applications, serving as a foundational reference for the development of efficient and generalizable multimodal systems. A continuously updated version is available at: https://github.com/jindongli-Ai/LLM-Discrete-Tokenization-Survey.
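At its core, vector quantization replaces each continuous vector with the index of its nearest codebook entry, and that index is the discrete token. The sketch below shows only this lookup step with a fixed, hand-written codebook. It does not cover codebook learning or any of the VQ variants the survey categorizes, and all names and values are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vector, codebook):
    """Return (index, code vector) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


def tokenize(vectors, codebook):
    """Map a sequence of continuous vectors to a sequence of discrete token ids."""
    return [quantize(v, codebook)[0] for v in vectors]


# Hypothetical 2-D codebook with three entries and three continuous features.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4]]
print(tokenize(features, codebook))
```

Codebook collapse, one of the challenges listed in the abstract, corresponds to the situation where most inputs map to only a few of the available indices.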
LG | Feb 28, 2022 | Code
Hyperbolic Graph Neural Networks: A Review of Methods and Applications
Menglin Yang, Min Zhou, Tong Zhang et al.
Graph representation learning in Euclidean space, despite its widespread adoption and proven utility in many domains, often struggles to effectively capture the inherent hierarchical and complex relational structures prevalent in real-world data, particularly for datasets exhibiting a highly non-Euclidean latent anatomy or power-law distributions. Hyperbolic geometry, with its constant negative curvature and exponential growth property, naturally accommodates such structures, offering a promising alternative for learning rich graph representations. This survey paper provides a comprehensive review of the rapidly evolving field of Hyperbolic Graph Learning (HGL). We systematically categorize and analyze existing methods broadly dividing them into (1) hyperbolic graph embedding-based techniques, (2) graph neural network-based hyperbolic models, and (3) emerging paradigms. Beyond methodologies, we extensively discuss diverse applications of HGL across multiple domains, including recommender systems, knowledge graphs, bioinformatics, and other relevant scenarios, demonstrating the broad applicability and effectiveness of hyperbolic geometry in real-world graph learning tasks. Most importantly, we identify several key challenges that serve as directions for advancing HGL, including handling complex data structures, developing geometry-aware learning objectives, ensuring trustworthy and scalable implementations, and integrating with foundation models, e.g., large language models. We highlight promising research opportunities in this exciting interdisciplinary area. A comprehensive repository can be found at https://github.com/digailab/awesome-hyperbolic-graph-learning.
LG | Dec 31, 2024
Low-Rank Adaptation for Foundation Models: A Comprehensive Review
Menglin Yang, Jialin Chen, Jinkai Tao et al.
The rapid advancement of foundation models (large-scale neural networks trained on diverse, extensive datasets) has revolutionized artificial intelligence, enabling unprecedented advancements across domains such as natural language processing, computer vision, and scientific discovery. However, the substantial parameter count of these models, often reaching billions or trillions, poses significant challenges in adapting them to specific downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges, offering a parameter-efficient mechanism to fine-tune foundation models with minimal computational overhead. This survey provides the first comprehensive review of LoRA techniques beyond Large Language Models to general foundation models, including recent technical foundations, emerging frontiers, and applications of low-rank adaptation across multiple domains. Finally, this survey discusses key challenges and future research directions in theoretical understanding, scalability, and robustness. This survey serves as a valuable resource for researchers and practitioners working with efficient foundation model adaptation.
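The parameter-efficiency argument behind LoRA follows from its basic form: the frozen weight W is augmented with a low-rank product, giving an effective weight W + (α / r)·B·A, where B has shape d_out × r and A has shape r × d_in. The sketch below works through that arithmetic with plain Python lists. It is a minimal illustration of the standard formulation, not code from the survey, and the helper names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_parameters(d_out, d_in, r):
    """Number of trainable values in B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


w = [[1.0, 2.0], [3.0, 4.0]]      # frozen pretrained weight
a = [[0.5, -0.5]]                 # rank-1 factor A (r x d_in)
b_init = [[0.0], [0.0]]           # B starts at zero, so the model is unchanged at first
b_trained = [[1.0], [2.0]]        # hypothetical values after some training

print(lora_effective_weight(w, b_init, a, alpha=2.0))
print(lora_effective_weight(w, b_trained, a, alpha=2.0))
print(trainable_parameters(4096, 4096, 8), "trainable vs", 4096 * 4096, "full")
```

For a 4096 × 4096 layer with rank 8, the adapter trains 65,536 values instead of roughly 16.8 million, which is where the "minimal computational overhead" in the abstract comes from.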
LG | Apr 11, 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He, Jiahong Liu, Buze Zhang et al.
In the era of foundation models and Large Language Models (LLMs), Euclidean space has been the de facto geometric setting for machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. At a large scale, real-world data often exhibit inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling, in a variety of domains, such as languages, vision, and the natural sciences. It is challenging to effectively capture these structures within the constraints of Euclidean spaces. This position paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next generation of foundation models. By adopting these geometries, foundation models could more efficiently leverage the aforementioned structures. Task-aware adaptability that dynamically reconfigures embeddings to match the geometry of downstream applications could further enhance efficiency and expressivity. Our position is supported by a series of theoretical and empirical investigations of prevalent foundation models. Finally, we outline a roadmap for integrating non-Euclidean geometries into foundation models, including strategies for building geometric foundation models via fine-tuning, training from scratch, and hybrid approaches.
CV Jun 3, 2025
Towards Geometry Problem Solving in the Large Model Era: A Survey
Yurui Zhao, Xiang Wang, Jiahong Liu et al.
Geometry problem solving (GPS) represents a critical frontier in artificial intelligence, with profound applications in education, computer-aided design, and computational graphics. Despite its significance, automating GPS remains challenging due to the dual demands of spatial understanding and rigorous logical reasoning. Recent advances in large models have enabled notable breakthroughs, particularly for SAT-level problems, yet the field remains fragmented across methodologies, benchmarks, and evaluation frameworks. This survey systematically synthesizes GPS advancements through three core dimensions: (1) benchmark construction, (2) textual and diagrammatic parsing, and (3) reasoning paradigms. We further propose a unified analytical paradigm, assess current limitations, and identify emerging opportunities to guide future research toward human-level geometric reasoning, including automated benchmark generation and interpretable neuro-symbolic integration.
LG May 27, 2025
Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation
Meng Qin, Jiahong Liu, Irwin King
Graph neural networks (GNNs), which capture graph structures via a feature aggregation mechanism following the graph embedding framework, have demonstrated a powerful ability to support various tasks. According to the topology properties (e.g., structural roles or community memberships of nodes) to be preserved, graph embedding can be categorized into identity and position embedding. However, it is unclear for most GNN-based methods which property they can capture. Some of them may also suffer from low efficiency and scalability caused by several time- and space-consuming procedures (e.g., feature extraction and training). From a perspective of graph signal processing, we find that high- and low-frequency information in the graph spectral domain may characterize node identities and positions, respectively. Based on this investigation, we propose random feature aggregation (RFA) for efficient identity and position embedding, serving as an extreme ablation study regarding GNN feature aggregation. RFA (i) adopts a spectral-based GNN without learnable parameters as its backbone, (ii) only uses random noises as inputs, and (iii) derives embeddings via just one feed-forward propagation (FFP). Inspired by degree-corrected spectral clustering, we further introduce a degree correction mechanism to the GNN backbone. Surprisingly, our experiments demonstrate that two variants of RFA with high- and low-pass filters can respectively derive informative identity and position embeddings via just one FFP (i.e., without any training). As a result, RFA can achieve a better trade-off between quality and efficiency for both identity and position embedding over various baselines.
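The three ingredients listed in the abstract (a parameter-free spectral backbone, random noise inputs, and a single feed-forward pass) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the low-pass filter is taken to be the symmetrically normalized adjacency with self-loops, the high-pass filter is the identity minus that matrix, and the degree correction mechanism is omitted. The function names are hypothetical and the paper's actual filters may differ.

```python
import math
import random

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_feature_aggregation(adj, dim, steps, high_pass=False, seed=0):
    """Propagate random noise through a fixed graph filter; nothing is trained."""
    rng = random.Random(seed)
    n = len(adj)
    op = normalized_adjacency(adj)  # assumed low-pass filter
    if high_pass:
        # Assumed high-pass filter: I minus the normalized adjacency.
        op = [[(1.0 if i == j else 0.0) - op[i][j] for j in range(n)] for i in range(n)]
    z = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]  # random inputs
    for _ in range(steps):
        z = matmul(op, z)
    return z

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
position_emb = random_feature_aggregation(adj, dim=4, steps=1)
identity_emb = random_feature_aggregation(adj, dim=4, steps=1, high_pass=True)
```

In this toy graph the low-pass variant gives every node in a triangle the same embedding, which matches the intuition that low-frequency information reflects where a node sits in the graph.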
LG Jan 21, 2022
Enhancing Hyperbolic Graph Embeddings via Contrastive Learning
Jiahong Liu, Menglin Yang, Min Zhou et al.
Recently, hyperbolic space has risen as a promising alternative for semi-supervised graph representation learning. Many efforts have been made to design hyperbolic versions of neural network operations. However, the inspiring geometric properties of this unique geometry have not been fully explored yet. The potency of graph models powered by the hyperbolic space is still largely underestimated. Besides, the rich information carried by abundant unlabelled samples is also not well utilized. Inspired by the recently active and emerging self-supervised learning, in this study, we attempt to enhance the representation power of hyperbolic graph models by drawing upon the advantages of contrastive learning. More specifically, we put forward a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces to implicitly capture the hierarchical structure shared between different views. Then, we design a hyperbolic position consistency (HPC) constraint based on hyperbolic distance and the homophily assumption to make contrastive learning fit into hyperbolic space. Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL as it consistently outperforms competing methods by considerable margins for the node classification task.
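For readers unfamiliar with hyperbolic distance, the sketch below computes it in the Poincaré ball model with curvature -1. The choice of model and the function name `poincare_distance` are assumptions made for illustration; the abstract does not say which hyperbolic model HGCL uses.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm_x = sum(v * v for v in x)
    sq_norm_y = sum(v * v for v in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y))
    return math.acosh(arg)

# Two pairs with the same Euclidean gap (0.1), one near the origin, one near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

The same Euclidean gap corresponds to a larger hyperbolic distance near the boundary, which is the property that makes the space well suited to hierarchical data.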
RO Nov 10, 2020
Robotic Exploration of Unknown 2D Environment Using a Frontier-based Automatic-Differentiable Information Gain Measure
Di Deng, Runlin Duan, Jiahong Liu et al.
At the heart of path-planning methods for autonomous robotic exploration is a heuristic which encourages exploring unknown regions of the environment. Such heuristics are typically computed using frontier-based or information-theoretic methods. Frontier-based methods define the information gain of an exploration path as the number of boundary cells, or frontiers, which are visible from the path. However, the discrete and non-differentiable nature of this measure of information gain makes it difficult to optimize using gradient-based methods. In contrast, information-theoretic methods define information gain as the mutual information between the sensor's measurements and the explored map. However, computation of the gradient of mutual information involves finite differencing and is thus computationally expensive. This work proposes an exploration planning framework that combines sampling-based path planning and gradient-based path optimization. The main contribution of this framework is a novel reformulation of information gain as a differentiable function. This allows us to simultaneously optimize information gain with other differentiable quality measures, such as smoothness. The proposed planning framework's effectiveness is verified both in simulation and in hardware experiments using a Turtlebot3 Burger robot.
RO Nov 22, 2019
Constrained Heterogeneous Vehicle Path Planning for Large-area Coverage
Di Deng, Wei Jing, Yuhe Fu et al.
There is a strong demand for covering a large area autonomously by multiple UAVs (Unmanned Aerial Vehicles) supported by a ground vehicle. Limited by UAVs' battery life and communication distance, complete coverage of large areas typically involves multiple take-offs and landings to recharge batteries, and the transportation of UAVs between operation areas by a ground vehicle. In this paper, we introduce a novel large-area-coverage planning framework which collectively optimizes the paths for aerial and ground vehicles. Our method first partitions a large area into sub-areas, each of which a given fleet of UAVs can cover without recharging batteries. UAV operation routes, or trails, are then generated for each sub-area. Next, the assignment of trails to different UAVs and the order in which UAVs visit their assigned trails are simultaneously optimized to minimize the total UAV flight distance. Finally, a ground vehicle transportation path which visits all sub-areas is found by solving an asymmetric traveling salesman problem (ATSP). Although finding the globally optimal trail assignment and transition paths can be formulated as a Mixed Integer Quadratic Program (MIQP), the MIQP is intractable even for small problems. We show that the solution time can be reduced to close-to-real-time levels by first finding a feasible solution using a Random Key Genetic Algorithm (RKGA), which is then locally optimized by solving a much smaller MIQP.
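One common way a random-key encoding can represent both the assignment and the visiting order at once is sketched below. The scheme is an assumption chosen for illustration (the integer part of each key selects the UAV, the fractional part sets the visit priority), and `decode_random_keys` is a hypothetical helper; the abstract does not specify the paper's actual encoding.

```python
def decode_random_keys(keys, num_uavs):
    """Decode one real-valued key per trail into an ordered route for each UAV.

    Assumed scheme: key in [0, num_uavs); int(key) is the UAV index and the
    fractional part is the visit priority (smaller is visited earlier).
    """
    routes = [[] for _ in range(num_uavs)]
    for trail, key in enumerate(keys):
        if not 0 <= key < num_uavs:
            raise ValueError("each key must lie in [0, num_uavs)")
        uav = int(key)
        routes[uav].append((key - uav, trail))
    # Sort each UAV's trails by fractional key and drop the keys.
    return [[trail for _, trail in sorted(route)] for route in routes]

# Five trails, two UAVs; the key values are arbitrary.
keys = [0.7, 1.2, 0.1, 1.9, 0.4]
routes = decode_random_keys(keys, num_uavs=2)
# routes == [[2, 4, 0], [1, 3]]
```

Any vector of keys in the valid range decodes to a feasible assignment under this scheme, so crossover and mutation can act on plain real-valued vectors without repair steps.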