AI · Jul 10, 2023 · Code
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
Large Language Models (LLMs) have revolutionized natural language processing tasks, demonstrating their exceptional capabilities in various domains. However, their potential for behavior graph understanding in job recommendations remains largely unexplored. This paper focuses on unveiling the capability of large language models in understanding behavior graphs and leveraging this understanding to enhance recommendations in online recruitment, including the promotion of out-of-distribution (OOD) application. We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, we propose a meta-path prompt constructor that leverages an LLM recommender to understand behavior graphs for the first time and design a corresponding path augmentation module to alleviate the prompt bias introduced by path-based sequence input. By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users. We evaluate the effectiveness of our approach on a comprehensive dataset and demonstrate its ability to improve the relevance and quality of the recommendations. This research not only sheds light on the untapped potential of large language models but also provides valuable insights for developing advanced recommendation systems in the recruitment market. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences. We release the code at https://github.com/WLiK/GLRec.
IR · Nov 6, 2023 · Code
APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation
Mingjia Yin, Hao Wang, Xiang Xu et al.
The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking exploiting global collaborative information by inter-sequence modeling, resulting in inferior recommendation performance. Therefore, previous works attempt to tackle this problem with a global collaborative item graph constructed by pre-defined rules. However, these methods neglect two crucial properties when capturing global collaborative information, i.e., adaptiveness and personalization, yielding sub-optimal user representations. To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems. Specifically, we first learn an adaptive global graph among all items and capture global collaborative information with it in a self-supervised fashion, whose computational burden can be further alleviated by the proposed SVD-based accelerator. Furthermore, based on the graph, we propose to extract and utilize personalized item correlations in the form of relative positional encoding, which is a highly compatible manner of personalizing the utilization of global collaborative information. Finally, the entire framework is optimized in a multi-task learning paradigm, thus each part of APGL4SR can be mutually reinforced. As a generic framework, APGL4SR can outperform other baselines with significant margins. The code is available at https://github.com/Graph-Team/APGL4SR.
47.2 · LG · May 27 · Code
Robust Contrastive Graph Clustering with Adaptive Local-Global Integration
Lei Zhang, Fubo Sun, Haipeng Yang et al.
Graph clustering is essential in graph analysis for revealing structural patterns and node communities. Despite recent advances in self-supervised contrastive learning that have improved clustering via structural and attribute signals, existing methods still struggle to flexibly capture high-order local structures and often overlook global semantics in complex graphs. These limitations lead to suboptimal node representations, especially in real-world graphs with fragmented structures and ambiguous cluster boundaries. To address these limitations, a contrastive graph clustering framework is proposed to jointly integrate multi-scale local structures with global semantics via attention mechanisms. At the local level, GNN-based topological signals extracted from multiple propagation depths are adaptively fused through attention-based weighting to capture multi-scale neighborhood features. At the global level, semantic prototypes derived from dynamically evolving cluster centers are adaptively aggregated through attention to guide node representations and enhance inter-cluster separability. The model is trained under a dual-view contrastive learning paradigm with a hybrid objective that combines instance-level and structure-aware losses to improve representation robustness and discrimination. Experiments on eight real-world graph datasets demonstrate that our method achieves competitive clustering performance. Code is available at https://github.com/vege12138/w2.
IR · Jul 5, 2023
Generative Job Recommendations with Large Language Model
Zhi Zheng, Zhaopeng Qiu, Xiao Hu et al.
The rapid development of online recruitment services has encouraged the utilization of recommender systems to streamline the job seeking process. Predominantly, current job recommendations deploy either collaborative filtering or person-job matching strategies. However, these models tend to operate as "black-box" systems and lack the capacity to offer explainable guidance to job seekers. Moreover, conventional matching-based recommendation methods are limited to retrieving and ranking existing jobs in the database, restricting their potential as comprehensive career AI advisors. To this end, here we present GIRL (GeneratIve job Recommendation based on Large language models), a novel approach inspired by recent advancements in the field of Large Language Models (LLMs). We initially employ a Supervised Fine-Tuning (SFT) strategy to instruct the LLM-based generator in crafting suitable Job Descriptions (JDs) based on the Curriculum Vitae (CV) of a job seeker. Moreover, we propose to train a model which can evaluate the matching degree between CVs and JDs as a reward model, and we use a Proximal Policy Optimization (PPO)-based Reinforcement Learning (RL) method to further fine-tune the generator. This aligns the generator with recruiter feedback, tailoring the output to better meet employer preferences. In particular, GIRL serves as a job seeker-centric generative model, providing job suggestions without the need for a candidate set. This capability also enhances the performance of existing job recommendation models by supplementing job seeking features with generated content. With extensive experiments on a large-scale real-world dataset, we demonstrate the substantial effectiveness of our approach. We believe that GIRL introduces a paradigm-shifting approach to job recommendation systems, fostering a more personalized and comprehensive job-seeking experience.
CV · Jun 14, 2023
Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation
Likang Wu, Zhi Li, Hongke Zhao et al.
Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen objects, is a promising learning paradigm for machines to continuously understand new real-world knowledge. Recently, the Knowledge Graph (KG) has been proven to be an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies always embed relationships of seen and unseen objects into visual information from existing knowledge graphs to promote the cognitive ability of the unseen data. In reality, real-world knowledge is naturally formed by multimodal facts. Compared with ordinary structural knowledge from a graph perspective, a multimodal KG can provide cognitive systems with fine-grained knowledge. For example, the text description and visual content can depict more critical details of a fact than knowledge triplets alone. Unfortunately, this multimodal fine-grained knowledge is largely unexploited due to the bottleneck of feature alignment between different modalities. To that end, we propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings via a designed dense attention module and self-calibration loss. It makes the semantic transfer process of our ZSL framework learn more differentiated knowledge between entities. Our model also removes the performance limitation of using only rough global features. We conduct extensive experiments and evaluate our model on large-scale real-world data. The experimental results clearly demonstrate the effectiveness of the proposed model in standard zero-shot classification tasks.
IR · Mar 1, 2023
GUESR: A Global Unsupervised Data-Enhancement with Bucket-Cluster Sampling for Sequential Recommendation
Yongqiang Han, Likang Wu, Hao Wang et al.
Sequential Recommendation is a widely studied paradigm for learning users' dynamic interests from historical interactions in order to predict the next potential item. Although much research has achieved remarkable progress, existing methods are still plagued by two common issues: data sparsity caused by limited supervised signals and data noise caused by accidental clicks. Several works have attempted to address these issues, but they ignore the complex association of items across multiple sequences. Along this line, with the aim of learning representative item embeddings to alleviate this dilemma, we propose GUESR, from the view of graph contrastive learning. Specifically, we first construct the Global Item Relationship Graph (GIRG) from all interaction sequences and present the Bucket-Cluster Sampling (BCS) method to construct the sub-graphs. Then, graph contrastive learning on this reduced graph is developed to enhance item representations with complex associations from the global view. We subsequently extend the CapsNet module with an elaborately introduced target-attention mechanism to derive users' dynamic preferences. Extensive experimental results demonstrate that our proposed GUESR not only achieves significant improvements but can also be regarded as a general enhancement strategy to improve performance in combination with other sequential recommendation methods.
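As a rough illustration of building a global item graph from all interaction sequences, the sketch below counts how often two items appear next to each other across sequences. The adjacency rule, the undirected weighting, and the function name `build_item_graph` are assumptions made for illustration; the abstract does not specify how GIRG edges are defined, and Bucket-Cluster Sampling is not shown.

```python
from collections import defaultdict


def build_item_graph(sequences):
    """Return {item: {neighbor: weight}} from adjacent co-occurrences.

    Hypothetical edge rule: two items are linked when they are consecutive
    in any user's interaction sequence; the weight is the number of times
    this happens across all sequences.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a == b:
                continue  # ignore self-loops from repeated interactions
            graph[a][b] += 1
            graph[b][a] += 1  # treat the edge as undirected
    return {item: dict(neighbors) for item, neighbors in graph.items()}


# Toy interaction sequences (made-up data).
sequences = [["a", "b", "c"], ["b", "c", "d"], ["a", "b"]]
item_graph = build_item_graph(sequences)
```

Because the graph is aggregated over every sequence, an item's neighborhood reflects behavior from users other than the one currently being modeled, which is the "global view" the abstract refers to.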
LG · Aug 15, 2023
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification
Likang Wu, Junji Jiang, Hongke Zhao et al.
Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features' prototypes and labels' semantics thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e. the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It's necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. And then the content of each node is reconstructed to a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph's instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF with the comparison of state-of-the-art baselines.
SI · Apr 18, 2022
Preference Enhanced Social Influence Modeling for Network-Aware Cascade Prediction
Likang Wu, Hao Wang, Enhong Chen et al.
Network-aware cascade size prediction aims to predict the final number of reposts of user-generated information by modeling the propagation process in social networks. Estimating the user's reposting probability by social influence, namely state activation, plays an important role in the information diffusion process. Therefore, Graph Neural Networks (GNNs), which can simulate the information interaction between nodes, have been proven to be an effective scheme for this prediction task. However, existing studies, including GNN-based models, usually neglect a vital factor, the user's preference, which deeply influences state activation. To that end, we propose a novel framework to promote cascade size prediction by enhancing user preference modeling in three stages, i.e., preference topics generation, preference shift modeling, and social influence activation. Our end-to-end method makes the user activating process of information diffusion more adaptive and accurate. Extensive experiments on two large-scale real-world datasets have clearly demonstrated the effectiveness of our proposed model compared to state-of-the-art baselines.
IR · Jul 3, 2024
LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation
Hongke Zhao, Songming Zheng, Likang Wu et al.
The explainability of recommendation systems is crucial for enhancing user trust and satisfaction. Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation. However, in existing related studies, fine-tuning LLM models for recommendation tasks incurs high computational costs and alignment issues with existing systems, limiting the application potential of proven proprietary/closed-source LLM models, such as GPT-4. In this work, our proposed effective strategy LANE aligns LLMs with online recommendation systems without additional LLMs tuning, reducing costs and improving explainability. This innovative approach addresses key challenges in integrating language models with recommendation systems while fully utilizing the capabilities of powerful proprietary models. Specifically, our strategy operates through several key components: semantic embedding, user multi-preference extraction using zero-shot prompting, semantic alignment, and explainable recommendation generation using Chain of Thought (CoT) prompting. By embedding item titles instead of IDs and utilizing multi-head attention mechanisms, our approach aligns the semantic features of user preferences with those of candidate items, ensuring coherent and user-aligned recommendations. Sufficient experimental results including performance comparison, questionnaire voting, and visualization cases prove that our method can not only ensure recommendation performance, but also provide easy-to-understand and reasonable recommendation logic.
37.3 · IR · May 25
GCIB: Graph Contrastive Information Bottleneck for Multi-Behavior Recommendation
Likang Wu, Zihao Chen, Jianxin Zhang et al.
With the rapid emergence of multi-behavior learning in recommender systems, leveraging auxiliary user behaviors has proven effective for mitigating target-behavior data sparsity. Yet auxiliary behavior graphs frequently contain noisy or irrelevant interactions that do not align with the target task, impeding the learning of accurate user and item embeddings. Moreover, the scarcity of direct supervision from the target behavior complicates the extraction of informative collaborative signals. In this paper, we introduce GCIB (Graph Contrastive Information Bottleneck), a novel framework that denoises auxiliary behavior information and enriches target behavior representations at both the structural and feature levels. At the structural level, GCIB employs a Graph Information Bottleneck (GIB) objective to maximize mutual information between the denoised auxiliary graph and the target-behavior graph while minimizing mutual information with the original auxiliary graph. This formulation preserves task-relevant structural patterns and suppresses spurious interactions. At the feature level, we propose a cross-behavior Graph Contrastive Learning (GCL) scheme in which denoised auxiliary features and target-behavior features serve as complementary views for both users and items. By contrasting these views, GCIB enriches sparse target-behavior representations with semantics distilled from auxiliary behaviors. Extensive experiments demonstrate that GCIB outperforms state-of-the-art baselines, highlighting its ability to learn noise-resilient and target-aware representations for multi-behavior recommendation.
94.9 · LG · May 12 · Code
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
Zhong Guan, Yongjian Guo, Haoran Sun et al.
Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be decomposed into two semantically distinct factors: a training-inference discrepancy term that aligns inference-side and training-side distributions at the same behavior-policy version, and a policy-staleness term that constrains the update from the historical policy to the current policy. We show that practical asynchronous pipelines with delayed updates and partial rollouts often lose the required historical training-side logits, or old logits. This missing-old-logit problem entangles discrepancy repair with staleness correction, breaks the intended semantics of decoupled correction, and makes clipping and masking thresholds interact undesirably. To address this issue, we study both exact and approximate correction routes. We propose three exact old-logit acquisition strategies: snapshot-based version tracking, a dedicated old-logit model, and synchronization via partial rollout interruption, and compare their system trade-offs. From the perspective of approximate correction, we focus on preserving the benefits of decoupled correction through a more appropriate approximate policy when exact old logits cannot be recovered at low cost, without incurring extra system overhead. Following this analysis, we adopt a revised PPO-EWMA method, which achieves significant gains in both training speed and optimization performance. Code at https://github.com/millioniron/ROLL.
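To make the two-factor decomposition concrete, here is a minimal sketch for a single token, assuming three log-probabilities are available: one from the inference engine, one from the training engine at the same (old) policy version, and one from the current policy. The function and variable names are hypothetical and are not taken from the paper or its repository.

```python
import math


def decoupled_ratios(logp_current, logp_old_train, logp_infer):
    """Split the total importance ratio into two factors.

    discrepancy: training-side vs. inference-side probability at the same
                 old policy version (training-inference discrepancy term).
    staleness:   current policy vs. old training-side policy
                 (policy-staleness term).
    """
    discrepancy = math.exp(logp_old_train - logp_infer)
    staleness = math.exp(logp_current - logp_old_train)
    return discrepancy, staleness


# Made-up token probabilities for illustration.
logp_infer = math.log(0.20)      # inference engine, old policy version
logp_old_train = math.log(0.25)  # training engine, same old version
logp_current = math.log(0.30)    # training engine, current policy

discrepancy, staleness = decoupled_ratios(logp_current, logp_old_train, logp_infer)
total = math.exp(logp_current - logp_infer)  # the undecomposed ratio
```

The product of the two factors equals the total ratio, so when `logp_old_train` is unavailable only `total` can be computed, and a single clipping or masking threshold then has to cover both effects at once. That is the entanglement the abstract describes.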
AI · Feb 5
RL-VLA³: Reinforcement Learning VLA Accelerating via Full Asynchronism
Zhong Guan, Haoran Sun, Yongjian Guo et al.
In recent years, Vision-Language-Action (VLA) models have emerged as a crucial pathway towards general embodied intelligence, yet their training efficiency has become a key bottleneck. Although existing reinforcement learning (RL)-based training frameworks like RLinf can enhance model generalization, they still rely on synchronous execution, leading to severe resource underutilization and throughput limitations during environment interaction, policy generation (rollout), and model update phases (actor). To overcome this challenge, this paper, for the first time, proposes and implements a fully-asynchronous policy training framework encompassing the entire pipeline from environment interaction, rollout generation, to actor policy updates. Systematically drawing inspiration from asynchronous optimization ideas in large model RL, our framework designs a multi-level decoupled architecture. This includes asynchronous parallelization of environment interaction and trajectory collection, streaming execution for policy generation, and decoupled scheduling for training updates. We validated the effectiveness of our method across diverse VLA models and environments. On the LIBERO benchmark, the framework achieves throughput improvements of up to 59.25% compared to existing synchronous strategies. When deeply optimizing separation strategies, throughput can be increased by as much as 126.67%. We verified the effectiveness of each asynchronous component via ablation studies. Scaling law validation across 8 to 256 GPUs demonstrates our method's excellent scalability under most conditions.
AI · May 4, 2025 · Code
Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
Zhong Guan, Likang Wu, Hongke Zhao et al.
Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, for graph-structured data, which requires emphasis on topological connections, they fall short compared to message-passing mechanisms on fixed links, such as those employed by Graph Neural Networks (GNNs). This raises a question: "Does attention fail for graphs in natural language settings?" Motivated by these observations, we embarked on an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data. The goal is to gain deeper insights into the attention behavior of LLMs over graph structures. We uncovered unique phenomena regarding how LLMs apply attention to graph-structured data and analyzed these findings to improve the modeling of such data by LLMs. The primary findings of our research are: 1) While LLMs can recognize graph data and capture text-node interactions, they struggle to model inter-node relationships within graph structures due to inherent architectural constraints. 2) The attention distribution of LLMs across graph nodes does not align with ideal structural patterns, indicating a failure to adapt to graph topology nuances. 3) Neither fully connected attention nor fixed connectivity is optimal; each has specific limitations in its application scenarios. Instead, intermediate-state attention windows improve LLM training performance and seamlessly transition to fully connected windows during inference. Source code (LLM4Exploration): https://github.com/millioniron/LLM_exploration
CL · Jun 19, 2024 · Code
Multi-View Empowered Structural Graph Wordification for Language Models
Zipeng Liu, Likang Wu, Ming He et al.
Significant efforts have been dedicated to integrating the powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of language, vision and audio data. However, the graph-structured data, which is inherently rich in structural and domain-specific knowledge, has not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffering the loss of graph structural information, or feed Graph Neural Network (GNN) embeddings into LLMs at the cost of losing explainable prompt semantics. To bridge this gap, we introduce an end-to-end modality-aligning framework for LLM-graph alignment: Dual-Residual Vector Quantized-Variational AutoEncoder, namely Dr.E. Our approach is purposefully designed to facilitate token-level alignment with LLMs, enabling an effective translation of the intrinsic `language' of graphs into comprehensible natural language. We also manage to enhance LLMs' more robust structural understanding of graphs by incorporating multiple views of the central nodes based on their surrounding nodes at various distances. Our experimental evaluations on standard graph tasks demonstrate competitive performance against other state-of-the-art (SOTA) approaches. Additionally, our framework ensures certain visual interpretability, efficiency, and robustness, marking the promising successful endeavor to achieve token-level alignment between LLMs and GNNs. Our code is available at: https://github.com/Timothy914/Dr.E.
IR · May 31, 2023 · Code
A Survey on Large Language Models for Recommendation
Likang Wu, Zhi Zheng, Zhaopeng Qiu et al.
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP) and have recently gained significant attention in the domain of Recommendation Systems (RS). These models, trained on massive amounts of data using self-supervised learning, have demonstrated remarkable success in learning universal representations and have the potential to enhance various aspects of recommendation systems by some effective transfer techniques such as fine-tuning and prompt tuning, and so on. The crucial aspect of harnessing the power of language models in enhancing recommendation quality is the utilization of their high-quality representations of textual features and their extensive coverage of external knowledge to establish correlations between items and users. To provide a comprehensive understanding of the existing LLM-based recommendation systems, this survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec), with the latter being systematically sorted out for the first time. Furthermore, we systematically review and analyze existing LLM-based recommendation systems within each paradigm, providing insights into their methodologies, techniques, and performance. Additionally, we identify key challenges and several valuable findings to provide researchers and practitioners with inspiration. We have also created a GitHub repository to index relevant papers on LLMs for recommendation, https://github.com/WLiK/LLM4Rec.
AI · Oct 27, 2019 · Code
Long-term Joint Scheduling for Urban Traffic
Xianfeng Liang, Likang Wu, Joya Chen et al.
Recently, traffic congestion in modern cities has become a growing worry for residents. As presented in the Baidu traffic report, the commuting stress index has reached a surprising 1.973 in Beijing during rush hours, which results in longer trip times and increased vehicular queueing. Previous works have demonstrated that with reasonable scheduling, e.g., rebalancing bike-sharing systems and optimizing bus transportation, traffic efficiency can be significantly improved with little resource consumption. However, there are still two disadvantages that restrict their performance: (1) they only consider a single scheduling step over a short time, ignoring the layout after the first reposition, and (2) they only focus on a single mode of transport, leaving the multi-modal characteristics of urban public transportation largely under-exploited. In this paper, we propose an efficient and economical multi-modal traffic scheduling scheme named JLRLS based on spatio-temporal prediction, which adopts reinforcement learning to obtain an optimal long-term and joint schedule. In JLRLS, we combine multiple modes of transportation and schedule them according to their own characteristics, which potentially helps the system reach optimal performance. Our implementation of an example in PaddlePaddle is available at https://github.com/bigdata-ustc/Long-term-Joint-Scheduling, with an explanatory video at https://youtu.be/t5M2wVPhTyk.
LG · Aug 15, 2024
An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation
Jun Wang, Likang Wu, Qi Liu et al.
Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique as they provide an additional advantage in avoiding global explorations that may harm online users' experiences. However, previous studies mainly focus on discrete action and policy spaces, which might have difficulties in handling dramatically growing items efficiently. To mitigate this issue, in this paper, we aim to design an algorithmic framework applicable to continuous policies. To facilitate the control in the low-dimensional but dense user preference space, we propose an Efficient Continuous Control framework (ECoC). Based on a statistically tested assumption, we first propose a novel unified action representation abstracted from normalized user and item spaces. Then, we develop the corresponding policy evaluation and policy improvement procedures. During this process, strategic exploration and directional control in terms of unified actions are carefully designed and crucial to final recommendation decisions. Moreover, benefiting from unified actions, the conservatism regularizations for policies and value functions are combined and perfectly compatible with the continuous framework. The resulting dual regularization ensures the successful offline training of RL-based recommendation policies. Finally, we conduct extensive experiments to validate the effectiveness of our framework. The results show that compared to the discrete baselines, our ECoC is trained far more efficiently. Meanwhile, the final policies outperform baselines in both capturing the offline data and gaining long-term rewards.
LG · Jan 31, 2024
A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction
Wenshuo Chao, Zhaopeng Qiu, Likang Wu et al.
The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regard skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the inner-connection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.
CL · Mar 2, 2025
Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction
Liping Liu, Chunhong Zhang, Likang Wu et al.
Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches involve models iterating and improving their previous responses based on LLMs' internal reflection ability or external feedback. However, recent research has raised concerns that intrinsic self-correction without external feedback may even degrade performance. Based on our empirical evidence, we find that current static reflection methods may lead to redundancy, drift, and stubbornness issues. To mitigate this, we introduce Instruct-of-Reflection (IoRT), a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of LLMs. Specifically, we propose an instructor, driven by meta-thoughts and a self-consistency classifier, that generates various instructions, including refresh, stop, and select, to guide the next reflection iteration. Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks, highlighting its efficacy and applicability.
LGJul 9, 2025
Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow MatchingHuibo Xu, Runlong Yu, Likang Wu et al.
Existing generative models for time series forecasting often transform simple priors (typically Gaussian) into complex data distributions. However, their sampling initialization, independent of historical data, hinders the capture of temporal dependencies, limiting predictive accuracy. They also treat residuals merely as optimization targets, ignoring that residuals often exhibit meaningful patterns like systematic biases or nontrivial distributional structures. To address these limitations, we propose Conditional Guided Flow Matching (CGFM), a novel model-agnostic framework that extends flow matching by integrating outputs from an auxiliary predictive model. This enables learning from the probabilistic structure of prediction residuals, leveraging the auxiliary model's prediction distribution as a source to reduce learning difficulty and refine forecasts. CGFM incorporates historical data as both conditions and guidance, uses two-sided conditional paths (with source and target conditioned on the same history), and employs affine paths to expand the path space, avoiding path crossing without complex mechanisms, preserving temporal consistency, and strengthening distribution alignment. Experiments across datasets and baselines show that CGFM consistently outperforms state-of-the-art models.
AIOct 9, 2025
ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task NavigationHaitao Jia, Ming He, Zimo Yin et al.
Mobile GUI agents exhibit substantial potential to facilitate and automate the execution of user tasks on mobile phones. However, existing mobile GUI agents predominantly privilege autonomous operation and neglect the necessity of active user engagement during task execution. This omission undermines their adaptability to information dilemmas including ambiguous, dynamically evolving, and conflicting task scenarios, leading to execution outcomes that deviate from genuine user requirements and preferences. To address these shortcomings, we propose ReInAgent, a context-aware multi-agent framework that leverages dynamic information management to enable human-in-the-loop mobile task navigation. ReInAgent integrates three specialized agents around a shared memory module: an information-managing agent for slot-based information management and proactive interaction with the user, a decision-making agent for conflict-aware planning, and a reflecting agent for task reflection and information consistency validation. Through continuous contextual information analysis and sustained user-agent collaboration, ReInAgent overcomes the limitation of existing approaches that rely on clear and static task assumptions. Consequently, it enables more adaptive and reliable mobile task navigation in complex, real-world scenarios. Experimental results demonstrate that ReInAgent effectively resolves information dilemmas and produces outcomes that are more closely aligned with genuine user preferences. Notably, on complex tasks involving information dilemmas, ReInAgent achieves a 25% higher success rate than Mobile-Agent-v2.
LGAug 21, 2025
Recall-Extend Dynamics: Enhancing Small Language Models through Controlled Exploration and Refined Offline IntegrationZhong Guan, Likang Wu, Hongke Zhao et al.
Many existing studies have achieved significant improvements in the reasoning capabilities of large language models (LLMs) through reinforcement learning with verifiable rewards (RLVR), while the enhancement of reasoning abilities in small language models (SLMs) has not yet been sufficiently explored. Combining distilled data from larger models with RLVR on small models themselves is a natural approach, but it still faces various challenges and issues. Therefore, we propose Recall-Extend Dynamics (RED): Enhancing Small Language Models through Controlled Exploration and Refined Offline Integration. In this paper, we explore the perspective of varying exploration spaces, balancing offline distillation with online reinforcement learning. Simultaneously, we specifically design and optimize for the insertion problem within offline data. By monitoring the ratio of entropy changes in the model concerning offline and online data, we regulate the weight of offline-SFT, thereby addressing the issues of insufficient exploration space in small models and the redundancy and complexity during the distillation process. Furthermore, to tackle the distribution discrepancies between offline data and the current policy, we design a sample-accuracy-based policy shift mechanism that dynamically chooses between imitating offline distilled data and learning from its own policy.
LGJul 14, 2025
NeuTSFlow: Modeling Continuous Functions Behind Time Series ForecastingHuibo Xu, Likang Wu, Xianquan Wang et al.
Time series forecasting is a fundamental task with broad applications, yet conventional methods often treat data as discrete sequences, overlooking their origin as noisy samples of continuous processes. Crucially, discrete noisy observations cannot uniquely determine a continuous function; instead, they correspond to a family of plausible functions. Mathematically, time series can be viewed as noisy observations of a continuous function family governed by a shared probability measure. Thus, the forecasting task can be framed as learning the transition from the historical function family to the future function family. This reframing introduces two key challenges: (1) How can we leverage discrete historical and future observations to learn the relationships between their underlying continuous functions? (2) How can we model the transition path in function space from the historical function family to the future function family? To address these challenges, we propose NeuTSFlow, a novel framework that leverages Neural Operators to facilitate flow matching for learning the path of measures between historical and future function families. By parameterizing the velocity field of the flow in infinite-dimensional function spaces, NeuTSFlow moves beyond traditional methods that focus on dependencies at discrete points, directly modeling function-level features instead. Experiments on diverse forecasting tasks demonstrate NeuTSFlow's superior accuracy and robustness, validating the effectiveness of the function-family perspective.
AIJun 19, 2024
LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological ModelingZhong Guan, Hongke Zhao, Likang Wu et al.
Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a non-negligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand and process graph-structured data, fine-tuned LLMs perform even worse than some traditional GNN models on graph tasks, lacking inherent modeling capabilities for graph structures. Existing research overly emphasizes LLMs' understanding of semantic information captured by external models, while inadequately exploring graph topological structure modeling, thereby overlooking the genuine capabilities that LLMs lack. Consequently, in this paper, we introduce a new framework, LangTopo, which aligns graph structure modeling with natural language understanding at the token level. LangTopo quantifies the graph structure modeling capabilities of GNNs and LLMs by constructing a codebook for the graph modality and performs consistency maximization. This process aligns the text descriptions of LLMs with the topological modeling of GNNs, allowing LLMs to learn GNNs' ability to capture graph structures and enabling LLMs to handle graph-structured data independently. We demonstrate the effectiveness of our proposed method on multiple datasets.
IRJun 19, 2024
Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware LearningZhong Guan, Likang Wu, Hongke Zhao et al.
Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs with recommendation tasks. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a non-negligible challenge. Specifically, without an adequate ability to capture collaborative information, existing modeling paradigms struggle to capture behavior patterns within community groups, leading to LLMs' ineffectiveness in discerning implicit interaction semantics in recommendation scenarios. To address this, we consider enhancing the learning capability of language model-driven recommendation models for structured data, specifically by utilizing interaction graphs rich in collaborative semantics. We propose Graph-Aware Learning for Language Model-Driven Recommendations (GAL-Rec). GAL-Rec enhances the understanding of user-item collaborative semantics by imitating the intent of Graph Neural Networks (GNNs) to aggregate multi-hop information, thereby fully exploiting the substantial learning capacity of LLMs to independently address the complex graphs in the recommendation system. Extensive experimental results on three real-world datasets demonstrate that GAL-Rec significantly enhances the comprehension of collaborative semantics and improves recommendation performance.
CVJun 12, 2024
From a Social Cognitive Perspective: Context-aware Visual Social Relationship RecognitionShiwei Wu, Chao Zhang, Joya Chen et al.
People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods of social relationship understanding rely on the basic classification paradigm of detected persons and objects, which fails to understand the comprehensive context and often overlooks decisive social factors, especially subtle visual cues. To highlight the social-aware context and intricate details, we propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective. Specifically, to incorporate social-aware semantics, we build a lightweight adapter upon the frozen CLIP to learn social concepts via our novel multi-modal side adapter tuning mechanism. Further, we construct social-aware descriptive language prompts (e.g., scene, activity, objects, emotions) with social relationships for each image, and then compel ConSoR to concentrate more intensively on the decisive visual social factors via visual-linguistic contrasting. Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark. Furthermore, we observe that ConSoR excels at finding critical visual evidence to reveal social relationships.
LGMay 27, 2021
Estimating Fund-Raising Performance for Start-up Projects from a Market Graph PerspectiveLikang Wu, Zhi Li, Hongke Zhao et al.
In the online innovation market, the fund-raising performance of a start-up project is a key concern for creators, investors and platforms. Unfortunately, existing studies mostly focus on modeling the fund-raising process after the publication of a project, while predicting a project's attractiveness in the market before it is set up remains largely unexplored. This prediction is highly challenging because it requires a comprehensive understanding of both the start-up project and the market environment. To that end, in this paper, we present a focused study on this important problem from a market graph perspective. Specifically, we propose a Graph-based Market Environment (GME) model for predicting the fund-raising performance of an unpublished project by exploiting the market environment. In addition, we discriminatively model the project competitiveness and market preferences by designing two graph-based neural network architectures and incorporating them into a joint optimization stage. Furthermore, to explore the information propagation problem with a dynamic environment in a large-scale market graph, we extend the GME model with parallelized competitiveness quantification and a hierarchical propagation algorithm. Finally, we conduct extensive experiments on real-world data. The experimental results clearly demonstrate the effectiveness of our proposed model.
AIJan 16, 2021
Learning the Implicit Semantic Representation on Graph-Structured DataLikang Wu, Zhi Li, Hongke Zhao et al.
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole, while the implicit semantic associations behind highly complex interactions of graphs are largely unexploited. In this paper, we propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs. Previous work has explored graph semantics via meta-paths. However, these methods mainly rely on explicit heterogeneous information that is hard to obtain in a large amount of graph-structured data. SGCN first breaks through this restriction by leveraging the semantic-paths dynamically and automatically during the node aggregating process. To evaluate our idea, we conduct extensive experiments on several standard datasets, and the empirical results show the superior performance of our model.
IRJun 8, 2020
Learning the Compositional Visual Coherence for Complementary RecommendationsZhi Li, Bo Wu, Qi Liu et al.
Complementary recommendations, which aim at providing users with product suggestions that are supplementary and compatible with their obtained items, have become a hot topic in both academia and industry in recent years. Existing work mainly focused on modeling the co-purchased relations between two items, but the compositional associations of item collections are largely unexplored. Actually, when a user chooses the complementary items for the purchased products, it is intuitive that she will consider the visual semantic coherence (such as color collocations, texture compatibilities) in addition to global impressions. Towards this end, in this paper, we propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model the global compositional coherence. Then, we generate the semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn the focal compositional coherence from different semantic-focal representations. Finally, we optimize the CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of CANN compared with several state-of-the-art methods.
LGDec 14, 2019
Estimating Early Fundraising Performance of Innovations via Graph-based Market Environment ModelLikang Wu, Zhi Li, Hongke Zhao et al.
Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is a key concern for both creators and platforms. However, estimating the early fundraising performance before the project is published is very challenging and still under-explored. To that end, in this paper, we present a focused study on this important problem from a market modeling view. Specifically, we propose a Graph-based Market Environment model (GME) for estimating the early fundraising performance of the target project by exploiting the market environment. In addition, we discriminatively model the market competition and market evolution by designing two graph-based neural network architectures and incorporating them into the joint optimization stage. Finally, we conduct extensive experiments on the real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.