AIJun 12, 2022Code
Human Mobility Prediction with Causal and Spatial-constrained Multi-task NetworkZongyuan Huang, Shengyuan Xu, Menghan Wang et al.
Modeling human mobility helps to understand how people are accessing resources and physically contacting with each other in cities, and thus contributes to various applications such as urban planning, epidemic control, and location-based advertisement. Next location prediction is one decisive task in individual human mobility modeling and is usually viewed as sequence modeling, solved with Markov or RNN-based methods. However, the existing models paid little attention to the logic of individual travel decisions and the reproducibility of the collective behavior of population. To this end, we propose a Causal and Spatial-constrained Long and Short-term Learner (CSLSL) for next location prediction. CSLSL utilizes a causal structure based on multi-task learning to explicitly model the "\textit{when$\rightarrow$what$\rightarrow$where}", a.k.a. "\textit{time$\rightarrow$activity$\rightarrow$location}" decision logic. We next propose a spatial-constrained loss function as an auxiliary task, to ensure the consistency between the predicted and actual spatial distribution of travelers' destinations. Moreover, CSLSL adopts modules named Long and Short-term Capturer (LSC) to learn the transition regularities across different time spans. Extensive experiments on three real-world datasets show promising performance improvements of CSLSL over baselines and confirm the effectiveness of introducing the causality and consistency constraints. The implementation is available at https://github.com/urbanmobility/CSLSL.
AIMar 2, 2022
Machine Learning Methods in Solving the Boolean Satisfiability ProblemWenxuan Guo, Junchi Yan, Hui-Ling Zhen et al.
This paper reviews the recent literature on solving the Boolean satisfiability problem (SAT), an archetypal NP-complete problem, with the help of machine learning techniques. Despite the great success of modern SAT solvers to solve large industrial instances, the design of handcrafted heuristics is time-consuming and empirical. Under the circumstances, the flexible and expressive machine learning methods provide a proper alternative to solve this long-standing problem. We examine the evolving ML-SAT solvers from naive classifiers with handcrafted features to the emerging end-to-end SAT solvers such as NeuroSAT, as well as recent progress on combinations of existing CDCL and local search solvers with machine learning methods. Overall, solving SAT with machine learning is a promising yet challenging research topic. We conclude the limitations of current works and suggest possible future directions.
CLOct 11, 2023
Accurate Use of Label Dependency in Multi-Label Text Classification Through the Lens of CausalityCaoyun Fan, Wenqing Chen, Jidong Tian et al.
Multi-Label Text Classification (MLTC) aims to assign the most relevant labels to each given text. Existing methods demonstrate that label dependency can help to improve the model's performance. However, the introduction of label dependency may cause the model to suffer from unwanted prediction bias. In this study, we attribute the bias to the model's misuse of label dependency, i.e., the model tends to utilize the correlation shortcut in label dependency rather than fusing text information and label dependency for prediction. Motivated by causal inference, we propose a CounterFactual Text Classifier (CFTC) to eliminate the correlation bias, and make causality-based predictions. Specifically, our CFTC first adopts the predict-then-modify backbone to extract precise label information embedded in label dependency, then blocks the correlation shortcut through the counterfactual de-bias technique with the help of the human causal graph. Experimental results on three datasets demonstrate that our CFTC significantly outperforms the baselines and effectively eliminates the correlation bias in datasets.
CLFeb 18, 2023
Improving the Out-Of-Distribution Generalization Capability of Language Models: Counterfactually-Augmented Data is not EnoughCaoyun Fan, Wenqing Chen, Jidong Tian et al.
Counterfactually-Augmented Data (CAD) has the potential to improve language models' Out-Of-Distribution (OOD) generalization capability, as CAD induces language models to exploit causal features and exclude spurious correlations. However, the empirical results of OOD generalization on CAD are not as efficient as expected. In this paper, we attribute the inefficiency to Myopia Phenomenon caused by CAD: language models only focus on causal features that are edited in the augmentation and exclude other non-edited causal features. As a result, the potential of CAD is not fully exploited. Based on the structural properties of CAD, we design two additional constraints to help language models extract more complete causal features contained in CAD, thus improving the OOD generalization capability. We evaluate our method on two tasks: Sentiment Analysis and Natural Language Inference, and the experimental results demonstrate that our method could unlock CAD's potential and improve language models' OOD generalization capability.
CVFeb 18, 2023
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task LearningCaoyun Fan, Wenqing Chen, Jidong Tian et al.
When modeling related tasks in computer vision, Multi-Task Learning (MTL) can outperform Single-Task Learning (STL) due to its ability to capture intrinsic relatedness among tasks. However, MTL may encounter the insufficient training problem, i.e., some tasks in MTL may encounter non-optimal situation compared with STL. A series of studies point out that too much gradient noise would lead to performance degradation in STL, however, in the MTL scenario, Inter-Task Gradient Noise (ITGN) is an additional source of gradient noise for each task, which can also affect the optimization process. In this paper, we point out ITGN as a key factor leading to the insufficient training problem. We define the Gradient-to-Noise Ratio (GNR) to measure the relative magnitude of gradient noise and design the MaxGNR algorithm to alleviate the ITGN interference of each task by maximizing the GNR of each task. We carefully evaluate our MaxGNR algorithm on two standard image MTL datasets: NYUv2 and Cityscapes. The results show that our algorithm outperforms the baselines under identical experimental conditions.
AIJan 13, 2023
Contrast with Major Classifier Vectors for Federated Medical Relation Extraction with Heterogeneous Label DistributionChunhui Du, Hao He, Yaohui Jin
Federated medical relation extraction enables multiple clients to train a deep network collaboratively without sharing their raw medical data. In order to handle the heterogeneous label distribution across clients, most of the existing works only involve enforcing regularization between local and global models during optimization. In this paper, we fully utilize the models of all clients and propose a novel concept of \textit{major classifier vectors}, where a group of class vectors is obtained in an ensemble rather than the weighted average method on the server. The major classifier vectors are then distributed to all clients and the local training of each client is Contrasted with Major Classifier vectors (FedCMC), so the local model is not prone to overfitting to the local label distribution. FedCMC requires only a small amount of additional transfer of classifier parameters without any leakage of raw data, extracted representations, and label distributions. Our extensive experiments show that FedCMC outperforms the other state-of-the-art FL algorithms on three medical relation extraction datasets.
88.0ROMar 15
R3DP: Real-Time 3D-Aware Policy for Embodied ManipulationYuhao Zhang, Wanxi Dong, Yue Shi et al.
Embodied manipulation requires accurate 3D understanding of objects and their spatial relations to plan and execute contact-rich actions. While large-scale 3D vision models provide strong priors, their computational cost incurs prohibitive latency for real-time control. We propose Real-time 3D-aware Policy (R3DP), which integrates powerful 3D priors into manipulation policies without sacrificing real-time performance. A core innovation of R3DP is the asynchronous fast-slow collaboration module, which seamlessly integrates large-scale 3D priors into the policy without compromising real-time performance. The system maintains real-time efficiency by querying the pre-trained slow system (VGGT) only on sparse key frames, while simultaneously employing a lightweight Temporal Feature Prediction Network (TFPNet) to predict features for all intermediate frames. By leveraging historical data to exploit temporal correlations, TFPNet explicitly improves task success rates through consistent feature estimation. Additionally, to enable more effective multi-view fusion, we introduce a Multi-View Feature Fuser (MVFF) that aggregates features across views by explicitly incorporating camera intrinsics and extrinsics. R3DP offers a plug-and-play solution for integrating large models into real-time inference systems. We evaluate R3DP against multiple baselines across different visual configurations. R3DP effectively harnesses large-scale 3D priors to achieve superior results, outperforming single-view and multi-view DP by 32.9% and 51.4% in average success rate, respectively. Furthermore, by decoupling heavy 3D reasoning from policy execution, R3DP achieves a 44.8% reduction in inference time compared to a naive DP+VGGT integration.
AIJul 21, 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition RecommendationYu Zhang, Ruijie Yu, Kaipeng Zeng et al.
Identifying reaction conditions that are broadly applicable across diverse substrates is a longstanding challenge in chemical and pharmaceutical research. While many methods are available to generate conditions with acceptable performance, a universal approach for reliably discovering effective conditions during reaction exploration is rare. Consequently, current reaction optimization processes are often labor-intensive, time-consuming, and costly, relying heavily on trial-and-error experimentation. Nowadays, large language models (LLMs) are capable of tackling chemistry-related problems, such as molecule design and chemical reasoning tasks. Here, we report the design, implementation and application of Chemma-RC, a text-augmented multimodal LLM to identify effective conditions through task-specific dialogue and condition generation. Chemma-RC learns a unified representation of chemical reactions by aligning multiple modalities-including text corpus, reaction SMILES, and reaction graphs-within a shared embedding module. Performance benchmarking on datasets showed high precision in identifying optimal conditions, with up to 17% improvement over the current state-of-the-art methods. A palladium-catalysed imidazole C-H arylation reaction was investigated experimentally to evaluate the functionalities of the Chemma-RC in practice. Our findings suggest that Chemma-RC holds significant potential to accelerate high-throughput condition screening in chemical synthesis.
CLOct 18, 2023
Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language UnderstandingCaoyun Fan, Jidong Tian, Yitian Li et al.
Chain-of-Thought (CoT) is a technique that guides Large Language Models (LLMs) to decompose complex tasks into multi-step reasoning through intermediate steps in natural language form. Briefly, CoT enables LLMs to think step by step. However, although many Natural Language Understanding (NLU) tasks also require thinking step by step, LLMs perform less well than small-scale Masked Language Models (MLMs). To migrate CoT from LLMs to MLMs, we propose Chain-of-Thought Tuning (CoTT), a two-step reasoning framework based on prompt tuning, to implement step-by-step thinking for MLMs on NLU tasks. From the perspective of CoT, CoTT's two-step framework enables MLMs to implement task decomposition; CoTT's prompt tuning allows intermediate steps to be used in natural language form. Thereby, the success of CoT can be extended to NLU tasks through MLMs. To verify the effectiveness of CoTT, we conduct experiments on two NLU tasks: hierarchical classification and relation extraction, and the results show that CoTT outperforms baselines and achieves state-of-the-art performance.
CLOct 10, 2023
Unlock the Potential of Counterfactually-Augmented Data in Out-Of-Distribution GeneralizationCaoyun Fan, Wenqing Chen, Jidong Tian et al.
Counterfactually-Augmented Data (CAD) -- minimal editing of sentences to flip the corresponding labels -- has the potential to improve the Out-Of-Distribution (OOD) generalization capability of language models, as CAD induces language models to exploit domain-independent causal features and exclude spurious correlations. However, the empirical results of CAD's OOD generalization are not as efficient as anticipated. In this study, we attribute the inefficiency to the myopia phenomenon caused by CAD: language models only focus on causal features that are edited in the augmentation operation and exclude other non-edited causal features. Therefore, the potential of CAD is not fully exploited. To address this issue, we analyze the myopia phenomenon in feature space from the perspective of Fisher's Linear Discriminant, then we introduce two additional constraints based on CAD's structural properties (dataset-level and sentence-level) to help language models extract more complete causal features in CAD, thereby mitigating the myopia phenomenon and improving OOD generalization capability. We evaluate our method on two tasks: Sentiment Analysis and Natural Language Inference, and the experimental results demonstrate that our method could unlock the potential of CAD and improve the OOD generalization performance of language models by 1.0% to 5.9%.
LGNov 26, 2024Code
Learning Chemical Reaction Representation with Reactant-Product AlignmentKaipeng Zeng, Xianbin Liu, Yu Zhang et al.
Organic synthesis stands as a cornerstone of the chemical industry. The development of robust machine learning models to support tasks associated with organic reactions is of significant interest. However, current methods rely on hand-crafted features or direct adaptations of model architectures from other domains, which lack feasibility as data scales increase or ignore the rich chemical information inherent in reactions. To address these issues, this paper introduces RAlign, a novel chemical reaction representation learning model for various organic reaction-related tasks. By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby enhancing comprehension of the reaction mechanism. We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle various reaction conditions and to adapt to various datasets and downstream tasks. Additionally, we introduce a reaction-center-aware attention mechanism that enables the model to concentrate on key functional groups, thereby generating potent representations for chemical reactions. Our model has been evaluated on a range of downstream tasks. Experimental results indicate that our model markedly outperforms existing chemical reaction representation learning architectures on most of the datasets. We plan to open-source the code contingent upon the acceptance of the paper.
AIJun 30, 2025Code
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated DataYu Zhang, Ruijie Yu, Jidong Tian et al.
With the increasing interest in robotic synthesis in the context of organic chemistry, the automated extraction of chemical procedures from literature is critical. However, this task remains challenging due to the inherent ambiguity of chemical language and the high cost of human annotation required for developing reliable computer-aided extraction protocols. Here, we present ChemActor, a fully fine-tuned large language model (LLM), as a chemical executor to convert between unstructured experimental procedures and structured action sequences. We propose a sequential LLM-generated data framework to address the challenges of insufficient and low-quality annotated data. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Additionally, we introduce a novel multi-round LLMs circle review metric, which reflects the model's advanced understanding of chemical experimental procedures. Extensive experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor, augmented by LLM-generated data, achieves state-of-the-art performance, outperforming the baseline model by 10%. The code is available at: https://github.com/Zhanghahah/ChemActor.
CLJan 24, 2024Code
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM AgentsChang Ma, Junlei Zhang, Zhihao Zhu et al.
Evaluating Large Language Models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit that features easy assessment of agents for multi-faceted analysis. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront. Ultimately, AgentBoard serves as a step towards demystifying agent behaviors and accelerating the development of stronger LLM agents.
AIDec 9, 2023
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic AnalysisCaoyun Fan, Jindou Chen, Yaohui Jin et al.
Game theory, as an analytical tool, is frequently utilized to analyze human behavior in social science research. With the high alignment between the behavior of Large Language Models (LLMs) and humans, a promising research direction is to employ LLMs as substitutes for humans in game experiments, enabling social science research. However, despite numerous empirical researches on the combination of LLMs and game theory, the capability boundaries of LLMs in game theory remain unclear. In this research, we endeavor to systematically analyze LLMs in the context of game theory. Specifically, rationality, as the fundamental principle of game theory, serves as the metric for evaluating players' behavior -- building a clear desire, refining belief about uncertainty, and taking optimal actions. Accordingly, we select three classical games (dictator game, Rock-Paper-Scissors, and ring-network game) to analyze to what extent LLMs can achieve rationality in these three aspects. The experimental results indicate that even the current state-of-the-art LLM (GPT-4) exhibits substantial disparities compared to humans in game theory. For instance, LLMs struggle to build desires based on uncommon preferences, fail to refine belief from many simple patterns, and may overlook or modify refined belief when taking actions. Therefore, we consider that introducing LLMs into game experiments in the field of social science should be approached with greater caution.
CHEM-PHMar 25, 2024
UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES AlignmentKaipeng Zeng, Bo yang, Xin Zhao et al.
Motivation: Retrosynthesis planning poses a formidable challenge in the organic chemical industry. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels of additional chemical knowledge dependency. Results: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpasses, established powerful template-based methods. Scientific contribution: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information. We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5\% (top-5) and 5.4\% (top-10) increased accuracy over the strongest baseline.
39.3AIApr 26
Transferable Human Mobility Network Reconstruction with neuroGravityJinming Yang, Shaoyu Huang, Zongyuan Huang et al.
Accurate modeling of human mobility is critical for tackling urban planning and public health challenges. In undeveloped regions, the absence of comprehensive travel surveys necessitates reconstructing mobility networks from publicly available data. Here we develop neuroGravity, a physics-informed deep learning model that reliably reconstructs mobility flows from limited observations and transfers to unobserved cities. Using only urban facility and population distributions, we find that neuroGravity's regional representations strongly correlate with socioeconomic and livability status, offering scalable proxies for costly surveys. Furthermore, we uncover that spatial income segregation plays a key role in model transferability: mobility networks are most reliably reconstructed when target cities share similar segregation levels with the source. We design an index to quantify this segregation and accurately predict transferability. Finally, we generate mobility flow proxies for over 1,200 cities worldwide, highlighting neuroGravity's potential to mitigate critical data shortages in resource-limited, underdeveloped areas.
CLDec 12, 2023
Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration SelectionCaoyun Fan, Jidong Tian, Yitian Li et al.
In-Context Learning (ICL) is an important paradigm for adapting Large Language Models (LLMs) to downstream tasks through a few demonstrations. Despite the great success of ICL, the limitation of the demonstration number may lead to demonstration bias, i.e. the input-label mapping induced by LLMs misunderstands the task's essence. Inspired by human experience, we attempt to mitigate such bias through the perspective of the inter-demonstration relationship. Specifically, we construct Comparable Demonstrations (CDs) by minimally editing the texts to flip the corresponding labels, in order to highlight the task's essence and eliminate potential spurious correlations through the inter-demonstration comparison. Through a series of experiments on CDs, we find that (1) demonstration bias does exist in LLMs, and CDs can significantly reduce such bias; (2) CDs exhibit good performance in ICL, especially in out-of-distribution scenarios. In summary, this study explores the ICL mechanisms from a novel perspective, providing a deeper insight into the demonstration selection strategy for ICL.
AIDec 14, 2023
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgentHaoran Liao, Qinyi Du, Shaohua Hu et al.
Large language models (LLMs) face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of mathematical reasoning process. Specifically, we propose a formal description of the mathematical solving and extend LLMs with an agent-based zero-shot framework named $\bf{P}$lanner-$\bf{R}$easoner-$\bf{E}$xecutor-$\bf{R}$eflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and proposed MathAgents, achieving an increase of $12.3\%$($53.9\%\xrightarrow{}66.2\%$) on the MiniF2F, $9.2\%$ ($49.8\%\xrightarrow{}59.0\%$) on MATH, and $13.2\%$($23.2\%\xrightarrow{}35.4\%$) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.
LGDec 25, 2023
Swap-based Deep Reinforcement Learning for Facility Location Problems in NetworksWenxuan Guo, Yanyan Xu, Yaohui Jin
Facility location problems on graphs are ubiquitous in real world and hold significant importance, yet their resolution is often impeded by NP-hardness. Recently, machine learning methods have been proposed to tackle such classical problems, but they are limited to the myopic constructive pattern and only consider the problems in Euclidean space. To overcome these limitations, we propose a general swap-based framework that addresses the p-median problem and the facility relocation problem on graphs and a novel reinforcement learning model demonstrating a keen awareness of complex graph structures. Striking a harmonious balance between solution quality and running time, our method surpasses handcrafted heuristics on intricate graph datasets. Additionally, we introduce a graph generation process to simulate real-world urban road networks with demand, facilitating the construction of large datasets for the classic problem. For the initialization of the locations of facilities, we introduce a physics-inspired strategy for the p-median problem, reaching more stable solutions than the random strategy. The proposed pipeline coupling the classic swap-based method with deep reinforcement learning marks a significant step forward in addressing the practical challenges associated with facility location on graphs.
CVAug 3, 2025
Context Guided Transformer Entropy Modeling for Video CompressionJunlong Tong, Wei Zhang, Yaohui Jin et al.
Conditional entropy models effectively leverage spatio-temporal contexts to reduce video redundancy. However, incorporating temporal context often introduces additional model complexity and increases computational cost. In parallel, many existing spatial context models lack explicit modeling the ordering of spatial dependencies, which may limit the availability of relevant context during decoding. To address these issues, we propose the Context Guided Transformer (CGT) entropy model, which estimates probability mass functions of the current frame conditioned on resampled temporal context and dependency-weighted spatial context. A temporal context resampler learns predefined latent queries to extract critical temporal information using transformer encoders, reducing downstream computational overhead. Meanwhile, a teacher-student network is designed as dependency-weighted spatial context assigner to explicitly model the dependency of spatial context order. The teacher generates an attention map to represent token importance and an entropy map to reflect prediction certainty from randomly masked inputs, guiding the student to select the weighted top-k tokens with the highest spatial dependency. During inference, only the student is used to predict undecoded tokens based on high-dependency context. Experimental results demonstrate that our CGT model reduces entropy modeling time by approximately 65% and achieves an 11% BD-Rate reduction compared to the previous state-of-the-art conditional entropy model.
CVAug 8, 2025
PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic ManipulationZhihao Zhu, Yifan Zheng, Siyu Pan et al.
The fragmentation between high-level task semantics and low-level geometric features remains a persistent challenge in robotic manipulation. While vision-language models (VLMs) have shown promise in generating affordance-aware visual representations, the lack of semantic grounding in canonical spaces and reliance on manual annotations severely limit their ability to capture dynamic semantic-affordance relationships. To address these, we propose Primitive-Aware Semantic Grounding (PASG), a closed-loop framework that introduces: (1) Automatic primitive extraction through geometric feature aggregation, enabling cross-category detection of keypoints and axes; (2) VLM-driven semantic anchoring that dynamically couples geometric primitives with functional affordances and task-relevant description; (3) A spatial-semantic reasoning benchmark and a fine-tuned VLM (Qwen2.5VL-PA). We demonstrate PASG's effectiveness in practical robotic manipulation tasks across diverse scenarios, achieving performance comparable to manual annotations. PASG achieves a finer-grained semantic-affordance understanding of objects, establishing a unified paradigm for bridging geometric primitives with task semantics in robotic manipulation.
AIJan 11, 2025
Where to Go Next Day: Multi-scale Spatial-Temporal Decoupled Model for Mid-term Human Mobility PredictionZongyuan Huang, Weipeng Wang, Shaoyu Huang et al.
Predicting individual mobility patterns is crucial across various applications. While current methods mainly focus on predicting the next location for personalized services like recommendations, they often fall short in supporting broader applications such as traffic management and epidemic control, which require longer period forecasts of human mobility. This study addresses mid-term mobility prediction, aiming to capture daily travel patterns and forecast trajectories for the upcoming day or week. We propose a novel Multi-scale Spatial-Temporal Decoupled Predictor (MSTDP) designed to efficiently extract spatial and temporal information by decoupling daily trajectories into distinct location-duration chains. Our approach employs a hierarchical encoder to model multi-scale temporal patterns, including daily recurrence and weekly periodicity, and utilizes a transformer-based decoder to globally attend to predicted information in the location or duration chain. Additionally, we introduce a spatial heterogeneous graph learner to capture multi-scale spatial relationships, enhancing semantic-rich representations. Extensive experiments, including statistical physics analysis, are conducted on large-scale mobile phone records in five cities (Boston, Los Angeles, SF Bay Area, Shanghai, and Tokyo), to demonstrate MSTDP's advantages. Applied to epidemic modeling in Boston, MSTDP significantly outperforms the best-performing baseline, achieving a remarkable 62.8% reduction in MAE for cumulative new cases.
AIDec 26, 2024
TrajGEOS: Trajectory Graph Enhanced Orientation-based Sequential Network for Mobility PredictionZhaoping Hu, Zongyuan Huang, Jinming Yang et al.
Human mobility studies how people move to access their needed resources and plays a significant role in urban planning and location-based services. As a paramount task of human mobility modeling, next location prediction is challenging because of the diversity of users' historical trajectories that gives rise to complex mobility patterns and various contexts. Deep sequential models have been widely used to predict the next location by leveraging the inherent sequentiality of trajectory data. However, they do not fully leverage the relationship between locations and fail to capture users' multi-level preferences. This work constructs a trajectory graph from users' historical traces and proposes a \textbf{Traj}ectory \textbf{G}raph \textbf{E}nhanced \textbf{O}rientation-based \textbf{S}equential network (TrajGEOS) for next-location prediction tasks. TrajGEOS introduces hierarchical graph convolution to capture location and user embeddings. Such embeddings consider not only the contextual feature of locations but also the relation between them, and serve as additional features in downstream modules. In addition, we design an orientation-based module to learn users' mid-term preferences from sequential modeling modules and their recent trajectories. Extensive experiments on three real-world LBSN datasets corroborate the value of graph and orientation-based modules and demonstrate that TrajGEOS outperforms the state-of-the-art methods on the next location prediction task.
CLMay 9, 2024
Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language ModelsYitian Li, Jidong Tian, Hao He et al.
Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop \textit{Hypothesis Testing Prompting}, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. \textit{Hypothesis Testing prompting} involves multiple assumptions and reverses validation of conclusions leading to its unique correct answer. Experiments on two challenging deductive reasoning datasets ProofWriter and RuleTaker show that hypothesis testing prompting not only significantly improves the effect, but also generates a more reasonable and standardized reasoning process.
CLMay 8, 2024
Logical Negation Augmenting and Debiasing for Prompt-based MethodsYitian Li, Jidong Tian, Hao He et al.
Prompt-based methods have gained increasing attention on NLP and shown validity on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to make logical reasoning. In this work, we focus on the effectiveness of the prompt-based methods on first-order logical reasoning and find that the bottleneck lies in logical negation. Based on our analysis, logical negation tends to result in spurious correlations to negative answers, while propositions without logical negation correlate to positive answers. To solve the problem, we propose a simple but effective method, Negation Augmenting and Negation Debiasing (NAND), which introduces negative propositions to prompt-based methods without updating parameters. Specifically, these negative propositions can counteract spurious correlations by providing "not" for all instances so that models cannot make decisions only by whether expressions contain a logical negation. Experiments on three datasets show that NAND not only solves the problem of calibrating logical negation but also significantly enhances prompt-based methods of logical reasoning without model retraining.
CLFeb 24, 2024
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language ModelsHaoran Liao, Jidong Tian, Shaohua Hu et al.
Large language models (LLMs) still grapple with complex tasks like mathematical reasoning. Despite significant efforts invested in improving prefix prompts or reasoning process, the crucial role of problem context might have been neglected. Accurate recognition of inputs is fundamental for solving mathematical tasks, as ill-formed problems could potentially mislead LLM's reasoning. In this study, we propose a new approach named Problem Elaboration Prompting (PEP) to enhance the mathematical capacities of LLMs. Specifically, PEP decomposes and elucidates the problem context before reasoning, therefore enhancing the context modeling and parsing efficiency. Experiments across datasets and models demonstrate promising performances: (1) PEP demonstrates an overall enhancement in various mathematical tasks. For instance, with the GPT-3.5 model, PEP exhibits improvements of 9.93% and 8.80% on GSM8k through greedy decoding and self-consistency, respectively. (2) PEP can be easily implemented and integrated with other prompting methods. (3) PEP shows particular strength in handling distraction problems.
CLMar 31, 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech DatasetZehui Yang, Yifan Chen, Lei Luo et al.
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC. The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diversified domains and tagged with topic labels, ranging from science and technology to ordinary life. Accurate transcription and precise speaker voice activity timestamps are manually labeled for each sample. Speakers' detailed information is also provided. As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc. We also conduct several relevant tasks and provide experimental results to help evaluate the dataset.
CLSep 12, 2021
End-to-End Conversational Search for Online Shopping with Utterance TransferLiqiang Xiao, Jun Ma2, Xin Luna Dong et al.
Successful conversational search systems can present natural, adaptive and interactive shopping experience for online shopping customers. However, building such systems from scratch faces real word challenges from both imperfect product schema/knowledge and lack of training dialog data.In this work we first propose ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search. It leverages the text profile to retrieve products, which is more robust against imperfect product schema/knowledge compared with using product attributes alone. We then address the lack of data challenges by proposing an utterance transfer approach that generates dialogue utterances by using existing dialog from other domains, and leveraging the search behavior data from e-commerce retailer. With utterance transfer, we introduce a new conversational search dataset for online shopping. Experiments show that our utterance transfer method can significantly improve the availability of training dialogue data without crowd-sourcing, and the conversational search system significantly outperformed the best tested baseline.
LGSep 10, 2021
Enhancing Unsupervised Anomaly Detection with Score-Guided NetworkZongyuan Huang, Baohua Zhang, Guoqiang Hu et al.
Anomaly detection plays a crucial role in various real-world applications, including healthcare and finance systems. Owing to the limited number of anomaly labels in these complex systems, unsupervised anomaly detection methods have attracted great attention in recent years. Two major challenges faced by the existing unsupervised methods are: (i) distinguishing between normal and abnormal data in the transition field, where normal and abnormal data are highly mixed together; (ii) defining an effective metric to maximize the gap between normal and abnormal data in a hypothesis space, which is built by a representation learner. To that end, this work proposes a novel scoring network with a score-guided regularization to learn and enlarge the anomaly score disparities between normal and abnormal data. With such score-guided strategy, the representation learner can gradually learn more informative representation during the model training stage, especially for the samples in the transition field. We next propose a score-guided autoencoder (SG-AE), incorporating the scoring network into an autoencoder framework for anomaly detection, as well as other three state-of-the-art models, to further demonstrate the effectiveness and transferability of the design. Extensive experiments on both synthetic and real-world datasets demonstrate the state-of-the-art performance of these score-guided models (SGMs).
LGMay 18, 2021
Dependent Multi-Task Learning with Causal Intervention for Image CaptioningWenqing Chen, Jidong Tian, Caoyun Fan et al.
Recent work for image captioning mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although promising, we observed two problems in generated captions: 1) content inconsistency where models would generate contradicting facts; 2) not informative enough where models would miss parts of important information. From a causal perspective, the reason is that models have captured spurious statistical correlations between visual features and certain expressions (e.g., visual features of "long hair" and "woman"). In this paper, we propose a dependent multi-task learning framework with the causal intervention (DMTCI). Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning. The intermediate task would help the model better understand the visual features and thus alleviate the content inconsistency problem. Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders and thus letting models focus on the causal visual features. Specifically, the high-frequency concept set is considered as the proxy confounders where the real confounders are inferred in the continuous space. Finally, we use a multi-agent reinforcement learning (MARL) strategy to enable end-to-end training and reduce the inter-task error accumulations. The extensive experiments show that our model outperforms the baseline models and achieves competitive performance with state-of-the-art models.
LGFeb 2, 2021
Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-EncoderLongyuan Li, Junchi Yan, Haiyang Wang et al.
Deep generative models have demonstrated their effectiveness in learning latent representation and modeling complex dependencies of time series. In this paper, we present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of multi-dimensional time series. Our model is based on Variational Auto-Encoder (VAE), and its backbone is fulfilled by a Recurrent Neural Network to capture latent temporal structures of time series for both generative model and inference model. Specifically, our model parameterizes mean and variance for each time-stamp with flexible neural networks, resulting in a non-stationary model that can work without the assumption of constant noise as commonly made by existing Markov models. However, such a flexibility may cause the model fragile to anomalies. To achieve robust density estimation which can also benefit detection tasks, we propose a smoothness-inducing prior over possible estimations. The proposed prior works as a regularizer that places penalty at non-smooth reconstructions. Our model is learned efficiently with a novel stochastic gradient variational Bayes estimator. In particular, we study two decision criteria for anomaly detection: reconstruction probability and reconstruction error. We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
LGJan 31, 2021
Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic ForecastingLongyuan Li, Jihai Zhang, Junchi Yan et al.
Time-series is ubiquitous across applications, such as transportation, finance and healthcare. Time-series is often influenced by external factors, especially in the form of asynchronous events, making forecasting difficult. However, existing models are mainly designated for either synchronous time-series or asynchronous event sequence, and can hardly provide a synthetic way to capture the relation between them. We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are carefully devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte-Carlo simulation. Furthermore, our model produces accurate, sharp and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.
LGJan 31, 2021
Learning Interpretable Deep State Space Model for Probabilistic Time Series ForecastingLongyuan Li, Junchi Yan, Xiaokang Yang et al.
Probabilistic time series forecasting involves estimating the distribution of future based on its history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks and the dependency is modeled by recurrent neural nets. We take the automatic relevance determination (ARD) view and devise a network to exploit the exogenous variables in addition to time series. In particular, our ARD network can incorporate the uncertainty of the exogenous variables and eventually helps identify useful exogenous variables and suppress those irrelevant for forecasting. The distribution of multi-step ahead forecasts are approximated by Monte Carlo simulation. We show in experiments that our model produces accurate and sharp probabilistic forecasts. The estimated uncertainty of our forecasting also realistically increases over time, in a spontaneous manner.
CVJul 2, 2019
Disentangled Makeup Transfer with Generative Adversarial NetworkHonglun Zhang, Wenqing Chen, Hao He et al.
Facial makeup transfer is a widely-used technology that aims to transfer the makeup style from a reference face image to a non-makeup face. Existing literature leverage the adversarial loss so that the generated faces are of high quality and realistic as real ones, but are only able to produce fixed outputs. Inspired by recent advances in disentangled representation, in this paper we propose DMT (Disentangled Makeup Transfer), a unified generative adversarial network to achieve different scenarios of makeup transfer. Our model contains an identity encoder as well as a makeup encoder to disentangle the personal identity and the makeup style for arbitrary face images. Based on the outputs of the two encoders, a decoder is employed to reconstruct the original faces. We also apply a discriminator to distinguish real faces from fake ones. As a result, our model can not only transfer the makeup styles from one or more reference face images to a non-makeup face with controllable strength, but also produce various outputs with styles sampled from a prior distribution. Extensive experiments demonstrate that our model is superior to existing literature by generating high-quality results for different scenarios of makeup transfer.
CVNov 19, 2018
Show, Attend and Translate: Unpaired Multi-Domain Image-to-Image Translation with Visual AttentionHonglun Zhang, Wenqing Chen, Jidong Tian et al.
Recently unpaired multi-domain image-to-image translation has attracted great interests and obtained remarkable progress, where a label vector is utilized to indicate multi-domain information. In this paper, we propose SAT (Show, Attend and Translate), an unified and explainable generative adversarial network equipped with visual attention that can perform unpaired image-to-image translation for multiple domains. By introducing an action vector, we treat the original translation tasks as problems of arithmetic addition and subtraction. Visual attention is applied to guarantee that only the regions relevant to the target domains are translated. Extensive experiments on a facial attribute dataset demonstrate the superiority of our approach and the generated attention masks better explain what SAT attends when translating images.
CLOct 17, 2017
Multi-Task Label Embedding for Text ClassificationHonglun Zhang, Liqiang Xiao, Wenqing Chen et al.
Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat labels of each task as independent and meaningless one-hot vectors, which cause a loss of potential information and makes it difficult for these models to jointly learn three or more tasks. In this paper, we propose Multi-Task Label Embedding to convert labels in text classification into semantic vectors, thereby turning the original tasks into vector matching tasks. We implement unsupervised, supervised and semi-supervised models of Multi-Task Label Embedding, all utilizing semantic correlations among tasks and making it particularly convenient to scale and transfer as more tasks are involved. Extensive experiments on five benchmark datasets for text classification show that our models can effectively improve performances of related tasks with semantic representations of labels and additional information from each other.
CLJul 10, 2017
A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task LearningHonglun Zhang, Liqiang Xiao, Yongkun Wang et al.
Multi-task learning leverages potential correlations among related tasks to extract common features and yield performance gains. However, most previous works only consider simple or weak interactions, thereby failing to model complex correlations among three or more tasks. In this paper, we propose a multi-task learning architecture with four types of recurrent neural layers to fuse information across multiple related tasks. The architecture is structurally flexible and considers various interactions among tasks, which can be regarded as a generalized case of many previous works. Extensive experiments on five benchmark datasets for text classification show that our model can significantly improve performances of related tasks with additional information from others.
CYApr 27, 2017
OMNIRank: Risk Quantification for P2P Platforms with Deep LearningHonglun Zhang, Haiyang Wang, Xiaming Chen et al.
P2P lending presents as an innovative and flexible alternative for conventional lending institutions like banks, where lenders and borrowers directly make transactions and benefit each other without complicated verifications. However, due to lack of specialized laws, delegated monitoring and effective managements, P2P platforms may spawn potential risks, such as withdraw failures, investigation involvements and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although there are abundant public information and data available on the Internet related to P2P platforms, challenges of multi-sourcing and heterogeneity matter. In this paper, we promote a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain great amounts of multi-source heterogeneous data of domestic P2P platforms since 2007 from the Internet. Purifications like duplication and noise removal, null handing, format unification and fusion are applied to improve data qualities. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graph and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to rankings generated by OMNIRank, we conduct flourish data visualizations and interactions, providing lenders with comprehensive information supports, decision suggestions and safety guarantees.
IRJul 20, 2016
A Local-Global LDA Model for Discovering Geographical Topics from Social MediaSiwei Qiang, Yongkun Wang, Yaohui Jin
Micro-blogging services can track users' geo-locations when users check-in their places or use geo-tagging which implicitly reveals locations. This "geo tracking" can help to find topics triggered by some events in certain regions. However, discovering such topics is very challenging because of the large amount of noisy messages (e.g. daily conversations). This paper proposes a method to model geographical topics, which can filter out irrelevant words by different weights in the local and global contexts. Our method is based on the Latent Dirichlet Allocation (LDA) model but each word is generated from either a local or a global topic distribution by its generation probabilities. We evaluated our model with data collected from Weibo, which is currently the most popular micro-blogging service for Chinese. The evaluation results demonstrate that our method outperforms other baseline methods in several metrics such as model perplexity, two kinds of entropies and KL-divergence of discovered topics.