CLApr 10, 2025
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement LearningByteDance Seed, Jiaze Chen, Tiantian Fan et al. · bytedance
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.
LGOct 26, 2022
Learning on Large-scale Text-attributed Graphs via Variational InferenceJianan Zhao, Meng Qu, Chaozhuo Li et al.
This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for such a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM proposes to alternatively update the two modules in the E-step and M-step. Such a procedure allows training the two modules separately while simultaneously allowing the two modules to interact and mutually enhance each other. Extensive experiments on multiple data sets demonstrate the efficiency and effectiveness of the proposed approach.
LGApr 23, 2023
TGNN: A Joint Semi-supervised Framework for Graph-level ClassificationWei Ju, Xiao Luo, Meng Qu et al.
This paper studies semi-supervised graph classification, a crucial task with a wide range of applications in social network analysis and bioinformatics. Recent works typically adopt graph neural networks to learn graph-level representations for classification, failing to explicitly leverage features derived from graph topology (e.g., paths). Moreover, when labeled data is scarce, these methods are far from satisfactory due to their insufficient topology exploration of unlabeled data. We address the challenge by proposing a novel semi-supervised framework called Twin Graph Neural Network (TGNN). To explore graph structural information from complementary views, our TGNN has a message passing module and a graph kernel module. To fully utilize unlabeled data, for each module, we calculate the similarity of each unlabeled graph to other labeled graphs in the memory bank and our consistency loss encourages consistency between two similarity distributions in different embedding spaces. The two twin modules collaborate with each other by exchanging instance similarity knowledge to fully explore the structure information of both labeled and unlabeled data. We evaluate our TGNN on various public datasets and show that it achieves strong performance.
LGMay 21, 2022
KGNN: Harnessing Kernel-based Networks for Semi-supervised Graph ClassificationWei Ju, Junwei Yang, Meng Qu et al.
This paper studies semi-supervised graph classification, which is an important problem with various applications in social network analysis and bioinformatics. This problem is typically solved by using graph neural networks (GNNs), which yet rely on a large number of labeled graphs for training and are unable to leverage unlabeled graphs. We address the limitations by proposing the Kernel-based Graph Neural Network (KGNN). A KGNN consists of a GNN-based network as well as a kernel-based network parameterized by a memory network. The GNN-based network performs classification through learning graph representations to implicitly capture the similarity between query graphs and labeled graphs, while the kernel-based network uses graph kernels to explicitly compare each query graph with all the labeled graphs stored in a memory for prediction. The two networks are motivated from complementary perspectives, and thus combing them allows KGNN to use labeled graphs more effectively. We jointly train the two networks by maximizing their agreement on unlabeled graphs via posterior regularization, so that the unlabeled graphs serve as a bridge to let both networks mutually enhance each other. Experiments on a range of well-known benchmark datasets demonstrate that KGNN achieves impressive performance over competitive baselines.
LGApr 15, 2022
Neural Structured Prediction for Inductive Node ClassificationMeng Qu, Huiyu Cai, Jian Tang
This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
CLOct 2, 2023
GraphText: Graph Reasoning in Text SpaceJianan Zhao, Le Zhuo, Yikang Shen et al.
Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve on par with, or even surpassing, the performance of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.
BMFeb 22, 2022Code
Structured Multi-task Learning for Molecular Property PredictionShengchao Liu, Meng Qu, Zuobai Zhang et al.
Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still not satisfying as the number of labeled data for each task is too limited, which calls for additional data to complement the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available. We first construct a dataset (ChEMBL-STRING) including around 400 tasks as well as a task relation graph. Then to better utilize such relation graph, we propose a method called SGNN-EBM to systematically investigate the structured task modeling from two perspectives. (1) In the \emph{latent} space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph. (2) In the \emph{output} space, we employ structured prediction with the energy-based model (EBM), which can be efficiently trained through noise-contrastive estimation (NCE) approach. Empirical results justify the effectiveness of SGNN-EBM. Code is available on https://github.com/chao1224/SGNN-EBM.
AIAug 31, 2019Code
Collaborative Policy Learning for Open Knowledge Graph ReasoningCong Fu, Tong Chen, Meng Qu et al.
In recent years, there has been a surge of interests in interpretable graph reasoning methods. However, these models often suffer from limited performance when working on sparse and incomplete graphs, due to the lack of evidential paths that can reach target entities. Here we study open knowledge graph reasoning---a task that aims to reason for missing facts over a graph augmented by a background text corpus. A key challenge of the task is to filter out "irrelevant" facts extracted from corpus, in order to maintain an effective search space during path inference. We propose a novel reinforcement learning framework to train two collaborative agents jointly, i.e., a multi-hop graph reasoner and a fact extractor. The fact extraction agent generates fact triples from corpora to enrich the graph on the fly; while the reasoning agent provides feedback to the fact extractor and guides it towards promoting facts that are helpful for the interpretable reasoning. Experiments on two public datasets demonstrate the effectiveness of the proposed approach. Source code and datasets used in this paper can be downloaded at https://github.com/shanzhenren/CPL
LGApr 11, 2019Code
Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge GraphsWoojeong Jin, Meng Qu, Xisen Jin et al.
Knowledge graph reasoning is a critical task in natural language processing. The task becomes more challenging on temporal knowledge graphs, where each fact is associated with a timestamp. Most existing methods focus on reasoning at past timestamps and they are not able to predict facts happening in the future. This paper proposes Recurrent Event Network (RE-NET), a novel autoregressive architecture for predicting future interactions. The occurrence of a fact (event) is modeled as a probability distribution conditioned on temporal sequences of past knowledge graphs. Specifically, our RE-NET employs a recurrent event encoder to encode past facts and uses a neighborhood aggregator to model the connection of facts at the same timestamp. Future facts can then be inferred in a sequential manner based on the two modules. We evaluate our proposed method via link prediction at future times on five public datasets. Through extensive experiments, we demonstrate the strength of RENET, especially on multi-step inference over future timestamps, and achieve state-of-the-art performance on all five datasets. Code and data can be found at https://github.com/INK-USC/RE-Net.
CLFeb 20, 2019Code
Learning Dual Retrieval Module for Semi-supervised Relation ExtractionHongtao Lin, Jun Yan, Meng Qu et al.
Relation extraction is an important task in structuring content of text data, and becomes especially challenging when learning with weak supervision---where only a limited number of labeled sentences are given and a large number of unlabeled sentences are available. Most existing work exploits unlabeled data based on the ideas of self-training (i.e., bootstrapping a model) and multi-view learning (e.g., ensembling multiple model variants). However, these methods either suffer from the issue of semantic drift, or do not fully capture the problem characteristics of relation extraction. In this paper, we leverage a key insight that retrieving sentences expressing a relation is a dual task of predicting relation label for a given sentence---two tasks are complementary to each other and can be optimized jointly for mutual enhancement. To model this intuition, we propose DualRE, a principled framework that introduces a retrieval module which is jointly trained with the original relation prediction module. In this way, high-quality samples selected by retrieval module from unlabeled data can be used to improve prediction module, and vice versa. Experimental results\footnote{\small Code and data can be found at \url{https://github.com/INK-USC/DualRE}.} on two public datasets as well as case studies demonstrate the effectiveness of the DualRE approach.
CYFeb 6, 2024
Risks of AI Scientists: Prioritizing Safeguarding Over AutonomyXiangru Tang, Qiao Jin, Kunlun Zhu et al.
AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding AI scientists and advocate for the development of improved models, robust benchmarks, and comprehensive regulations.
LGMay 17, 2025
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement LearningChenwei Lou, Zewei Sun, Xinnian Liang et al.
Large Language Models (LLMs) have demonstrated remarkable capabilities but often face challenges with tasks requiring sophisticated reasoning. While Chain-of-Thought (CoT) prompting significantly enhances reasoning, it indiscriminately generates lengthy reasoning steps for all queries, leading to substantial computational costs and inefficiency, especially for simpler inputs. To address this critical issue, we introduce AdaCoT (Adaptive Chain-of-Thought), a novel framework enabling LLMs to adaptively decide when to invoke CoT. AdaCoT framed adaptive reasoning as a Pareto optimization problem that seeks to balance model performance with the costs associated with CoT invocation (both frequency and computational overhead). We propose a reinforcement learning (RL) based method, specifically utilizing Proximal Policy Optimization (PPO), to dynamically control the CoT triggering decision boundary by adjusting penalty coefficients, thereby allowing the model to determine CoT necessity based on implicit query complexity. A key technical contribution is Selective Loss Masking (SLM), designed to counteract decision boundary collapse during multi-stage RL training, ensuring robust and stable adaptive triggering. Experimental results demonstrate that AdaCoT successfully navigates the Pareto frontier, achieving substantial reductions in CoT usage for queries not requiring elaborate reasoning. For instance, on our production traffic testset, AdaCoT reduced CoT triggering rates to as low as 3.18\% and decreased average response tokens by 69.06%, while maintaining high performance on complex tasks.
LGFeb 16, 2022
TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug DiscoveryZhaocheng Zhu, Chence Shi, Zuobai Zhang et al.
Machine learning has huge potential to revolutionize the field of drug discovery and is attracting increasing attention in recent years. However, lacking domain knowledge (e.g., which tasks to work on), standard benchmarks and data preprocessing pipelines are the main obstacles for machine learning researchers to work in this domain. To facilitate the progress of machine learning for drug discovery, we develop TorchDrug, a powerful and flexible machine learning platform for drug discovery built on top of PyTorch. TorchDrug benchmarks a variety of important tasks in drug discovery, including molecular property prediction, pretrained molecular representations, de novo molecular design and optimization, retrosynthsis prediction, and biomedical knowledge graph reasoning. State-of-the-art techniques based on geometric deep learning (or graph machine learning), deep generative models, reinforcement learning and knowledge graph reasoning are implemented for these tasks. TorchDrug features a hierarchical interface that facilitates customization from both novices and experts in this domain. Tutorials, benchmark results and documentation are available at https://torchdrug.ai. Code is released under Apache License 2.0.
CYOct 30, 2020
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact TracingPrateek Gupta, Tegan Maharaj, Martin Weiss et al.
The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental simulator we call COVI-AgentSim, integrating detailed consideration of virology, disease progression, social contact networks, and mobility patterns, based on parameters derived from empirical research. We verify by comparing to real data that COVI-AgentSim is able to reproduce realistic COVID-19 spread dynamics, and perform a sensitivity analysis to verify that the relative performance of contact tracing methods are consistent across a range of settings. We use COVI-AgentSim to perform cost-benefit analyses comparing no DCT to: 1) standard binary contact tracing (BCT) that assigns binary recommendations based on binary test results; and 2) a rule-based method for feature-based contact tracing (FCT) that assigns a graded level of recommendation based on diverse individual features. We find all DCT methods consistently reduce the spread of the disease, and that the advantage of FCT over BCT is maintained over a wide range of adoption rates. Feature-based methods of contact tracing avert more disability-adjusted life years (DALYs) per socioeconomic cost (measured by productive hours lost). Our results suggest any DCT method can help save lives, support re-opening of economies, and prevent second-wave outbreaks, and that FCT methods are a promising direction for enriching BCT using self-reported symptoms, yielding earlier warning signals and a significantly reduced spread of the virus per socioeconomic cost.
LGOct 23, 2020
Predicting Infectiousness for Proactive Contact TracingYoshua Bengio, Prateek Gupta, Tegan Maharaj et al.
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
AIOct 8, 2020
RNNLogic: Learning Logic Rules for Reasoning on Knowledge GraphsMeng Qu, Junkun Chen, Louis-Pascal Xhonneux et al.
This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is first updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.
LGJul 5, 2020
Few-shot Relation Extraction via Bayesian Meta-learning on Relation GraphsMeng Qu, Tianyu Gao, Louis-Pascal A. C. Xhonneux et al.
This paper studies few-shot relation extraction, which aims at predicting the relation for a pair of entities in a sentence by training with a few labeled examples in each relation. To more effectively generalize to new relations, in this paper we study the relationships between different relations and propose to leverage a global relation graph. We propose a novel Bayesian meta-learning approach to effectively learn the posterior distribution of the prototype vectors of relations, where the initial prior of the prototype vectors is parameterized with a graph neural network on the global relation graph. Moreover, to effectively optimize the posterior distribution of the prototype vectors, we propose to use the stochastic gradient Langevin dynamics, which is related to the MAML algorithm but is able to handle the uncertainty of the prototype vectors. The whole framework can be effectively and efficiently optimized in an end-to-end fashion. Experiments on two benchmark datasets prove the effectiveness of our proposed approach against competitive baselines in both the few-shot and zero-shot settings.
LGJun 24, 2020
Graph Policy Network for Transferable Active Learning on GraphsShengding Hu, Zheng Xiong, Meng Qu et al.
Graph neural networks (GNNs) have been attracting increasing popularity due to their simplicity and effectiveness in a variety of fields. However, a large number of labeled data is generally required to train these networks, which could be very expensive to obtain in some domains. In this paper, we study active learning for GNNs, i.e., how to efficiently label the nodes on a graph to reduce the annotation cost of training GNNs. We formulate the problem as a sequential decision process on graphs and train a GNN-based policy network with reinforcement learning to learn the optimal query strategy. By jointly training on several source graphs with full labels, we learn a transferable active learning policy which can directly generalize to unlabeled target graphs. Experimental results on multiple datasets from different domains prove the effectiveness of the learned policy in promoting active learning performance in both settings of transferring between graphs in the same domain and across different domains.
CRMay 18, 2020
COVI White PaperHannah Alsdurf, Edmond Belliveau, Yoshua Bengio et al.
The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile apps has the potential to shift the paradigm. Some countries have deployed centralized tracking systems, but more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or for-profit corporations. Machine learning methods can circumvent some of the limitations of standard digital tracing by incorporating many clues and their uncertainty into a more graded and precise estimation of infection risk. The estimated risk can provide early risk awareness, personalized recommendations and relevant information to the user. Finally, non-identifying risk data can inform epidemiological models trained jointly with the machine learning predictor. These models can provide statistical evidence for the importance of factors involved in disease transmission. They can also be used to monitor, evaluate and optimize health policy and (de)confinement scenarios according to medical and economic productivity indicators. However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of `COVI,' a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
LGDec 2, 2019
Continuous Graph Neural NetworksLouis-Pascal A. C. Xhonneux, Meng Qu, Jian Tang
This paper builds on the connection between graph neural networks and traditional dynamical systems. We propose continuous graph neural networks (CGNN), which generalise existing graph neural networks with discrete dynamics in that they can be viewed as a specific discretisation scheme. The key idea is how to characterise the continuous dynamics of node representations, i.e. the derivatives of node representations, w.r.t. time. Inspired by existing diffusion-based methods on graphs (e.g. PageRank and epidemic models on social networks), we define the derivatives as a combination of the current node representations, the representations of neighbors, and the initial values of the nodes. We propose and analyse two possible dynamics on graphs---including each dimension of node representations (a.k.a. the feature channel) change independently or interact with each other---both with theoretical justification. The proposed continuous graph neural networks are robust to over-smoothing and hence allow us to build deeper networks, which in turn are able to capture the long-range dependencies between nodes. Experimental results on the task of node classification demonstrate the effectiveness of our proposed approach over competitive baselines.
LGSep 25, 2019
GraphMix: Improved Training of GNNs for Semi-Supervised LearningVikas Verma, Meng Qu, Kenji Kawaguchi et al.
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
LGJul 6, 2019
Weakly-supervised Knowledge Graph Alignment with Adversarial LearningMeng Qu, Jian Tang, Yoshua Bengio
This paper studies aligning knowledge graphs from different sources or languages. Most existing methods train supervised methods for the alignment, which usually require a large number of aligned knowledge triplets. However, such a large number of aligned knowledge triplets may not be available or are expensive to obtain in many domains. Therefore, in this paper we propose to study aligning knowledge graphs in fully-unsupervised or weakly-supervised fashion, i.e., without or with only a few aligned triplets. We propose an unsupervised framework to align the entity and relation embddings of different knowledge graphs with an adversarial learning framework. Moreover, a regularization term which maximizes the mutual information between the embeddings of different knowledge graphs is used to mitigate the problem of mode collapse when learning the alignment functions. Such a framework can be further seamlessly integrated with existing supervised methods by utilizing a limited number of aligned triples as guidance. Experimental results on multiple datasets prove the effectiveness of our proposed approach in both the unsupervised and the weakly-supervised settings.
LGJun 20, 2019
Probabilistic Logic Neural Networks for ReasoningMeng Qu, Jian Tang
Knowledge graph reasoning, which aims at predicting the missing facts through reasoning with the observed facts, is critical to many applications. Such a problem has been widely explored by traditional logic rule-based approaches and recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge with first-order logic and meanwhile handle their uncertainty. However, the inference of MLNs is usually very difficult due to the complicated graph structures. Different from MLNs, knowledge graph embedding methods (e.g. TransE, DistMult) learn effective entity and relation embeddings for reasoning, which are much more effective and efficient. However, they are unable to leverage domain knowledge. In this paper, we propose the probabilistic Logic Neural Network (pLogicNet), which combines the advantages of both methods. A pLogicNet defines the joint distribution of all possible triplets by using a Markov logic network with first-order logic, which can be efficiently optimized with the variational EM algorithm. In the E-step, a knowledge graph embedding model is used for inferring the missing triplets, while in the M-step, the weights of logic rules are updated based on both the observed and predicted triplets. Experiments on multiple knowledge graphs prove the effectiveness of pLogicNet over many competitive baselines.
SIJun 18, 2019
vGraph: A Generative Model for Joint Community Detection and Node Representation LearningFan-Yun Sun, Meng Qu, Jordan Hoffmann et al.
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
LGMay 15, 2019
GMNN: Graph Markov Neural NetworksMeng Qu, Yoshua Bengio, Jian Tang
This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.
LGMar 2, 2019
GraphVite: A High-Performance CPU-GPU Hybrid System for Node EmbeddingZhaocheng Zhu, Shizhen Xu, Meng Qu et al.
Learning continuous representations of nodes is attracting growing interest in both academia and industry recently, due to their simplicity and effectiveness in a variety of applications. Most of existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few millions of nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are parallelly generated by random walks in an online fashion on the network, and serve as the training data. On the GPU end, a novel parallel negative sampling is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is super efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and takes around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice on performance.
CLNov 9, 2017
Weakly-supervised Relation Extraction by Pattern-enhanced Embedding LearningMeng Qu, Xiang Ren, Yu Zhang et al.
Extracting relations from text corpora is an important task in text mining. It becomes particularly challenging when focusing on weakly-supervised relation extraction, that is, utilizing a few relation instances (i.e., a pair of entities and their relation) as seeds to extract more instances from corpora. Existing distributional approaches leverage the corpus-level co-occurrence statistics of entities to predict their relations, and require large number of labeled instances to learn effective relation classifiers. Alternatively, pattern-based approaches perform bootstrapping or apply neural networks to model the local contexts, but still rely on large number of labeled instances to build reliable models. In this paper, we study integrating the distributional and pattern-based methods in a weakly-supervised setting, such that the two types of methods can provide complementary supervision for each other to build an effective, unified model. We propose a novel co-training framework with a distributional module and a pattern module. During training, the distributional module helps the pattern module discriminate between the informative patterns and other patterns, and the pattern module generates some highly-confident instances to improve the distributional module. The whole framework can be effectively optimized by iterating between improving the pattern module and updating the distributional module. We conduct experiments on two tasks: knowledge base completion with text corpora and corpus-level relation extraction. Experimental results prove the effectiveness of our framework in the weakly-supervised setting.
SISep 19, 2017
An Attention-based Collaboration Framework for Multi-View Network Representation LearningMeng Qu, Jian Tang, Jingbo Shang et al.
Learning distributed node representations in networks has been attracting increasing attention recently due to its effectiveness in a variety of applications. Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network. However, in reality there usually exists multiple types of proximities between nodes, yielding networks with multiple views. This paper studies learning node representations for networks with multiple views, which aims to infer robust node representations across different views. We propose a multi-view representation learning approach, which promotes the collaboration of different views and lets them vote for the robust representations. During the voting process, an attention mechanism is introduced, which enables each node to focus on the most informative views. Experimental results on real-world networks show that the proposed approach outperforms existing state-of-the-art approaches for network representation learning with a single view and other competitive approaches with multiple views.
CLJun 25, 2017
Automatic Synonym Discovery with Knowledge BasesMeng Qu, Xiang Ren, Jiawei Han
Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find out other names that are synonymous, ignoring the fact that often times a name string can refer to multiple entities (e.g., "apple" could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually-curated synonyms for each entity stored in a knowledge base not only form a set of name strings to disambiguate the meaning for each other, but also can serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.
CLNov 29, 2016
Identity-sensitive Word Embedding through Heterogeneous NetworksJian Tang, Meng Qu, Qiaozhu Mei
Most existing word embedding approaches do not distinguish the same words in different contexts, therefore ignoring their contextual meanings. As a result, the learned embeddings of these words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn the \textbf{identity-sensitive} word embeddings. Based on an identity-labeled text corpora, a heterogeneous network of words and word identities is constructed to model different-levels of word co-occurrences. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we are able to obtain the embeddings of words and the embeddings of word identities. We study three different types of word identities including topics, sentiments and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperforms competitive methods on tasks including text classification and word similarity computation.
SIOct 31, 2016
Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information NetworksJingbo Shang, Meng Qu, Jialu Liu et al.
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks and business-review networks, is a fundamental problem with broad applications. Although similarity search in HINs has been studied previously, most existing approaches neither explore rich semantic information embedded in the network structures nor take user's preference as a guidance. In this paper, we re-examine similarity search in HINs and propose a novel embedding-based framework. It models vertices as low-dimensional vectors to explore network structure-embedded similarity. To accommodate user preferences at defining similarity semantics, our proposed framework, ESim, accepts user-defined meta-paths as guidance to learn vertex vectors in a user-preferred embedding space. Moreover, an efficient and parallel sampling-based optimization algorithm has been developed to learn embeddings in large-scale HINs. Extensive experiments on real-world large-scale HINs demonstrate a significant improvement on the effectiveness of ESim over several state-of-the-art algorithms as well as its scalability.
CLOct 27, 2016
CoType: Joint Extraction of Typed Entities and Relations with Knowledge BasesXiang Ren, Zeqiu Wu, Wenqi He et al.
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations. CoType, then using these learned embeddings, estimates the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method.
CLFeb 17, 2016
Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label EmbeddingXiang Ren, Wenqi He, Meng Qu et al.
Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space where, in that space, objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to next best method.
CLAug 2, 2015
PTE: Predictive Text Embedding through Large-scale Heterogeneous Text NetworksJian Tang, Meng Qu, Qiaozhu Mei
Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, comparing to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call the \textit{predictive text embedding} (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low dimensional space through a principled and efficient algorithm. This low dimensional embedding not only preserves the semantic closeness of words and documents, but also has a strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or more effective, much more efficient, and has fewer parameters to tune.
LGMar 12, 2015
LINE: Large-scale Information Network EmbeddingJian Tang, Meng Qu, Mingzhe Wang et al.
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online.