Tong Chen

h-index: 4

Papers: 7

Citations: 330

Novelty: 52%

AI Score: 42

Ranked #62,268 of 194,257 authors (top 32%); #12,149 in CL (top 39%)

7 Papers

20.4 | CL | Jul 9, 2024 | Code

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

Tong Chen, Akari Asai, Niloofar Mireshghallah et al.

Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying, event copying and character copying, occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 5.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying.

17.6 | LG | Nov 8, 2024

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Tong Chen, Hao Fang, Patrick Xia et al.

Large language models (LMs) are typically adapted to improve performance on new contexts (e.g., text prompts that define new tasks or domains) through fine-tuning or prompting. However, there is an accuracy-compute tradeoff: fine-tuning incurs significant training cost and prompting increases inference overhead. We introduce GenerativeAdapter, an effective and efficient adaptation method that directly maps new contexts to low-rank LM adapters, thereby significantly reducing inference overhead with no need for fine-tuning. The adapter generator is trained via self-supervised learning, and can be used to adapt a single frozen LM for any new task simply by mapping the associated task or domain context to a new adapter. We apply GenerativeAdapter to two pretrained LMs (Mistral-7B-Instruct and Llama2-7B-Chat) and evaluate the adapted models in three adaptation scenarios: knowledge acquisition from documents, learning from demonstrations, and personalization for users. In StreamingQA, our approach is effective in injecting knowledge into the LM's parameters, achieving a 63.5% improvement in F1 score over the model with supervised fine-tuning (from 19.5 to 31.5) for contexts as long as 32K tokens. In the MetaICL in-context learning evaluation, our method achieves an average accuracy of 44.9 across 26 tasks, outperforming the base model. On MSC, our method proves to be highly competitive in memorizing user information from conversations with a 4x reduction in computation and memory costs compared to prompting with full conversation history. Together, these results suggest that GenerativeAdapter should allow for general adaptation to a wide range of different contexts.

6.7 | CL | Oct 20, 2025

Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations

Tong Chen, Akari Asai, Luke Zettlemoyer et al.

Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach assigns a reward of one only when the model's output is entirely factually correct, and zero otherwise. We evaluate our method on Qwen3 reasoning models across diverse tasks. For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates, substantially outperforming both supervised training and continuous-reward RL baselines. In short-form question answering, the model learns calibrated abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge. This yields 44.4% and 21.7% fewer incorrect answers on PopQA and GPQA, respectively. Crucially, these factuality gains come without performance degradation on instruction following, math, or code, whereas continuous-reward RL, despite improving factuality, induces quality regressions.
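The contrast between the binary reward and a continuous scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical upstream verifier has already labeled each claim in an output as supported or unsupported by retrieved evidence, and the function names are invented for this example. The continuous variant (fraction of supported claims) is just one possible scheme, chosen here for contrast.

```python
from typing import Sequence


def binary_reward(claim_supported: Sequence[bool]) -> int:
    """Return 1 only if every claim is supported, else 0.

    Assumption of this sketch: an output with no claims gets reward 1,
    because all() of an empty sequence is True.
    """
    return 1 if all(claim_supported) else 0


def continuous_reward(claim_supported: Sequence[bool]) -> float:
    """Illustrative contrast: the fraction of supported claims."""
    if not claim_supported:
        return 1.0
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verifier labels for four claims in one generated answer.
labels = [True, True, False, True]
print(binary_reward(labels))      # a single unsupported claim zeroes the reward
print(continuous_reward(labels))  # the continuous scheme gives partial credit
```

Under the binary scheme a mostly correct answer earns nothing, which is the all-or-nothing behavior the abstract describes.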

15.6 | IR | Mar 24, 2021

Hierarchical Hyperedge Embedding-based Representation Learning for Group Recommendation

Lei Guo, Hongzhi Yin, Tong Chen et al.

In this work, we study group recommendation in a particular scenario, namely Occasional Group Recommendation (OGR). Most existing works have addressed OGR by aggregating group members' personal preferences to learn the group representation. However, representation learning for a group is more complex than fusing group member representations, as the personal preferences and group preferences may be in different spaces. In addition, the learned user representation is not accurate due to the sparsity of users' interaction data. Moreover, group similarity in terms of common group members has been overlooked, although it has great potential to improve group representation learning. In this work, we focus on addressing the above challenges in the group representation learning task, and devise a hierarchical hyperedge embedding-based group recommender, namely HyperGroup. Specifically, we propose to leverage the user-user interactions to alleviate the sparsity issue of user-item interactions, and design a GNN-based representation learning network to enhance the learning of individuals' preferences from their friends' preferences, which provides a solid foundation for learning groups' preferences. To exploit the group similarity to learn a more accurate group representation from highly limited group-item interactions, we connect all groups as a network of overlapping sets, and treat the task of group preference learning as embedding hyperedges in a hypergraph, where an inductive hyperedge embedding method is proposed. To further enhance the group-level preference modeling, we develop a joint training strategy to learn both user-item and group-item interactions in the same process. We conduct extensive experiments on two real-world datasets and the experimental results demonstrate the superiority of our proposed HyperGroup in comparison to the state-of-the-art baselines.

0.2 | CL | Jan 22, 2021

Knowledge Graph Completion with Text-aided Regularization

Tong Chen, Sirou Zhu, Yiming Wen et al.

Knowledge Graph Completion is the task of expanding a knowledge graph/base by estimating possible entities, or proper nouns, that can be connected using a set of predefined relations, or verbs/predicates describing interconnections of two things. Generally, we describe this problem as adding new edges to a current network of vertices and edges. Traditional approaches mainly focus on using the existing structural information that is intrinsic to the graph and train the corresponding embeddings to describe that information; however, we think that text corpora related to the entities should also contain information that can positively influence the embeddings to make better predictions. In our project, we try numerous ways of using extracted or raw textual information to help existing KG embedding frameworks reach better prediction results, by adding a similarity function to the regularization part of the loss function. Results show decent improvements over baseline KG embedding methods.
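One way to read the idea of adding a similarity function to the regularization term is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{L}_{\mathrm{KG}}$ stands for the loss of whichever baseline KG embedding method is used, $\mathbf{v}_e$ is the learned embedding of entity $e$, $\mathbf{t}_e$ is a vector derived from text about $e$, $\mathrm{sim}$ is a similarity function such as cosine similarity, and $\lambda$ is a weighting hyperparameter.

```latex
\mathcal{L}
  = \mathcal{L}_{\mathrm{KG}}
  + \lambda \sum_{e \in \mathcal{E}}
    \bigl( 1 - \mathrm{sim}(\mathbf{v}_e, \mathbf{t}_e) \bigr)
```

In this form the added term shrinks as each entity embedding becomes more similar to its text-derived vector, so the textual signal acts as a soft constraint rather than a replacement for the graph-based objective.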

17.0 | SI | Jun 2, 2020

Multi-level Graph Convolutional Networks for Cross-platform Anchor Link Prediction

Hongxu Chen, Hongzhi Yin, Xiangguo Sun et al.

Cross-platform account matching plays a significant role in social network analytics, and is beneficial for a wide range of applications. However, existing methods either rely heavily on high-quality user-generated content (including user profiles) or suffer from the data insufficiency problem when focusing only on network topology, which leaves researchers with an insoluble dilemma of model selection. In this paper, to address this problem, we propose a novel framework that considers multi-level graph convolutions on both local network structure and hypergraph structure in a unified manner. The proposed method overcomes the data insufficiency problem of existing work and does not necessarily rely on user demographic information. Moreover, to make the proposed method capable of handling large-scale social networks, we propose a two-phase space reconciliation mechanism to align the embedding spaces in both network partitioning based parallel training and account matching across different social networks. Extensive experiments have been conducted on two large-scale real-life social networks. The experimental results demonstrate that the proposed method outperforms the state-of-the-art models by a large margin.

0.7 | LG | Oct 14, 2017

When Point Process Meets RNNs: Predicting Fine-Grained User Interests with Mutual Behavioral Infectivity

Tong Chen, Lin Wu, Yang Wang et al.

Predicting fine-grained interests of users with temporal behavior is important to personalization and information filtering applications. However, existing interest prediction methods are incapable of capturing the subtle degrees of user interest towards particular items, and the internal time-varying drift of individuals' attention has not yet been studied. Moreover, the prediction process can also be affected by inter-personal influence, known as behavioral mutual infectivity. Inspired by the use of point processes in modeling temporal events, in this paper we present a deep prediction method based on two recurrent neural networks (RNNs) to jointly model each user's continuous browsing history and asynchronous event sequences in the context of inter-user behavioral mutual infectivity. Our model is able to predict the fine-grained interest of a user in a particular item and the corresponding timestamps at which an event occurs. The proposed approach is more flexible in capturing the dynamic characteristics of event sequences by using the temporal point process to model event data and updating its intensity function in a timely manner with RNNs. Furthermore, to improve the interpretability of the model, the attention mechanism is introduced to emphasize both intra-personal and inter-personal behavior influence over time. Experiments on real datasets demonstrate that our model outperforms the state-of-the-art methods in fine-grained user interest prediction.