Xiaofeng Xu

CL
h-index14
10papers
120citations
Novelty56%
AI Score35

10 Papers

CLAug 28, 2024
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

Taiwei Shi, Zhuoer Wang, Longqi Yang et al. · amazon-science, microsoft-research

As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, misalignment with real-world user preferences, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with LLMs to create preference datasets automatically. Given a corpus of multi-turn user-LLM conversation, WildFeedback identifies and classifies user feedback to LLM responses between conversation turns. The user feedback is then used to create examples of preferred and dispreferred responses according to users' preference. Our experiments demonstrate that LLMs fine-tuned on WildFeedback dataset exhibit significantly improved alignment with user preferences, as evidenced by both traditional benchmarks and our proposed checklist-guided evaluation. By incorporating in-situ feedback from actual users, WildFeedback addresses the scalability, subjectivity, and bias challenges that plague existing approaches, marking a significant step toward developing LLMs that are more responsive to the diverse and evolving needs of their users.

CLFeb 26, 2025
GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation

Jie He, Jennifer Neville, Mengting Wan et al.

Large Language Models (LLMs) can enhance their capabilities as AI assistants by integrating external tools, allowing them to access a wider range of information. While recent LLMs are typically fine-tuned with tool usage examples during supervised fine-tuning (SFT), questions remain about their ability to develop robust tool-usage skills and can effectively generalize to unseen queries and tools. In this work, we present GenTool, a novel training framework that prepares LLMs for diverse generalization challenges in tool utilization. Our approach addresses two fundamental dimensions critical for real-world applications: Zero-to-One Generalization, enabling the model to address queries initially lacking a suitable tool by adopting and utilizing one when it becomes available, and Weak-to-Strong Generalization, allowing models to leverage enhanced versions of existing tools to solve queries. To achieve this, we develop synthetic training data simulating these two dimensions of tool usage and introduce a two-stage fine-tuning approach: optimizing tool ranking, then refining tool selection. Through extensive experiments across four generalization scenarios, we demonstrate that our method significantly enhances the tool-usage capabilities of LLMs ranging from 1B to 8B parameters, achieving performance that surpasses GPT-4o. Furthermore, our analysis also provides valuable insights into the challenges LLMs encounter in tool generalization.

ARNov 26, 2024
A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training

Mingjing Li, Huihui Zhou, Xiaofeng Xu et al.

There is a growing necessity for edge training to adapt to dynamically changing environment. Neuromorphic computing represents a significant pathway for high-efficiency intelligent computation in energy-constrained edges, but existing neuromorphic architectures lack the ability of directly training spiking neural networks (SNNs) based on backpropagation. We develop a multi-core neuromorphic architecture with Feedforward-Propagation, Back-Propagation, and Weight-Gradient engines in each core, supporting high efficient parallel computing at both the engine and core levels. It combines various data flows and sparse computation optimization by fully leveraging the sparsity in SNN training, obtaining a high energy efficiency of 1.05TFLOPS/W@ FP16 @ 28nm, 55 ~ 85% reduction of DRAM access compared to A100 GPU in SNN trainings, and a 20-core deep SNN training and a 5-worker federated learning on FPGAs. Our study develops the first multi-core neuromorphic architecture supporting the direct SNN training, facilitating the neuromorphic computing in edge-learnable applications.

CLMar 11, 2025
Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations

Ishani Mondal, Jack W. Stokes, Sujay Kumar Jauhar et al.

LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all training paradigm \cite{lucy-etal-2024-one} and there is limited research on what personalization aspects each group expect. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA), that identifies context-specific variations in conversational preferences across user groups and then steers LLMs to address those preferences. Our approach consists of two steps: (1) Group-Aware Preference Extraction, where maximally divergent user-group preferences are extracted from real-world conversation logs and distilled into interpretable rubrics, and (2) Tailored Response Generation, which leverages these rubrics through two methods: a) Context-Tuned Inference (GAP-CT), that dynamically adjusts responses via context-dependent prompt instructions, and b) Rubric-Finetuning Inference (GPA-FT), which uses the rubrics to generate contrastive synthetic data for personalization of group-specific models via alignment. Experiments demonstrate that our framework significantly improves alignment of the output with respect to user preferences and outperforms baseline methods, while maintaining robust performance on standard benchmarks.

IRMar 19, 2024
Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Ying-Chun Lin, Jennifer Neville, Jack W. Stokes et al.

Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

CRMay 4, 2020
PGLP: Customizable and Rigorous Location Privacy through Policy Graph

Yang Cao, Yonghui Xiao, Shun Takagi et al.

Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., \textit{Policy Graph based Location Privacy}, providing a rich interface to release private locations with customizable and rigorous privacy guarantee. First, we design the privacy metrics of PGLP by extending differential privacy. Specifically, we formalize a user's location privacy requirements using a \textit{location policy graph}, which is expressive and customizable. Second, we investigate how to satisfy an arbitrarily given location policy graph under adversarial knowledge. We find that a location policy graph may not always be viable and may suffer \textit{location exposure} when the attacker knows the user's mobility pattern. We propose efficient methods to detect location exposure and repair the policy graph with optimal utility. Third, we design a private location trace release framework that pipelines the detection of location exposure, policy graph repair, and private trajectory release with customizable and rigorous location privacy. Finally, we conduct experiments on real-world datasets to verify the effectiveness of the privacy-utility trade-off and the efficiency of the proposed algorithms.

CVJul 26, 2019
Improving Generalization via Attribute Selection on Out-of-the-box Data

Xiaofeng Xu, Ivor W. Tsang, Chuancai Liu

Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes), by sharing information of attributes between different objects. Attributes are artificially annotated for objects and treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impacts on the ZSL system performance. This paper first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting the subset of key attributes can improve the generalization performance of the original ZSL model, which utilizes all the attributes. Unfortunately, previous attribute selection methods are conducted based on the seen data, and their selected attributes have poor generalization capability to the unseen data, which is unavailable in the training stage of ZSL tasks. Inspired by learning from pseudo relevance feedback, this paper introduces the out-of-the-box data, which is pseudo data generated by an attribute-guided generative model, to mimic the unseen data. After that, we present an iterative attribute selection (IAS) strategy which iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to the test data, the key attributes selected by IAS can be effectively generalized to test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.

CVMay 20, 2019
Learning Image-Specific Attributes by Hyperbolic Neighborhood Graph Propagation

Xiaofeng Xu, Ivor W. Tsang, Xiaofeng Cao et al.

As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. In most of existing attribute-based research, class-specific attributes (CSA), which are class-level annotations, are usually adopted due to its low annotation cost for each class instead of each individual image. However, class-specific attributes are usually noisy because of annotation errors and diversity of individual images. Therefore, it is desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Considering the intrinsic property of hyperbolic geometry that its distance expands exponentially, hyperbolic neighborhood graph (HNG) is constructed to characterize the relationship between samples. Based on HNG, we define neighborhood consistency for each sample to identify inconsistent samples. Subsequently, inconsistent samples are refined based on their neighbors in HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.

LGSep 28, 2018
Target-Independent Active Learning via Distribution-Splitting

Xiaofeng Cao, Ivor W. Tsang, Xiaofeng Xu et al.

To reduce the label complexity in Agnostic Active Learning (A^2 algorithm), volume-splitting splits the hypothesis edges to reduce the Vapnik-Chervonenkis (VC) dimension in version space. However, the effectiveness of volume-splitting critically depends on the initial hypothesis and this problem is also known as target-dependent label complexity gap. This paper attempts to minimize this gap by introducing a novel notion of number density which provides a more natural and direct way to describe the hypothesis distribution than volume. By discovering the connections between hypothesis and input distribution, we map the volume of version space into the number density and propose a target-independent distribution-splitting strategy with the following advantages: 1) provide theoretical guarantees on reducing label complexity and error rate as volume-splitting; 2) break the curse of initial hypothesis; 3) provide model guidance for a target-independent AL algorithm in real AL tasks. With these guarantees, for AL application, we then split the input distribution into more near-optimal spheres and develop an application algorithm called Distribution-based A^2 (DA^2). Experiments further verify the effectiveness of the halving and querying abilities of DA^2. Contributions of this paper are as follows.

CVApr 17, 2018
Complementary Attributes: A New Clue to Zero-Shot Learning

Xiaofeng Xu, Ivor W. Tsang, Chuancai Liu

Zero-shot learning (ZSL) aims to recognize unseen objects using disjoint seen objects via sharing attributes. The generalization performance of ZSL is governed by the attributes, which transfer semantic information from seen classes to unseen classes. To take full advantage of the knowledge transferred by attributes, in this paper, we introduce the notion of complementary attributes (CA), as a supplement to the original attributes, to enhance the semantic representation ability. Theoretical analyses demonstrate that complementary attributes can improve the PAC-style generalization bound of original ZSL model. Since the proposed CA focuses on enhancing the semantic representation, CA can be easily applied to any existing attribute-based ZSL methods, including the label-embedding strategy based ZSL (LEZSL) and the probability-prediction strategy based ZSL (PPZSL). In PPZSL, there is a strong assumption that all the attributes are independent of each other, which is arguably unrealistic in practice. To solve this problem, a novel rank aggregation framework is proposed to circumvent the assumption. Extensive experiments on five ZSL benchmark datasets and the large-scale ImageNet dataset demonstrate that the proposed complementary attributes and rank aggregation can significantly and robustly improve existing ZSL methods and achieve the state-of-the-art performance.