h-index44
226papers
5,569citations
Novelty55%
AI Score62

226 Papers

AIJun 5, 2023Code
InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Lichang Chen, Jiuhai Chen, Tom Goldstein et al.

Large language models~(LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, and the performance is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.

CVMar 11, 2022
Towards Bi-directional Skip Connections in Encoder-Decoder Architectures and Beyond

Tiange Xiang, Chaoyi Zhang, Xinyi Wang et al. · cmu

U-Net, as an encoder-decoder architecture with forward skip connections, has achieved promising results in various medical image analysis tasks. Many recent approaches have also extended U-Net with more complex building blocks, which typically increase the number of network parameters considerably. Such complexity makes the inference stage highly inefficient for clinical applications. Towards an effective yet economic segmentation network design, in this work, we propose backward skip connections that bring decoded features back to the encoder. Our design can be jointly adopted with forward skip connections in any encoder-decoder architecture forming a recurrence structure without introducing extra parameters. With the backward skip connections, we propose a U-Net based network family, namely Bi-directional O-shape networks, which set new benchmarks on multiple public medical imaging segmentation datasets. On the other hand, with the most plain architecture (BiO-Net), network computations inevitably increase along with the pre-set recurrence time. We have thus studied the deficiency bottleneck of such recurrent design and propose a novel two-phase Neural Architecture Search (NAS) algorithm, namely BiX-NAS, to search for the best multi-scale bi-directional skip connections. The ineffective skip connections are then discarded to reduce computational costs and speed up network inference. The finally searched BiX-Net yields the least network complexity and outperforms other state-of-the-art counterparts by large margins. We evaluate our methods on both 2D and 3D segmentation tasks in a total of six datasets. Extensive ablation studies have also been conducted to provide a comprehensive analysis for our proposed methods.

CVSep 7, 2022Code
Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

Convolutional Neural Networks (CNNs) compression is crucial to deploying these models in edge devices with limited resources. Existing channel pruning algorithms for CNNs have achieved plenty of success on complex models. They approach the pruning problem from various perspectives and use different metrics to guide the pruning process. However, these metrics mainly focus on the model's `outputs' or `weights' and neglect its `interpretations' information. To fill in this gap, we propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process, thereby utilizing information from both inputs and outputs of the model. However, existing interpretation methods cannot get deployed to achieve our goal as either they are inefficient for pruning or may predict non-coherent explanations. We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models. We parameterize the distribution of explanatory masks by Radial Basis Function (RBF)-like functions to incorporate geometric prior of natural images in our selector model's inductive bias. Thus, we can obtain compact representations of explanations to reduce the computational costs of our pruning method. We leverage our selector model to steer the network pruning by maximizing the similarity of explanatory representations for the pruned and original models. Extensive experiments on CIFAR-10 and ImageNet benchmark datasets demonstrate the efficacy of our proposed method. Our implementations are available at \url{https://github.com/Alii-Ganjj/InterpretationsSteeredPruning}

LGSep 22, 2023Code
Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation

Kai Huang, Hanyun Yin, Heng Huang et al.

Fine-tuning is the most effective way of adapting pre-trained large language models (LLMs) to downstream applications. With the fast growth of LLM-enabled AI applications and democratization of open-souced LLMs, fine-tuning has become possible for non-expert individuals, but intensively performed LLM fine-tuning worldwide could result in significantly high energy consumption and carbon footprint, which may bring large environmental impact. Mitigating such environmental impact towards Green AI directly correlates to reducing the FLOPs of fine-tuning, but existing techniques on efficient LLM fine-tuning can only achieve limited reduction of such FLOPs, due to their ignorance of the backpropagation cost in fine-tuning. To address this limitation, in this paper we present GreenTrainer, a new LLM fine-tuning technique that adaptively evaluates different tensors' backpropagation costs and contributions to the fine-tuned model accuracy, to minimize the fine-tuning cost by selecting the most appropriate set of tensors in training. Such selection in GreenTrainer is made based on a given objective of FLOPs reduction, which can flexibly adapt to the carbon footprint in energy supply and the need in Green AI. Experiment results over multiple open-sourced LLM models and abstractive summarization datasets show that, compared to fine-tuning the whole LLM model, GreenTrainer can save up to 64% FLOPs in fine-tuning without any noticeable model accuracy loss. Compared to the existing fine-tuning techniques such as LoRa, GreenTrainer can achieve up to 4% improvement on model accuracy with on-par FLOPs reduction.

89.8AIMay 29
Capability Self-Assessment: Teaching LLMs to Know Their Limits

Haoyan Yang, Reza Shirkavand, Yukai Jin et al.

The ability to recognize one's own limitations and decide whether to solve a problem or delegate is fundamental for reliable intelligent systems. Yet we show that modern large language models systematically lack this ability: across diverse model families and scales, they overestimate their competence and attempt queries they cannot solve. We refer to this ability as Capability Self-Assessment (CSA) and formulate it as a policy-learning problem, aiming to improve self-assessment while preserving the model's original capabilities. Our results show that reinforcement learning teaches CSA effectively, significantly outperforming supervised fine-tuning while preserving original capabilities. In contrast, supervised fine-tuning severely degrades the capabilities the model is meant to assess. Moreover, learned self-assessment behavior generalizes well out of distribution, suggesting that CSA is a transferable model trait. Finally, CSA is practically useful: it improves local-cloud decision making at inference time and provides a signal for targeted data selection during training.

LGJul 16, 2023Code
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning

Zhenyi Wang, Enneng Yang, Li Shen et al.

Forgetting refers to the loss or deterioration of previously acquired knowledge. While existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new task, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.

CVOct 9, 2023Code
Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand

Junfeng Guo, Yiming Li, Lixu Wang et al.

The prosperity of deep neural networks (DNNs) is largely benefited from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. We reveal that these methods are fundamentally harmful given that they could introduce malicious misclassification behaviors to watermarked DNNs by the adversaries. In this paper, we design DOV from another perspective by making watermarked models (trained on the protected dataset) correctly classify some `hard' samples that will be misclassified by the benign model. Our method is inspired by the generalization property of DNNs, where we find a \emph{hardly-generalized domain} for the original dataset (as its \emph{domain watermark}). It can be easily learned with the protected dataset containing modified samples. Specifically, we formulate the domain generation as a bi-level optimization and propose to optimize a set of visually-indistinguishable clean-label modified data with similar effects to domain-watermarked samples from the hardly-generalized domain to ensure watermark stealthiness. We also design a hypothesis-test-guided ownership verification via our domain watermark and provide the theoretical analyses of our method. Extensive experiments on three benchmark datasets are conducted, which verify the effectiveness of our method and its resistance to potential adaptive methods. The code for reproducing main experiments is available at \url{https://github.com/JunfengGo/Domain-Watermark}.

CVMar 18, 2022
Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation

An Xu, Wenqi Li, Pengfei Guo et al.

Cross-silo federated learning (FL) has attracted much attention in medical imaging analysis with deep learning in recent years as it can resolve the critical issues of insufficient data, data privacy, and training efficiency. However, there can be a generalization gap between the model trained from FL and the one from centralized training. This important issue comes from the non-iid data distribution of the local data in the participating clients and is well-known as client drift. In this work, we propose a novel training framework FedSM to avoid the client drift issue and successfully close the generalization gap compared with the centralized training for medical image segmentation tasks for the first time. We also propose a novel personalized FL objective formulation and a new method SoftPull to solve it in our proposed framework FedSM. We conduct rigorous theoretical analysis to guarantee its convergence for optimizing the non-convex smooth objective function. Real-world medical image segmentation experiments using deep FL validate the motivations and effectiveness of our proposed method.

LGJul 14, 2022
Contrastive Brain Network Learning via Hierarchical Signed Graph Pooling Model

Haoteng Tang, Guixiang Ma, Lei Guo et al.

Recently brain networks have been widely adopted to study brain dynamics, brain development and brain diseases. Graph representation learning techniques on brain functional networks can facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. However, current graph learning techniques have several issues on brain network mining. Firstly, most current graph learning models are designed for unsigned graph, which hinders the analysis of many signed network data (e.g., brain functional networks). Meanwhile, the insufficiency of brain network data limits the model performance on clinical phenotypes predictions. Moreover, few of current graph learning model is interpretable, which may not be capable to provide biological insights for model outcomes. Here, we propose an interpretable hierarchical signed graph representation learning model to extract graph-level representations from brain functional networks, which can be used for different prediction tasks. In order to further improve the model performance, we also propose a new strategy to augment functional brain network data for contrastive learning. We evaluate this framework on different classification and regression tasks using the data from HCP and OASIS. Our results from extensive experiments demonstrate the superiority of the proposed model compared to several state-of-the-art techniques. Additionally, we use graph saliency maps, derived from these prediction tasks, to demonstrate detection and interpretation of phenotypic biomarkers.

QMJul 14, 2024Code
Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Jianan Fan, Dongnan Liu, Canran Li et al.

Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently conform to a non-uniform data distribution, which induces deteriorated promises for general cross-cohort usages. Despite the latest efforts leveraging domain adaptation to mitigate distributional discrepancy, those methods are subjected to modeling the morphological characteristics of each cell individually, disregarding the hierarchical latent structure and intrinsic contextual correspondences across the tumor micro-environment. In this work, we identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition and thereby propose to exploit the dependence over various biological structures for domain adaptive cellular recognition. We discover those high-level correspondences via unsupervised contextual modeling and use them as bridges to facilitate adaptation over diverse organs and stains. In addition, to further exploit the rich spatial contexts embedded amongst nuclear communities, we propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents. The proposed method is extensively evaluated on a broad spectrum of cross-domain settings under miscellaneous data distribution shifts and outperforms the state-of-the-art methods by a substantial margin. Code is available at https://github.com/camwew/CellularRecognition_DA_CC.

LGMay 3, 2022
Local Stochastic Bilevel Optimization with Momentum-Based Variance Reduction

Junyi Li, Feihu Huang, Heng Huang

Bilevel Optimization has witnessed notable progress recently with new emerging efficient algorithms and has been applied to many machine learning tasks such as data cleaning, few-shot learning, and neural architecture search. However, little attention has been paid to solve the bilevel problems under distributed setting. Federated learning (FL) is an emerging paradigm which solves machine learning tasks over distributed-located data. FL problems are challenging to solve due to the heterogeneity and communication bottleneck. However, it is unclear how these challenges will affect the convergence of Bilevel Optimization algorithms. In this paper, we study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm and we show it requires $O(ε^{-2})$ number of iterations to reach an $ε$-stationary point. Then we propose FedBiOAcc to accelerate FedBiO with the momentum-based variance-reduction technique under the stochastic scenario. We show FedBiOAcc has complexity of $O(ε^{-1.5})$. Finally, we validate our proposed algorithms via the important Fair Federated Learning task. More specifically, we define a bilevel-based group fair FL objective. Our algorithms show superior performances compared to other baselines in numerical experiments.

CVFeb 11Code
PhyCritic: Multimodal Critic Models for Physical AI

Tianyi Xiong, Shihao Wang, Guilin Liu et al.

With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and physical correctness. Across both physical and general-purpose multimodal judge benchmarks, PhyCritic achieves strong performance gains over open-source baselines and, when applied as a policy model, further improves perception and reasoning in physically grounded tasks.

CVJul 27, 2023
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation

Jianan Fan, Dongnan Liu, Hang Chang et al.

The success of automated medical image analysis depends on large-scale and expert-annotated training sets. Unsupervised domain adaptation (UDA) has been raised as a promising approach to alleviate the burden of labeled data collection. However, they generally operate under the closed-set adaptation setting assuming an identical label set between the source and target domains, which is over-restrictive in clinical practice where new classes commonly exist across datasets due to taxonomic inconsistency. While several methods have been presented to tackle both domain shifts and incoherent label sets, none of them take into account the common characteristics of the two issues and consider the learning dynamics along network training. In this work, we propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective. It exploits the low-rank nature of gradient space and devises a dual-stream distillation algorithm to regularize the learning dynamics of insufficiently annotated domain and classes with the external guidance obtained from reliable sources. Our approach resolves the issue of inadequate navigation along network optimization, which is the major obstacle in the taxonomy adaptive cross-domain adaptation scenario. We evaluate the proposed method extensively on several tasks towards various endpoints with clinical and open-world significance. The results demonstrate its effectiveness and improvements over previous methods.

LGDec 2, 2022
Faster Adaptive Federated Learning

Xidong Wu, Feihu Huang, Zhengmian Hu et al.

Federated learning has attracted increasing attention with the emergence of distributed data. While extensive federated learning algorithms have been proposed for the non-convex distributed problem, federated learning in practice still faces numerous challenges, such as the large training iterations to converge since the sizes of models and datasets keep increasing, and the lack of adaptivity by SGD-based model updates. Meanwhile, the study of adaptive methods in federated learning is scarce and existing works either lack a complete theoretical convergence guarantee or have slow sample complexity. In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on the momentum-based variance-reduced technique in cross-silo FL. We first explore how to design the adaptive algorithm in the FL setting. By providing a counter-example, we prove that a simple combination of FL and adaptive methods could lead to divergence. More importantly, we provide a convergence analysis for our method and prove that our algorithm is the first adaptive FL algorithm to reach the best-known samples $O(ε^{-3})$ and $O(ε^{-2})$ communication rounds to find an $ε$-stationary point without large batches. The experimental results on the language modeling task and image classification task with heterogeneous data demonstrate the efficiency of our algorithms.

LGMar 19, 2022
Desirable Companion for Vertical Federated Learning: New Zeroth-Order Gradient Based Algorithm

Qingsong Zhang, Bin Gu, Zhiyuan Dang et al.

Vertical federated learning (VFL) attracts increasing attention due to the emerging demands of multi-party collaborative modeling and concerns of privacy leakage. A complete list of metrics to evaluate VFL algorithms should include model applicability, privacy security, communication cost, and computation efficiency, where privacy security is especially important to VFL. However, to the best of our knowledge, there does not exist a VFL algorithm satisfying all these criteria very well. To address this challenging problem, in this paper, we reveal that zeroth-order optimization (ZOO) is a desirable companion for VFL. Specifically, ZOO can 1) improve the model applicability of VFL framework, 2) prevent VFL framework from privacy leakage under curious, colluding, and malicious threat models, 3) support inexpensive communication and efficient computation. Based on that, we propose a novel and practical VFL framework with black-box models, which is inseparably interconnected to the promising properties of ZOO. We believe that it takes one stride towards designing a practical VFL framework matching all the criteria. Under this framework, we raise two novel {\bf asy}nchronous ze{\bf r}oth-ord{\bf e}r algorithms for {\bf v}ertical f{\bf e}derated {\bf l}earning (AsyREVEL) with different smoothing techniques. We theoretically drive the convergence rates of AsyREVEL algorithms under nonconvex condition. More importantly, we prove the privacy security of our proposed framework under existing VFL attacks on different levels. Extensive experiments on benchmark datasets demonstrate the favorable model applicability, satisfied privacy security, inexpensive communication, efficient computation, scalability and losslessness of our framework.

LGAug 11, 2022
An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

Runxue Bao, Bin Gu, Heng Huang

Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite number of iterations, these methods still suffer from huge computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice, namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during the optimization process and eventually achieve faster explicit model identification and improve the algorithm efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method.

CVNov 19, 2022
Towards Robust Dataset Learning

Yihan Wu, Xinda Li, Florian Kerschbaum et al.

Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often slow. In this paper, we study the problem of learning a robust dataset such that any classifier naturally trained on the dataset is adversarially robust. Such a dataset benefits the downstream tasks as natural training is much faster than adversarial training, and demonstrates that the desired property of robustness is transferable between models and data. In this work, we propose a principled, tri-level optimization to formulate the robust dataset learning problem. We show that, under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset. Extensive experiments on MNIST, CIFAR10, and TinyImageNet demostrate the effectiveness of our algorithm with different network initializations and architectures.

IRJun 17, 2022
RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval

Yihan Wu, Hongyang Zhang, Heng Huang

Recent research works have shown that image retrieval models are vulnerable to adversarial attacks, where slightly modified test inputs could lead to problematic retrieval results. In this paper, we aim to design a provably robust image retrieval model which keeps the most important evaluation metric Recall@1 invariant to adversarial perturbation. We propose the first 1-nearest neighbor (NN) image retrieval algorithm, RetrievalGuard, which is provably robust against adversarial perturbations within an $\ell_2$ ball of calculable radius. The challenge is to design a provably robust algorithm that takes into consideration the 1-NN search and the high-dimensional nature of the embedding space. Algorithmically, given a base retrieval model and a query sample, we build a smoothed retrieval model by carefully analyzing the 1-NN search procedure in the high-dimensional embedding space. We show that the smoothed retrieval model has bounded Lipschitz constant and thus the retrieval score is invariant to $\ell_2$ adversarial perturbations. Experiments on image retrieval tasks validate the robustness of our RetrievalGuard method.

LGDec 9, 2022
Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks

Yihan Wu, Aleksandar Bojchevski, Heng Huang

A lot of theoretical and empirical evidence shows that the flatter local minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is an emerging technique to efficiently and effectively find such minima. In AWP we minimize the loss w.r.t. a bounded worst-case perturbation of the model parameters thereby favoring local minima with a small loss in a neighborhood around them. The benefits of AWP, and more generally the connections between flatness and generalization, have been extensively studied for i.i.d. data such as images. In this paper, we extensively study this phenomenon for graph data. Along the way, we first derive a generalization bound for non-i.i.d. node classification tasks. Then we identify a vanishing-gradient issue with all existing formulations of AWP and we propose a new Weighted Truncated AWP (WT-AWP) to alleviate this issue. We show that regularizing graph neural networks with WT-AWP consistently improves both natural and robust generalization across many different graph learning tasks and models.

AIJun 27, 2023
Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets

Yimu Wang, Dinghuai Zhang, Yihan Wu et al. · mila

Despite incredible advances, deep learning has been shown to be susceptible to adversarial attacks. Numerous approaches have been proposed to train robust networks both empirically and certifiably. However, most of them defend against only a single type of attack, while recent work takes steps forward in defending against multiple attacks. In this paper, to understand multi-target robustness, we view this problem as a bargaining game in which different players (adversaries) negotiate to reach an agreement on a joint direction of parameter updating. We identify a phenomenon named player domination in the bargaining game, namely that the existing max-based approaches, such as MAX and MSD, do not converge. Based on our theoretical analysis, we design a novel framework that adjusts the budgets of different adversaries to avoid any player dominance. Experiments on standard benchmarks show that employing the proposed framework to the existing approaches significantly advances multi-target robustness.

AIApr 6, 2023
When do you need Chain-of-Thought Prompting for ChatGPT?

Jiuhai Chen, Lichang Chen, Heng Huang et al.

Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models~(LLMs). For example, by simply adding CoT instruction ``Let's think step-by-step'' to each input query of MultiArith dataset, GPT-3's accuracy can be improved from 17.7\% to 78.7\%. However, it is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT. Surprisingly, on ChatGPT, CoT is no longer effective for certain tasks such as arithmetic reasoning while still keeping effective on other reasoning tasks. Moreover, on the former tasks, ChatGPT usually achieves the best performance and can generate CoT even without being instructed to do so. Hence, it is plausible that ChatGPT has already been trained on these tasks with CoT and thus memorized the instruction so it implicitly follows such an instruction when applied to the same queries, even without CoT. Our analysis reflects a potential risk of overfitting/bias toward instructions introduced in IFT, which becomes more common in training LLMs. In addition, it indicates possible leakage of the pretraining recipe, e.g., one can verify whether a dataset and instruction were used in training ChatGPT. Our experiments report new baseline results of ChatGPT on a variety of reasoning tasks and shed novel insights into LLM's profiling, instruction memorization, and pretraining dataset leakage.

CVOct 27, 2022
BI AVAN: Brain inspired Adversarial Visual Attention Network

Heng Huang, Lin Zhao, Xintao Hu et al.

Visual attention is a fundamental mechanism in the human brain, and it inspires the design of attention mechanisms in deep neural networks. However, most of the visual attention studies adopted eye-tracking data rather than the direct measurement of brain activity to characterize human visual attention. In addition, the adversarial relationship between the attention-related objects and attention-neglected background in the human visual system was not fully exploited. To bridge these gaps, we propose a novel brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity. Our BI-AVAN model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner. We use independent eye-tracking data as ground truth for validation and experimental results show that our model achieves robust and promising results when inferring meaningful human visual attention and mapping the relationship between brain activities and visual stimuli. Our BI-AVAN model contributes to the emerging field of leveraging the brain's functional architecture to inspire and guide the model design in artificial intelligence (AI), e.g., deep neural networks.

CLSep 18, 2024
From Lists to Emojis: How Format Bias Affects Model Alignment

Xuanchang Zhang, Wei Xiong, Lichang Chen et al.

In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases towards specific format patterns, such as lists, links, bold text, and emojis. Furthermore, large language models (LLMs) can exploit these biases to achieve higher rankings on popular benchmarks like AlpacaEval and LMSYS Chatbot Arena. One notable example of this is verbosity bias, where current preference models favor longer responses that appear more comprehensive, even when their quality is equal to or lower than shorter, competing responses. However, format biases beyond verbosity remain largely underexplored in the literature. In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases. Additionally, we show that with a small amount of biased data (less than 1%), we can inject significant bias into the reward model. Moreover, these format biases can also be easily exploited by downstream alignment algorithms, such as best-of-n sampling and online iterative DPO, as it is usually easier to manipulate the format than to improve the quality of responses. Our findings emphasize the need to disentangle format and content both for designing alignment algorithms and evaluating models.

LGMar 4
CRESTomics: Analyzing Carotid Plaques in the CREST-2 Trial with a New Additive Classification Model

Pranav Kulkarni, Brajesh K. Lal, Georges Jreij et al.

Accurate characterization of carotid plaques is critical for stroke prevention in patients with carotid stenosis. We analyze 500 plaques from CREST-2, a multi-center clinical trial, to identify radiomics-based markers from B-mode ultrasound images linked with high-risk. We propose a new kernel-based additive model, combining coherence loss with group-sparse regularization for nonlinear classification. Group-wise additive effects of each feature group are visualized using partial dependence plots. Results indicate our method accurately and interpretably assesses plaques, revealing a strong association between plaque texture and clinical risk.

LGJul 8, 2022
Balanced Self-Paced Learning for AUC Maximization

Bin Gu, Chenkang Zhang, Huan Xiong et al.

Learning to improve AUC performance is an important topic in machine learning. However, AUC maximization algorithms may decrease generalization performance due to the noisy data. Self-paced learning is an effective method for handling noisy data. However, existing self-paced learning methods are limited to pointwise learning, while AUC maximization is a pairwise learning problem. To solve this challenging problem, we innovatively propose a balanced self-paced AUC maximization algorithm (BSPAUC). Specifically, we first provide a statistical objective for self-paced AUC. Based on this, we propose our self-paced AUC maximization formulation, where a novel balanced self-paced regularization term is embedded to ensure that the selected positive and negative samples have proper proportions. Specially, the sub-problem with respect to all weight variables may be non-convex in our formulation, while the one is normally convex in existing self-paced problems. To address this, we propose a doubly cyclic block coordinate descent method. More importantly, we prove that the sub-problem with respect to all weight variables converges to a stationary point on the basis of closed-form solutions, and our BSPAUC converges to a stationary point of our fixed optimization objective under a mild assumption. Considering both the deep learning and kernel-based implementations, experimental results on several large-scale datasets demonstrate that our BSPAUC has a better generalization performance than existing state-of-the-art AUC maximization methods.

CLOct 18, 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

Ming Li, Lichang Chen, Jiuhai Chen et al.

Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, the output control and alignment with the input of LLMs can be refined through instruction tuning. However, as highlighted in several studies, low-quality data in the training set are usually detrimental to instruction tuning, resulting in inconsistent or even misleading LLM outputs. We propose a novel method, termed "reflection-tuning," which addresses the problem by self-improvement and judging capabilities of LLMs. This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data. Extensive experiments on widely used evaluation benchmarks show that LLMs trained with our recycled data outperform those trained with existing datasets in various benchmarks.

CVApr 12, 2023
Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

Xiangyu Xu, Lichang Chen, Changjiang Cai et al.

Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.

LGFeb 13, 2023
Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems

Junyi Li, Feihu Huang, Heng Huang

Bilevel Optimization has witnessed notable progress recently with new emerging efficient algorithms. However, its application in the Federated Learning setting remains relatively underexplored, and the impact of Federated Learning's inherent challenges on the convergence of bilevel algorithms remain obscure. In this work, we investigate Federated Bilevel Optimization problems and propose a communication-efficient algorithm, named FedBiOAcc. The algorithm leverages an efficient estimation of the hyper-gradient in the distributed setting and utilizes the momentum-based variance-reduction acceleration. Remarkably, FedBiOAcc achieves a communication complexity $O(ε^{-1})$, a sample complexity $O(ε^{-1.5})$ and the linear speed up with respect to the number of clients. We also analyze a special case of the Federated Bilevel Optimization problems, where lower level problems are locally managed by clients. We prove that FedBiOAcc-Local, a modified version of FedBiOAcc, converges at the same rate for this type of problems. Finally, we validate the proposed algorithms through two real-world tasks: Federated Data-cleaning and Federated Hyper-representation Learning. Empirical results show superior performance of our algorithms.

CROct 11, 2023
A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

Yihan Wu, Zhengmian Hu, Junfeng Guo et al.

Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models. A challenge in the domain lies in preserving the distribution of original generated content after watermarking. Our research extends and improves upon existing watermarking framework, placing emphasis on the importance of a \textbf{Di}stribution-\textbf{P}reserving (DiP) watermark. Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens prior to the generation of a word, then modifying the token distribution through a distribution-preserving reweight function to enhance the probability of these selected tokens during the sampling process. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it a effective solution for watermarking tasks that demand impeccable quality preservation.

LGFeb 13, 2023
FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging

Junyi Li, Feihu Huang, Heng Huang

Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data. In Federated Learning, a set of clients jointly perform a machine learning task under the coordination of a server. The FedAvg algorithm is one of the most widely used methods to solve Federated Learning problems. In FedAvg, the learning rate is a constant rather than changing adaptively. The adaptive gradient methods show superior performance over the constant learning rate schedule; however, there is still no general framework to incorporate adaptive gradient methods into the federated setting. In this paper, we propose \textbf{FedDA}, a novel framework for local adaptive gradient methods. The framework adopts a restarted dual averaging technique and is flexible with various gradient estimation methods and adaptive learning rate formulations. In particular, we analyze \textbf{FedDA-MVR}, an instantiation of our framework, and show that it achieves gradient complexity $\tilde{O}(ε^{-1.5})$ and communication complexity $\tilde{O}(ε^{-1})$ for finding a stationary point $ε$. This matches the best known rate for first-order FL algorithms and \textbf{FedDA-MVR} is the first adaptive FL algorithm that achieves this rate. We also perform extensive numerical experiments to verify the efficacy of our method.

CVNov 26, 2025Code
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

Tianyi Xiong, Yi Ge, Ming Li et al.

Large multimodal models (LMMs) are increasingly adopted as judges in multimodal evaluation systems due to their strong instruction following and consistency with human preferences. However, their ability to follow diverse, fine-grained evaluation criteria remains underexplored. We develop Multi-Crit, a benchmark for evaluating multimodal judges on their capacity to follow pluralistic criteria and produce reliable criterion-level judgments. Covering both open-ended generation and verifiable reasoning tasks, Multi-Crit is built through a rigorous data curation pipeline that gathers challenging response pairs with multi-criterion human annotations. It further introduces three novel metrics for systematically assessing pluralistic adherence, criterion-switching flexibility, and the ability to recognize criterion-level preference conflicts. Comprehensive analysis of 25 LMMs reveals that 1) proprietary models still struggle to maintain consistent adherence to pluralistic criteria--especially in open-ended evaluation; 2) open-source models lag further behind in flexibly following diverse criteria; and 3) critic fine-tuning with holistic judgment signals enhances visual grounding but fails to generalize to pluralistic criterion-level judgment. Additional analyses on reasoning fine-tuning, test-time scaling, and boundary consistency between open-source and proprietary models further probe the limits of current multimodal judges. As a pioneering study, Multi-Crit lays the foundation for building reliable and steerable multimodal AI evaluation.

LGMay 6, 2022
Functional2Structural: Cross-Modality Brain Networks Representation Learning

Haoteng Tang, Xiyao Fu, Lei Guo et al.

MRI-based modeling of brain networks has been widely used to understand functional and structural interactions and connections among brain regions, and factors that affect them, such as brain development and disease. Graph mining on brain networks may facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. Since brain networks derived from functional and structural MRI describe the brain topology from different perspectives, exploring a representation that combines these cross-modality brain networks is non-trivial. Most current studies aim to extract a fused representation of the two types of brain network by projecting the structural network to the functional counterpart. Since the functional network is dynamic and the structural network is static, mapping a static object to a dynamic object is suboptimal. However, mapping in the opposite direction is not feasible due to the non-negativity requirement of current graph learning techniques. Here, we propose a novel graph learning framework, known as Deep Signed Brain Networks (DSBN), with a signed graph encoder that, from an opposite perspective, learns the cross-modality representations by projecting the functional network to the structural counterpart. We validate our framework on clinical phenotype and neurodegenerative disease prediction tasks using two independent, publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.

OCJan 10, 2023
A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

Chuan He, Heng Huang, Zhaosong Lu

In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this problem. Under some mild assumptions, we show that our method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ for finding an $(ε,\sqrtε)$-SOSP of general nonconvex conic optimization with high probability. Moreover, under a constraint qualification, these complexity bounds are improved to $\widetilde{\cal O}(ε^{-7/2})$ and $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$, respectively. To the best of our knowledge, this is the first study on the complexity of finding an approximate SOSP of general nonconvex conic optimization. Preliminary numerical results are presented to demonstrate superiority of the proposed method over first-order methods in terms of solution quality.

CLNov 20, 2023
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

Zhengmian Hu, Gang Wu, Saayan Mitra et al.

In recent years, Large Language Models (LLM) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that mislead LLMs into generating incorrect or undesired outputs. Previous work has revealed that with relatively simple yet effective attacks based on discrete optimization, it is possible to generate adversarial prompts that bypass moderation and alignment of the models. This vulnerability to adversarial prompts underscores a significant concern regarding the robustness and reliability of LLMs. Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at a token level, leveraging the LLM's capability to predict the next token's probability. We measure the degree of the model's perplexity, where tokens predicted with high probability are considered normal, and those exhibiting high perplexity are flagged as adversarial. Additionaly, our method also integrates context understanding by incorporating neighboring token information to encourage the detection of contiguous adversarial prompt sequences. To this end, we design two algorithms for adversarial prompt detection: one based on optimization techniques and another on Probabilistic Graphical Models (PGM). Both methods are equipped with efficient solving methods, ensuring efficient adversarial prompt detection. Our token-level detection result can be visualized as heatmap overlays on the text sequence, allowing for a clearer and more intuitive representation of which part of the text may contain adversarial prompts.

LGOct 14, 2022
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining

Wenhan Xian, Feihu Huang, Heng Huang

Distributed data mining is an emerging research topic to effectively and efficiently address hard data mining tasks using big data, which are partitioned and computed on different worker nodes, instead of one centralized server. Nevertheless, distributed learning methods often suffer from the communication bottleneck when the network bandwidth is limited or the size of model is large. To solve this critical issue, many gradient compression methods have been proposed recently to reduce the communication cost for multiple optimization algorithms. However, the current applications of gradient compression to adaptive gradient method, which is widely adopted because of its excellent performance to train DNNs, do not achieve the same ideal compression rate or convergence rate as Sketched-SGD. To address this limitation, in this paper, we propose a class of novel distributed Adam-type algorithms (\emph{i.e.}, SketchedAMSGrad) utilizing sketching, which is a promising compression technique that reduces the communication cost from $O(d)$ to $O(\log(d))$ where $d$ is the parameter dimension. In our theoretical analysis, we prove that our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration. Compared with single-machine AMSGrad, our algorithm can achieve the linear speedup with respect to the number of workers $n$. The experimental results on training various DNNs in distributed paradigm validate the efficiency of our algorithms.

LGApr 23, 2022
Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

Runxue Bao, Xidong Wu, Wenhan Xian et al.

Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Nevertheless, existing safe screening methods are limited to the sequential setting. In this paper, we propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively, which can achieve significant speedup without any loss of accuracy by simultaneously enjoying the sparsity of the model and dataset. To the best of our knowledge, this is the first work of distributed safe dynamic screening method. Theoretically, we prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely. Finally, extensive experimental results on benchmark datasets confirm the superiority of our proposed method.

LGOct 5, 2023
Solving a Class of Non-Convex Minimax Optimization in Federated Learning

Xidong Wu, Jianhui Sun, Zhengmian Hu et al.

The minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (\emph{i.e.} single-machine). Nonetheless, the algorithm for minimax problems under FL is still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(κ^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(κ^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.

66.4CVMay 12Code
RNA-FM: Flow-Matching Generative Model for Genome-wide RNA-Seq Prediction

Yaxuan Song, Jianan Fan, Tianyi Wang et al.

Histopathology whole-slide images (WSIs) are routinely acquired in clinical practice and contain rich tissue morphology but lack direct molecular architecture and functional programs defining pathological states, whereas RNA sequencing (RNA-seq) provides genome-wide transcriptional profiles at substantial cost, thereby motivating WSI-based genome-wide transcriptomic prediction. Existing approaches for predicting gene expression from WSIs predominantly rely on deterministic regression with one-to-one mapping, limiting their ability to capture biological heterogeneity and predictive uncertainty. We propose RNA-FM, a flow-matching generative framework for genome-wide bulk RNA-seq prediction from WSIs. RNA-FM formulates transcriptomic prediction as a continuous-time conditional transport problem, learning a velocity field that maps a simple prior to the target gene expression distribution conditioned on morphologies. By integrating pathway-level structure, RNA-FM enables scalable and biologically interpretable genome-wide gene expression imputation. Extensive experiments demonstrate that RNA-FM consistently outperforms state-of-the-art approaches while maintaining biological meaningfulness. Code is available at https://github.com/YXSong000/RNA-FM.

LGJun 11, 2022
Communication-Efficient Robust Federated Learning with Noisy Labels

Junyi Li, Jian Pei, Heng Huang

Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed located data. In FL, the data is kept locally by each user. This protects the user privacy, but also makes the server difficult to verify data quality, especially if the data are correctly labeled. Training with corrupted labels is harmful to the federated learning task; however, little attention has been paid to FL in the case of label noise. In this paper, we focus on this problem and propose a learning-based reweighting approach to mitigate the effect of noisy labels in FL. More precisely, we tuned a weight for each training sample such that the learned model has optimal generalization performance over a validation set. More formally, the process can be formulated as a Federated Bilevel Optimization problem. Bilevel optimization problem is a type of optimization problem with two levels of entangled problems. The non-distributed bilevel problems have witnessed notable progress recently with new efficient algorithms. However, solving bilevel optimization problems under the Federated Learning setting is under-investigated. We identify that the high communication cost in hypergradient evaluation is the major bottleneck. So we propose \textit{Comm-FedBiO} to solve the general Federated Bilevel Optimization problems; more specifically, we propose two communication-efficient subroutines to estimate the hypergradient. Convergence analysis of the proposed algorithms is also provided. Finally, we apply the proposed algorithms to solve the noisy label problem. Our approach has shown superior performance on several real-world datasets compared to various baselines.

81.4LGMay 25
Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

Georgios Milis, Yubin Qin, Yihan Wu et al.

As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts. Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies. Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage. In this work, motivated by the vocabulary redundancy of discretization, we propose an elegant solution for powerful and robust watermarking of synthetic audio. We theoretically analyze the impact of token errors on watermark detection, and effectively mitigate them using a reduced vocabulary obtained via community detection. Thorough experiments showcase that our gradient-free method can boost detectability by several orders of magnitude, while also achieving built-in robustness to audio modifications. Broadly, we discover a new state-of-the-art for token-level watermarks in multimedia, which simply arises from the nature of discrete representation learning.

LGFeb 8, 2023
Decentralized Riemannian Algorithm for Nonconvex Minimax Problems

Xidong Wu, Zhengmian Hu, Heng Huang

The minimax optimization over Riemannian manifolds (possibly nonconvex constraints) has been actively applied to solve many problems, such as robust dimensionality reduction and deep neural networks with orthogonal weights (Stiefel manifold). Although many optimization algorithms for minimax problems have been developed in the Euclidean setting, it is difficult to convert them into Riemannian cases, and algorithms for nonconvex minimax problems with nonconvex constraints are even rare. On the other hand, to address the big data challenges, decentralized (serverless) training techniques have recently been emerging since they can reduce communications overhead and avoid the bottleneck problem on the server node. Nonetheless, the algorithm for decentralized Riemannian minimax problems has not been studied. In this paper, we study the distributed nonconvex-strongly-concave minimax optimization problem over the Stiefel manifold and propose both deterministic and stochastic minimax methods. The Steifel manifold is a non-convex set. The global function is represented as the finite sum of local functions. For the deterministic setting, we propose DRGDA and prove that our deterministic method achieves a gradient complexity of $O( ε^{-2})$ under mild conditions. For the stochastic setting, we propose DRSGDA and prove that our stochastic method achieves a gradient complexity of $O(ε^{-4})$. The DRGDA and DRSGDA are the first algorithms for distributed minimax optimization with nonconvex constraints with exact convergence. Extensive experimental results on the Deep Neural Networks (DNNs) training over the Stiefel manifold demonstrate the efficiency of our algorithms.

LGAug 6, 2023
Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

Xidong Wu, Zhengmian Hu, Jian Pei et al.

Multi-party collaborative training, such as distributed learning and federated learning, is used to address the big data challenges. However, traditional multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and are intended to optimize accuracy (\emph{e.g.}, cross-entropy). The data distribution in many real-world applications is skewed and classifiers, which are trained to improve accuracy, perform poorly when applied to imbalanced data tasks since models could be significantly biased toward the primary class. Therefore, the Area Under Precision-Recall Curve (AUPRC) was introduced as an effective metric. Although single-machine AUPRC maximization methods have been designed, multi-party collaborative algorithm has never been studied. The change from the single-machine to the multi-party setting poses critical challenges. To address the above challenge, we study the serverless multi-party collaborative AUPRC maximization problem since serverless multi-party collaborative training can cut down the communications cost by avoiding the server node bottleneck, and reformulate it as a conditional stochastic optimization problem in a serverless multi-party collaborative learning setting and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. After that, we use the variance reduction technique and propose ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm to improve the convergence rate, which matches the best theoretical convergence result reached by the single-machine online method. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem.

54.7LGMay 7
Distributionally Robust Multi-Objective Optimization

Yufeng Yang, Fangning Zhuo, Ziyi Chen et al.

Multi-objective optimization (MOO) has received growing attention in applications that require learning under multiple criteria. However, the existing MOO formulations do not explicitly account for distributional shifts in the data. We introduce distributionally robust multi-objective optimization (DR-MOO), which minimizes multiple objectives under their respective worst-case distributions. We propose Pareto-type solution concepts for DR-MOO and develop multi-gradient descent algorithms (MGDA) with provable guarantees. Leveraging a Lagrangian dual reformulation, we first design a double-loop MGDA that uses an inner loop to estimate dual variables and achieves a total sample complexity $\mathcal{O}(ε^{-12})$ for reaching an $ε$-Pareto-stationary point. To further improve efficiency, we incorporate gradient clipping to handle generalized-smooth and biased gradient estimates, removing the need for double sampling. This yields a single-loop double-clip MGDA with substantially improved sample complexity $\mathcal{O}(ε^{-4})$. Our theory applies to the nonconvex setting and does not require bounded objectives or gradients. Experiments demonstrate that our methods are competitive with state-of-the-art MGDA baselines.

96.2CLMay 8Code
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang et al.

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width--depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable and fine-grained execution trace feedback to improve discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy--cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery costs only $39.9 and 160 minutes. Our data, and code will be open-source at https://github.com/zhengkid/AutoTTS.

87.8LGMay 7Code
Rollback-Free Stable Brick Structures Generation

Chenhui Xu, Ziyue Bai, Fuxun Yu et al.

While autoregressive models have advanced 3D generation, creating physically stable brick structures remains a challenge due to the strict requirements of gravity and interconnectivity. Existing approaches rely on external physical simulators during inference to perform rejection sampling and brick-by-brick rollbacks, which severely bottlenecks efficiency. To address this, we propose a reinforcement learning paradigm that shifts physical validity enforcement from test-time correction to training-time policy optimization. By utilizing assembly-level rewards, the model optimizes for collision avoidance, global connectivity, structural interlocking, and shape conformity. This paradigm allows the model to internalize physical priors, enabling the first rollback-free generation of stable brick structures. Experimental results demonstrate that our approach achieves state-of-the-art generation quality while accelerating inference speed by orders of magnitude. Our code and dataset are available at https://github.com/miniHuiHui/STABLE. Our models are available at https://huggingface.co/miniHui/STABLE.

LGOct 25, 2022
FedGRec: Federated Graph Recommender System with Lazy Update of Latent Embeddings

Junyi Li, Heng Huang

Recommender systems are widely used in industry to improve user experience. Despite great success, they have recently been criticized for collecting private user data. Federated Learning (FL) is a new paradigm for learning on distributed data without direct data sharing. Therefore, Federated Recommender (FedRec) systems are proposed to mitigate privacy concerns to non-distributed recommender systems. However, FedRec systems have a performance gap to its non-distributed counterpart. The main reason is that local clients have an incomplete user-item interaction graph, thus FedRec systems cannot utilize indirect user-item interactions well. In this paper, we propose the Federated Graph Recommender System (FedGRec) to mitigate this gap. Our FedGRec system can effectively exploit the indirect user-item interactions. More precisely, in our system, users and the server explicitly store latent embeddings for users and items, where the latent embeddings summarize different orders of indirect user-item interactions and are used as a proxy of missing interaction graph during local training. We perform extensive empirical evaluations to verify the efficacy of using latent embeddings as a proxy of missing interaction graph; the experimental results show superior performance of our system compared to various baselines. A short version of the paper is presented in \href{https://federated-learning.org/fl-neurips-2022/}{the FL-NeurIPS'22 workshop}.

LGOct 4, 2023
Federated Conditional Stochastic Optimization

Xidong Wu, Jianhui Sun, Zhengmian Hu et al.

Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and meta-learning. As the demand for training models with large-scale distributed data grows in these applications, there is an increasing need for communication-efficient distributed optimization algorithms, such as federated learning algorithms. This paper considers the nonconvex conditional stochastic optimization in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG) with a conditional stochastic gradient estimator and a momentum-based algorithm (FCSG-M). To match the lower bound complexity in the single-machine setting, we design an accelerated algorithm (Acc-FCSG-M) via the variance reduction to achieve the best sample and communication complexity. Compared with the existing optimization analysis for MAML in FL, federated conditional stochastic optimization considers the sample of tasks. Extensive experimental results on various tasks validate the efficiency of these algorithms.

CVSep 23, 2024
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection

Alireza Ganjdanesh, Yan Kang, Yuchen Liu et al.

Diffusion probabilistic models can generate high-quality samples. Yet, their sampling process requires numerous denoising steps, making it slow and computationally intensive. We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts. First, we study the similarities between pairs of denoising timesteps, observing a natural clustering, even across different datasets. This suggests that rather than having a single model for all time steps, separate models can serve as ``experts'' for their respective time intervals. As such, we separately fine-tune the pretrained model on each interval, with elastic dimensions in depth and width, to obtain experts specialized in their corresponding denoising interval. To optimize the resource usage between experts, we introduce our Expert Routing Agent, which learns to select a set of proper network configurations. By doing so, our method can allocate the computing budget between the experts in an end-to-end manner without requiring manual heuristics. Finally, with a selected configuration, we fine-tune our pruned experts to obtain our mixture of efficient experts. We demonstrate the effectiveness of our method, DiffPruning, across several datasets, LSUN-Church, LSUN-Beds, FFHQ, and ImageNet, on the Latent Diffusion Model architecture.

LGMar 20, 2024Code
A Unified and General Framework for Continual Learning

Zhenyi Wang, Yan Li, Li Shen et al.

Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. However, these methods lack a unified framework and common terminology for describing their approaches. This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies. Notably, this new framework is capable of encompassing established CL approaches as special instances within a unified and general optimization objective. An intriguing finding is that despite their diverse origins, these methods share common mathematical structures. This observation highlights the compatibility of these seemingly distinct techniques, revealing their interconnectedness through a shared underlying optimization objective. Moreover, the proposed general framework introduces an innovative concept called refresh learning, specifically designed to enhance the CL performance. This novel approach draws inspiration from neuroscience, where the human brain often sheds outdated information to improve the retention of crucial knowledge and facilitate the acquisition of new information. In essence, refresh learning operates by initially unlearning current data and subsequently relearning it. It serves as a versatile plug-in that seamlessly integrates with existing CL methods, offering an adaptable and effective enhancement to the learning process. Extensive experiments on CL benchmarks and theoretical analysis demonstrate the effectiveness of the proposed refresh learning. Code is available at \url{https://github.com/joey-wang123/CL-refresh-learning}.