CRMay 12, 2022
Privacy-Preserving Distributed Machine Learning Made FasterZoe L. Jiang, Jiajing Gu, Hongxiao Wang et al.
With the development of machine learning, it is difficult for a single server to process all the data. So machine learning tasks need to be spread across multiple servers, turning the centralized machine learning into a distributed one. However, privacy remains an unsolved problem in distributed machine learning. Multi-key homomorphic encryption is one of the suitable candidates to solve the problem. However, the most recent result of the Multi-key homomorphic encryption scheme (MKTFHE) only supports the NAND gate. Although it is Turing complete, it requires efficient encapsulation of the NAND gate to further support mathematical calculation. This paper designs and implements a series of operations on positive and negative integers accurately. First, we design basic bootstrapped gates with the same efficiency as that of the NAND gate. Second, we construct practical $k$-bit complement mathematical operators based on our basic binary bootstrapped gates. The constructed created can perform addition, subtraction, multiplication, and division on both positive and negative integers. Finally, we demonstrated the generality of the designed operators by achieving a distributed privacy-preserving machine learning algorithm, i.e. linear regression with two different solutions. Experiments show that the operators we designed are practical and efficient.
AIMay 5, 2022
Multi-Graph based Multi-Scenario Recommendation in Large-scale Online Video ServicesFan Zhang, Qiuying Peng, Yulin Wu et al.
Recently, industrial recommendation services have been boosted by the continual upgrade of deep learning methods. However, they still face de-biasing challenges such as exposure bias and cold-start problem, where circulations of machine learning training on human interaction history leads algorithms to repeatedly suggest exposed items while ignoring less-active ones. Additional problems exist in multi-scenario platforms, e.g. appropriate data fusion from subsidiary scenarios, which we observe could be alleviated through graph structured data integration via message passing. In this paper, we present a multi-graph structured multi-scenario recommendation solution, which encapsulates interaction data across scenarios with multi-graph and obtains representation via graph learning. Extensive offline and online experiments on real-world datasets are conducted where the proposed method demonstrates an increase of 0.63% and 0.71% in CTR and Video Views per capita on new users over deployed set of baselines and outperforms regular method in increasing the number of outer-scenario videos by 25% and video watches by 116%, validating its superiority in activating cold videos and enriching target recommendation.
CVFeb 3Code
Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token PruningDingkun Zhang, Shuhan Qi, Yulin Wu et al.
Multimodal Large Language Models (MLLMs) suffer from severe training inefficiency issue, which is associated with their massive model sizes and visual token numbers. Existing efforts in efficient training focus on reducing model sizes or trainable parameters. Inspired by the success of Visual Token Pruning (VTP) in improving inference efficiency, we are exploring another substantial research direction for efficient training by reducing visual tokens. However, applying VTP at the training stage results in a training-inference mismatch: pruning-trained models perform poorly when inferring on non-pruned full visual token sequences. To close this gap, we propose DualSpeed, a fast-slow framework for efficient training of MLLMs. The fast-mode is the primary mode, which incorporates existing VTP methods as plugins to reduce visual tokens, along with a mode isolator to isolate the model's behaviors. The slow-mode is the auxiliary mode, where the model is trained on full visual sequences to retain training-inference consistency. To boost its training, it further leverages self-distillation to learn from the sufficiently trained fast-mode. Together, DualSpeed can achieve both training efficiency and non-degraded performance. Experiments show DualSpeed accelerates the training of LLaVA-1.5 by 2.1$\times$ and LLaVA-NeXT by 4.0$\times$, retaining over 99% performance. Code: https://github.com/dingkun-zhang/DualSpeed
LGSep 20, 2024
RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender SystemsShuo Su, Xiaoshuang Chen, Yao Wang et al.
Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users' overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.
LGJul 14, 2024
MKDTI: Predicting drug-target interactions via multiple kernel fusion on graph attention networkYuhuan Zhou, Yulin Wu, Weiwei Yuan et al.
Drug-target relationships may now be predicted computationally using bioinformatics data, which is a valuable tool for understanding pharmacological effects, enhancing drug development efficiency, and advancing related research. A number of structure-based, ligand-based and network-based approaches have now emerged. Furthermore, the integration of graph attention networks with intricate drug target studies is an application area of growing interest. In our work, we formulate a model called MKDTI by extracting kernel information from various layer embeddings of a graph attention network. This combination improves the prediction ability with respect to novel drug-target relationships. We first build a drug-target heterogeneous network using heterogeneous data of drugs and targets, and then use a self-enhanced multi-head graph attention network to extract potential features in each layer. Next, we utilize embeddings of each layer to computationally extract kernel matrices and fuse multiple kernel matrices. Finally, we use a Dual Laplacian Regularized Least Squares framework to forecast novel drug-target entity connections. This prediction can be facilitated by integrating the kernel matrix associated with the drug-target. We measured our model's efficacy using AUPR and AUC. Compared to the benchmark algorithms, our model outperforms them in the prediction outcomes. In addition, we conducted an experiment on kernel selection. The results show that the multi-kernel fusion approach combined with the kernel matrix generated by the graph attention network provides complementary insights into the model. The fusion of this information helps to enhance the accuracy of the predictions.
CVFeb 3, 2024Code
Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation NetworkYanhua Zhang, Ke Zhang, Jingyu Wang et al.
Real-time semantic segmentation is a crucial research for real-world applications. However, many methods lay particular emphasis on reducing the computational complexity and model size, while largely sacrificing the accuracy. To tackle this problem, we propose a parallel inference network customized for semantic segmentation tasks to achieve a good trade-off between speed and accuracy. We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy. Specifically, we first design a dual-pyramidal path architecture (Multi-level Feature Aggregation Module, MFAM) to aggregate multi-level features from the encoder to each scale, providing hierarchical clues for subsequent spatial alignment and corresponding in-network inference. Then, we build Recursive Alignment Module (RAM) by combining the flow-based alignment module with recursive upsampling architecture for accurate spatial alignment between multi-scale feature maps with half the computational complexity of the straightforward alignment method. Finally, we perform independent parallel inference on the aligned features to obtain multi-scale scores, and adaptively fuse them through an attention-based Adaptive Scores Fusion Module (ASFM) so that the final prediction can favor objects of multiple scales. Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets. We also conducted systematic ablation studies to gain insight into our motivation and architectural design. Code is available at: https://github.com/Yanhua-Zhang/MFARANet.
CLMar 12, 2024
LaERC-S: Improving LLM-based Emotion Recognition in Conversation with Speaker CharacteristicsYumeng Fu, Junjie Wu, Zhongjie Wang et al.
Emotion recognition in conversation (ERC), the task of discerning human emotions for each utterance within a conversation, has garnered significant attention in human-computer interaction systems. Previous ERC studies focus on speaker-specific information that predominantly stems from relationships among utterances, which lacks sufficient information around conversations. Recent research in ERC has sought to exploit pre-trained large language models (LLMs) with speaker modelling to comprehend emotional states. Although these methods have achieved encouraging results, the extracted speaker-specific information struggles to indicate emotional dynamics. In this paper, motivated by the fact that speaker characteristics play a crucial role and LLMs have rich world knowledge, we present LaERC-S, a novel framework that stimulates LLMs to explore speaker characteristics involving the mental state and behavior of interlocutors, for accurate emotion predictions. To endow LLMs with this knowledge information, we adopt the two-stage learning to make the models reason speaker characteristics and track the emotion of the speaker in complex conversation scenarios. Extensive experiments on three benchmark datasets demonstrate the superiority of LaERC-S, reaching the new state-of-the-art.
LGApr 23, 2024
Cache-Aware Reinforcement Learning in Large-Scale Recommender SystemsXiaoshuang Chen, Gengrui Zhang, Yao Wang et al.
Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise result cache is used to provide recommendations when the recommender system cannot afford a real-time computation. However, the cached recommendations are usually suboptimal compared to real-time computation, and it is challenging to determine the items in the cache for each user. In this paper, we provide a cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. We formulate the problem as a Markov decision process with user states and a cache state, where the cache state represents whether the recommender system performs recommendations by real-time computation or by the cache. The computational load of the recommender system determines the cache state. We perform reinforcement learning based on such a model to improve user engagement over multiple requests. Moreover, we show that the cache will introduce a challenge called critic dependency, which deteriorates the performance of reinforcement learning. To tackle this challenge, we propose an eigenfunction learning (EL) method to learn independent critics for CARL. Experiments show that CARL can significantly improve the users' engagement when considering the result cache. CARL has been fully launched in Kwai app, serving over 100 million users.
GTSep 17, 2025
Efficient Last-Iterate Convergence in Regret Minimization via Adaptive Reward TransformationHang Ren, Yulin Wu, Shuhan Qi et al.
Regret minimization is a powerful method for finding Nash equilibria in Normal-Form Games (NFGs) and Extensive-Form Games (EFGs), but it typically guarantees convergence only for the average strategy. However, computing the average strategy requires significant computational resources or introduces additional errors, limiting its practical applicability. The Reward Transformation (RT) framework was introduced to regret minimization to achieve last-iterate convergence through reward function regularization. However, it faces practical challenges: its performance is highly sensitive to manually tuned parameters, which often deviate from theoretical convergence conditions, leading to slow convergence, oscillations, or stagnation in local optima. Inspired by previous work, we propose an adaptive technique to address these issues, ensuring better consistency between theoretical guarantees and practical performance for RT Regret Matching (RTRM), RT Counterfactual Regret Minimization (RTCFR), and their variants in solving NFGs and EFGs more effectively. Our adaptive methods dynamically adjust parameters, balancing exploration and exploitation while improving regret accumulation, ultimately enhancing asymptotic last-iterate convergence and achieving linear convergence. Experimental results demonstrate that our methods significantly accelerate convergence, outperforming state-of-the-art algorithms.
CLMar 31, 2025
BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in ConversationYumeng Fu, Junjie Wu, Zhongjie Wang et al.
Multimodal emotion recognition in conversation (MERC), the task of identifying the emotion label for each utterance in a conversation, is vital for developing empathetic machines. Current MLLM-based MERC studies focus mainly on capturing the speaker's textual or vocal characteristics, but ignore the significance of video-derived behavior information. Different from text and audio inputs, learning videos with rich facial expression, body language and posture, provides emotion trigger signals to the models for more accurate emotion predictions. In this paper, we propose a novel behavior-aware MLLM-based framework (BeMERC) to incorporate speaker's behaviors, including subtle facial micro-expression, body language and posture, into a vanilla MLLM-based MERC model, thereby facilitating the modeling of emotional dynamics during a conversation. Furthermore, BeMERC adopts a two-stage instruction tuning strategy to extend the model to the conversations scenario for end-to-end training of a MERC predictor. Experiments demonstrate that BeMERC achieves superior performance than the state-of-the-art methods on two benchmark datasets, and also provides a detailed discussion on the significance of video-derived behavior information in MERC.
AIJun 21, 2024
KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement LearningJiahan Chen, Shuhan Qi, Yifan Li et al.
Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to black-box property of RL-based method, the generated database tuning strategies still face the urgent problem of lack explainability. Besides, the redundant parameters in large scale database always make the strategy learning become unstable. This paper proposes KnobTree, an interpertable framework designed for the optimization of database parameter configuration. In this framework, an interpertable database tuning algorithm based on RL-based differentatial tree is proposed, which building a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, We also introduce a explainable method for parameter importance assessment, by utilizing Shapley Values to identify parameters that have significant impacts on database performance. Experiments conducted on MySQL and Gbase8s databases have verified exceptional transparency and interpretability of the KnobTree model. The good property makes generated strategies can offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms the existing RL-based tuning algorithms in aspects such as throughput, latency, and processing time.
NEJun 11, 2024
GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired FrameworkBoyang Li, Yulin Wu, Nuoxian Huang et al.
Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems. Insights from human spatial cognition provide valuable guidance in this domain. Neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate the translational invariance of the grid representation during inner product calculations. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Utilizing these computational principles, we have developed a Grid-cell inspired Positional Encoding technique, termed GridPE, for encoding locations within high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.
PRApr 20, 2021
Stock Market Trend Analysis Using Hidden Markov Model and Long Short Term MemoryMingwen Liu, Junbang Huo, Yulin Wu et al.
This paper intends to apply the Hidden Markov Model into stock market and and make predictions. Moreover, four different methods of improvement, which are GMM-HMM, XGB-HMM, GMM-HMM+LSTM and XGB-HMM+LSTM, will be discussed later with the results of experiment respectively. After that we will analyze the pros and cons of different models. And finally, one of the best will be used into stock market for timing strategy.
QUANT-PHOct 13, 2020
Experimental Quantum Generative Adversarial Networks for Image GenerationHe-Liang Huang, Yuxuan Du, Ming Gong et al.
Quantum machine learning is expected to be one of the first practical applications of near-term quantum devices. Pioneer theoretical works suggest that quantum generative adversarial networks (GANs) may exhibit a potential exponential advantage over classical GANs, thus attracting widespread attention. However, it remains elusive whether quantum GANs implemented on near-term quantum devices can actually solve real-world learning tasks. Here, we devise a flexible quantum GAN scheme to narrow this knowledge gap, which could accomplish image generation with arbitrarily high-dimensional features, and could also take advantage of quantum superposition to train multiple examples in parallel. For the first time, we experimentally achieve the learning and generation of real-world hand-written digit images on a superconducting quantum processor. Moreover, we utilize a gray-scale bar dataset to exhibit the competitive performance between quantum GANs and the classical GANs based on multilayer perceptron and convolutional neural network architectures, respectively, benchmarked by the Fréchet Distance score. Our work provides guidance for developing advanced quantum generative models on near-term quantum devices and opens up an avenue for exploring quantum advantages in various GAN-related learning tasks.
LGSep 10, 2020
RLCFR: Minimize Counterfactual Regret by Deep Reinforcement LearningHuale Li, Xuan Wang, Fengwei Jia et al.
Counterfactual regret minimization (CFR) is a popular method to deal with decision-making problems of two-player zero-sum games with imperfect information. Unlike existing studies that mostly explore for solving larger scale problems or accelerating solution efficiency, we propose a framework, RLCFR, which aims at improving the generalization ability of the CFR method. In the RLCFR, the game strategy is solved by the CFR in a reinforcement learning framework. And the dynamic procedure of iterative interactive strategy updating is modeled as a Markov decision process (MDP). Our method, RLCFR, then learns a policy to select the appropriate way of regret updating in the process of iteration. In addition, a stepwise reward function is formulated to learn the action policy, which is proportional to how well the iteration strategy is at each step. Extensive experimental results on various games have shown that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.