28.7NAMay 12
An analysis on stochastic Lanczos quadrature with asymmetric quadrature nodesWenhao Li, Yixuan Huang, Shengxin Zhu
This paper revisits the error analysis of the Stochastic Lanczos Quadrature (SLQ) method for approximating the trace of matrix functions, with a specific focus on asymmetric Lanczos quadrature rules. We reexplain an existing theoretical discrepancy regarding the necessity of a scaling factor when applying an affine transformation from the reference interval to the physical spectral interval. Furthermore, we introduce an optimized error reallocation technique for log-determinant estimation. Rather than evenly splitting the error tolerance between the Hutchinson trace estimator and the Lanczos quadrature, we formulate an optimization problem to strategically distribute the error budget. This approach minimizes the total number of matrix-vector multiplications (MVMs) required to reach a target accuracy for both Rademacher and Gaussian queries. Numerical experiments validate that this reallocation yields tighter theoretical bounds and provides a concrete rule-of-thumb for parameter configuration: to achieve a target accuracy efficiently, more computational resources should be allocated to the Lanczos process (larger m) rather than Monte Carlo sampling (smaller N).
CLOct 23, 2023
Unleashing the potential of prompt engineering for large language modelsBanghao Chen, Zhaofeng Zhang, Nicolas Langrené et al.
This comprehensive review delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). The development of Artificial Intelligence (AI), from its inception in the 1950s to the emergence of advanced neural networks and deep learning architectures, has made a breakthrough in LLMs, with models such as GPT-4o and Claude-3, and in Vision-Language Models (VLMs), with models such as CLIP and ALIGN. Prompt engineering is the process of structuring inputs, which has emerged as a crucial technique to maximize the utility and accuracy of these models. This paper explores both foundational and advanced methodologies of prompt engineering, including techniques such as self-consistency, chain-of-thought, and generated knowledge, which significantly enhance model performance. Additionally, it examines the prompt method of VLMs through innovative approaches such as Context Optimization (CoOp), Conditional Context Optimization (CoCoOp), and Multimodal Prompt Learning (MaPLe). Critical to this discussion is the aspect of AI security, particularly adversarial attacks that exploit vulnerabilities in prompt engineering. Strategies to mitigate these risks and enhance model robustness are thoroughly reviewed. The evaluation of prompt methods is also addressed through both subjective and objective metrics, ensuring a robust analysis of their efficacy. This review also reflects the essential role of prompt engineering in advancing AI capabilities, providing a structured framework for future research and application.
COJul 12, 2016
Information Splitting for Big Data AnalyticsShengxin Zhu, Tongxiang Gu, Xiaowen Xu et al.
Many statistical models require an estimation of unknown (co)-variance parameter(s) in a model. The estimation usually obtained by maximizing a log-likelihood which involves log determinant terms. In principle, one requires the \emph{observed information}--the negative Hessian matrix or the second derivative of the log-likelihood---to obtain an accurate maximum likelihood estimator according to the Newton method. When one uses the \emph{Fisher information}, the expect value of the observed information, a simpler algorithm than the Newton method is obtained as the Fisher scoring algorithm. With the advance in high-throughput technologies in the biological sciences, recommendation systems and social networks, the sizes of data sets---and the corresponding statistical models---have suddenly increased by several orders of magnitude. Neither the observed information nor the Fisher information is easy to obtained for these big data sets. This paper introduces an information splitting technique to simplify the computation. After splitting the mean of the observed information and the Fisher information, an simpler approximate Hessian matrix for the log-likelihood can be obtained. This approximated Hessian matrix can significantly reduce computations, and makes the linear mixed model applicable for big data sets. Such a spitting and simpler formulas heavily depends on matrix algebra transforms, and applicable to large scale breeding model, genetics wide association analysis.
COAug 17, 2022
CSGO: Constrained-Softassign Gradient Optimization For Large Graph MatchingBinrui Shen, Qiang Niu, Shengxin Zhu
Graph matching aims to find correspondences between two graphs. This paper integrates several well-known graph matching algorithms into a framework: the constrained gradient method. The primary difference among these algorithms lies in tuning a step size parameter and constraining operators. By leveraging these insights, we propose an adaptive step size parameter to guarantee the underlying algorithms' convergence, simultaneously enhancing their efficiency and robustness. For the constraining operator, we introduce a scalable softassign for large graph matching problems. Compared to the original softassign, our approach offers increased speed, improved robustness, and reduced risk of overflow. The advanced constraining operator enables a CSGO for large graph matching, which outperforms state-of-the-art methods in experiments. Notably, in attributed graph matching tasks, CSGO achieves an over 10X increase in speed compared to current constrained gradient algorithms.
IRJul 12, 2024
Movie Recommendation with Poster Attention via Multi-modal Transformer Feature FusionLinhan Xia, Yicheng Yang, Ziou Chen et al.
Pre-trained models learn general representations from large datsets which can be fine-turned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), vision transfomers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi-modal movie recommendation system by extract features of the well designed posters for each movie and the narrative text description of the movie. This system uses the BERT model to extract the information of text modality, the ViT model applied to extract the information of poster/image modality, and the Transformer architecture for feature fusion of all modalities to predict users' preference. The integration of pre-trained foundational models with some smaller data sets in downstream applications capture multi-modal content features in a more comprehensive manner, thereby providing more accurate recommendations. The efficiency of the proof-of-concept model is verified by the standard benchmark problem the MovieLens 100K and 1M datasets. The prediction accuracy of user ratings is enhanced in comparison to the baseline algorithm, thereby demonstrating the potential of this cross-modal algorithm to be applied for movie or video recommendation.
NAMay 28, 2016
Computing Overlaps of Two Quadrilateral Mesh Of the Same ConnectivityXihua Xu, Shengxin Zhu
An exact conservative remapping scheme requires overlaps between two meshes and a reconstruction scheme on the old cells (Lagrangian mesh). While the are intensive discussion on reconstruction schemes, there are relative sparse discussion on how to calculate overlaps of two grids. Computing the exact overlaps was believed complicated and often be avoided. This paper introduces the mathematical formulation of such a problem and tools to solve the problem. We propose methods to calculate the overlaps of two dismissable general quadrilateral mesh of the same logically structure in a planar domain. The quadrilateral polygon intersection problem is reduced to as a problem that how an edge in a new mesh intersects with a local frame which consists at most 7 connected edges in the old mesh. As such, locality of the method is persevered. The alternative direction technique is applied to reduce the dimension of the searching space. It reduces more than 256 possible intersections between a new cell with the old tessellation to 34 (17 when considering symmetry) programmable intersections between an edge and an local frame whenever the intersection between the old and new cell does not degenerate. At the same time, we shall how the computational amount of the overlaps depends on the underlying problem in terms of singular intersection points. A simple and detailed classification on the type of overlaps is presented, according to classification, degeneracy of an overlap can be easily identified.
64.3SEMay 13
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software AgentsHailin Zhong, Shengxin Zhu
Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. We formalize this substrate as an AI Harness Engineering and identify eleven component responsibilities: task specification, context selection, tool access, project memory, task state, observability, failure attribution, verification, permissions, entropy auditing, and intervention recording. We operationalize the harness through a four-level ladder (H0-H3) that progressively exposes runtime support to the agent, and we propose a trace-based evaluation protocol that converts each agent run into an auditable episode package. Applied to a controlled validation task, the framework yields episode packages whose evidence structure varies systematically with harness level: lower levels produce only a final patch, higher levels produce reproduction logs, failure attributions, deterministic requirement checks, and structured verification reports. The framework reframes the central question of autonomous software engineering from whether a foundation model can produce a patch to whether the model-harness-environment system can produce a verifiably correct, attributed, and maintainable change. We outline a research program for the runtime systems that foundation-model software agents will require.
MFMar 30, 2024
Quantformer: from attention to profit with a quantitative transformer trading strategyZhaofeng Zhang, Banghao Chen, Shengxin Zhu et al.
In traditional quantitative trading practice, navigating the complicated and dynamic financial market presents a persistent challenge. Fully capturing various market variables, including long-term information, as well as essential signals that may lead to profit remains a difficult task for learning algorithms. In order to tackle this challenge, this paper introduces quantformer, an enhanced neural network architecture based on transformer, to build investment factors. By transfer learning from sentiment analysis, quantformer not only exploits its original inherent advantages in capturing long-range dependencies and modeling complex data relationships, but is also able to solve tasks with numerical inputs and accurately forecast future returns over a given period. This work collects more than 5,000,000 rolling data of 4,601 stocks in the Chinese capital market from 2010 to 2023. The results of this study demonstrate the model's superior performance in predicting stock trends compared with other 100-factor-based quantitative strategies. Notably, the model's innovative use of transformer-like model to establish factors, in conjunction with market sentiment information, has been shown to enhance the accuracy of trading signals significantly, thereby offering promising implications for the future of quantitative trading strategies.
MAMay 27, 2025
GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender SimulationHailin Zhong, Hanlin Wang, Yujun Ye et al.
Current personalized recommender systems predominantly rely on static offline data for algorithm design and evaluation, significantly limiting their ability to capture long-term user preference evolution and social influence dynamics in real-world scenarios. To address this fundamental challenge, we propose a high-fidelity social simulation platform integrating human-like cognitive agents and dynamic social interactions to realistically simulate user behavior evolution under recommendation interventions. Specifically, the system comprises a population of Sim-User Agents, each equipped with a five-layer cognitive architecture that encapsulates key psychological mechanisms, including episodic memory, affective state transitions, adaptive preference learning, and dynamic trust-risk assessments. In particular, we innovatively introduce the Intimacy--Curiosity--Reciprocity--Risk (ICR2) motivational engine grounded in psychological and sociological theories, enabling more realistic user decision-making processes. Furthermore, we construct a multilayer heterogeneous social graph (GGBond Graph) supporting dynamic relational evolution, effectively modeling users' evolving social ties and trust dynamics based on interest similarity, personality alignment, and structural homophily. During system operation, agents autonomously respond to recommendations generated by typical recommender algorithms (e.g., Matrix Factorization, MultVAE, LightGCN), deciding whether to consume, rate, and share content while dynamically updating their internal states and social connections, thereby forming a stable, multi-round feedback loop. This innovative design transcends the limitations of traditional static datasets, providing a controlled, observable environment for evaluating long-term recommender effects.
LGJul 26, 2025
FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision ComputingBinrui Shen, Yuan Liang, Shengxin Zhu
Graph matching, typically formulated as a Quadratic Assignment Problem (QAP), seeks to establish node correspondences between two graphs. To address the NP-hardness of QAP, some existing methods adopt projection-based relaxations that embed the problem into the convex hull of the discrete domain. However, these relaxations inevitably enlarge the feasible set, introducing two sources of error: numerical scale sensitivity and geometric misalignment between the relaxed and original domains. To alleviate these errors, we propose a novel relaxation framework by reformulating the projection step as a Frobenius-regularized Linear Assignment (FRA) problem, where a tunable regularization term mitigates feasible region inflation. This formulation enables normalization-based operations to preserve numerical scale invariance without compromising accuracy. To efficiently solve FRA, we propose the Scaling Doubly Stochastic Normalization (SDSN) algorithm. Building on its favorable computational properties, we develop a theoretically grounded mixed-precision architecture to achieve substantial acceleration. Comprehensive CPU-based benchmarks demonstrate that FRAM consistently outperforms all baseline methods under identical precision settings. When combined with a GPU-based mixed-precision architecture, FRAM achieves up to 370X speedup over its CPU-FP64 counterpart, with negligible loss in solution accuracy.
CLNov 27, 2024
Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?Lewen Yang, Xuanyu Zhou, Juao Fan et al.
Over the past few decades, Artificial Intelligence(AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundational models. Foundational models have the characteristics of pre-training, transfer learning, and self-supervised learning, and pre-trained models can be fine-tuned and applied to various downstream tasks. Under the framework of foundational models, models such as Bidirectional Encoder Representations from Transformers(BERT) and Generative Pre-trained Transformer(GPT) have greatly advanced the development of natural language processing(NLP), especially the emergence of many models based on BERT. BERT broke through the limitation of only using one-way methods for language modeling in pre-training by using a masked language model. It can capture bidirectional context information to predict the masked words in the sequence, this can improve the feature extraction ability of the model. This makes the model very useful for downstream tasks, especially for specialized applications. The model using the bidirectional encoder can better understand the domain knowledge and be better applied to these downstream tasks. So we hope to help understand how this technology has evolved and improved model performance in various natural language processing tasks under the background of foundational models and reveal its importance in capturing context information and improving the model's performance on downstream tasks. This article analyzes one-way and bidirectional models based on GPT and BERT and compares their differences based on the purpose of the model. It also briefly analyzes BERT and the improvements of some models based on BERT. The model's performance on the Stanford Question Answering Dataset(SQuAD) and General Language Understanding Evaluation(GLUE) was compared.
LGJun 4, 2024
EchoMamba4Rec: Harmonizing Bidirectional State Space Models with Spectral Filtering for Advanced Sequential RecommendationYuda Wang, Xuxin He, Shengxin Zhu
Predicting user preferences and sequential dependencies based on historical behavior is the core goal of sequential recommendation. Although attention-based models have shown effectiveness in this field, they often struggle with inference inefficiency due to the quadratic computational complexity inherent in attention mechanisms, especially with long-range behavior sequences. Drawing inspiration from the recent advancements of state space models (SSMs) in control theory, which provide a robust framework for modeling and controlling dynamic systems, we introduce EchoMamba4Rec. Control theory emphasizes the use of SSMs for managing long-range dependencies and maintaining inferential efficiency through structured state matrices. EchoMamba4Rec leverages these control relationships in sequential recommendation and integrates bi-directional processing with frequency-domain filtering to capture complex patterns and dependencies in user interaction data more effectively. Our model benefits from the ability of state space models (SSMs) to learn and perform parallel computations, significantly enhancing computational efficiency and scalability. It features a bi-directional Mamba module that incorporates both forward and reverse Mamba components, leveraging information from both past and future interactions. Additionally, a filter layer operates in the frequency domain using learnable Fast Fourier Transform (FFT) and learnable filters, followed by an inverse FFT to refine item embeddings and reduce noise. We also integrate Gate Linear Units (GLU) to dynamically control information flow, enhancing the model's expressiveness and training stability. Experimental results demonstrate that EchoMamba significantly outperforms existing models, providing more accurate and personalized recommendations.
CVJan 16, 2020
Fabricated Pictures Detection with Graph MatchingBinrui Shen, Qiang Niu, Shengxin Zhu
Fabricating experimental pictures in research work is a serious academic misconduct, which should better be detected in the reviewing process. However, due to large number of submissions, the detection whether a picture is fabricated or reused is laborious for reviewers, and sometimes is indistinct with human eyes. A tool for detecting similarity between images may help to alleviate this problem. Some methods based on local feature points matching work for most of the time, while these methods may result in mess of matchings due to ignorance of global relationship between features. We present a framework to detect similar, or perhaps fabricated, pictures with the graph matching techniques. A new iterative method is proposed, and experiments show that such a graph matching technique is better than the methods based only on local features for some cases.
CONov 2, 2019
Sparse inversion for derivative of log determinantShengxin Zhu, Andrew J Wathen
Algorithms for Gaussian process, marginal likelihood methods or restricted maximum likelihood methods often require derivatives of log determinant terms. These log determinants are usually parametric with variance parameters of the underlying statistical models. This paper demonstrates that, when the underlying matrix is sparse, how to take the advantage of sparse inversion---selected inversion which share the same sparsity as the original matrix---to accelerate evaluating the derivative of log determinant.
NAAug 25, 2016
Computing log-likelihood and its derivatives for restricted maximum likelihood methodsShengxin Zhu
Recent large scale genome wide association analysis involves large scale linear mixed models. Quantifying (co)-variance parameters in the mixed models with a restricted maximum likelihood method results in a score function which is the first derivative of a log-likelihood. To obtain a statistically efficient estimate of the variance parameters, one needs to find the root of the score function via the Newton method. Most elements of the Jacobian matrix of the score involve a trace term of four parametric matrix-matrix multiplications. It is computationally prohibitively for large scale data sets. By a serial matrix transforms and an averaged information splitting technique, an approximate Jacobian matrix can be obtained by splitting the average of the Jacobian matrix and its expected value. In the approximated Jacobian, its elements only involve Four matrix vector multiplications which can be efficiently evaluated by solving sparse linear systems with the multi-frontal factorization method.