LG | May 27
Ensemble Score Filtering for Real-Data Energy Consumption Forecast Correction
Ruoyu Hu, Dahai Yu, Feng Bao et al.
Accurate estimation and forecasting of energy consumption are important for power-system operation, planning, and demand-side management. In practice, however, complete and timely measurements may not always be available, and the observed data can be partial, noisy, or delayed. This motivates the use of learned forecasting models for predicting the evolving consumption state, together with data assimilation methods for sequential forecast correction. In this work, we study a high-dimensional data assimilation problem for real energy-consumption data. The forward prediction is supplied by a pretrained black-box spatio-temporal forecasting model, which is treated as the state propagator in the filtering procedure. We employ the Ensemble Score Filter (EnSF) to assimilate partial and noisy observations and to correct the forecast trajectory over time. The EnSF uses score-based diffusion models to approximate filtering distributions and avoids retraining neural-network score models during assimilation by using a closed-form score representation and Monte Carlo approximation. Numerical experiments demonstrate that open-loop propagation of the learned forecasting model can become unreliable over long horizons, while EnSF-based correction substantially improves state estimation. Comparisons with the Ensemble Kalman Filter (EnKF) further show that EnSF provides stronger correction under the nonlinear observation setting considered in this work.
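For readers unfamiliar with the forecast-then-correct cycle, the following is a minimal sketch of one stochastic EnKF analysis step, the baseline this abstract compares against. It is not the paper's EnSF and not the authors' code: it assumes a linear observation matrix `H` and Gaussian observation noise, which is simpler than the nonlinear observation setting studied in the paper, and the function name and toy dimensions are illustrative.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """One stochastic EnKF update (illustrative; assumes a linear H).

    ensemble: (N, d) forecast members, y: (m,) observation,
    H: (m, d) observation matrix, R: (m, m) observation-noise covariance.
    """
    n_members = ensemble.shape[0]
    predicted_obs = ensemble @ H.T                              # (N, m)
    state_anom = ensemble - ensemble.mean(axis=0)               # (N, d)
    obs_anom = predicted_obs - predicted_obs.mean(axis=0)       # (N, m)
    cross_cov = state_anom.T @ obs_anom / (n_members - 1)       # (d, m)
    innov_cov = obs_anom.T @ obs_anom / (n_members - 1) + R     # (m, m)
    gain = cross_cov @ np.linalg.inv(innov_cov)                 # Kalman gain, (d, m)
    # Each member assimilates its own noisy copy of the observation
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_members)
    return ensemble + (perturbed - predicted_obs) @ gain.T

# Toy setup (assumed values): fully observed 3-dimensional state
rng = np.random.default_rng(0)
forecast = rng.normal(size=(200, 3))
H = np.eye(3)
R = 1e-6 * np.eye(3)
y = np.array([1.0, -2.0, 0.5])
analysis = enkf_analysis(forecast, y, H, R, rng)
```

With a fully observed state and very small observation noise, the analysis ensemble collapses onto the observation, which is the expected limiting behaviour of the update.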
NA | Nov 8, 2017
An improved discrete least-squares/reduced-basis method for parameterized elliptic PDEs
Max Gunzburger, Michael Schneier, Clayton Webster et al.
It is shown that the computational efficiency of the discrete least-squares (DLS) approximation of solutions of stochastic elliptic PDEs is improved by incorporating a reduced-basis method into the DLS framework. The goal is to recover the entire solution map from the parameter space to the finite element space. To this end, first, a reduced-basis solution using a weak greedy algorithm is constructed, then a DLS approximation is determined by evaluating the reduced-basis approximation instead of the full finite element approximation. The main advantage of the new approach is that one need only apply the DLS operator to the coefficients of the reduced-basis expansion, resulting in huge savings in both the storage of the DLS coefficients and the online cost of evaluating the DLS approximation. In addition, the recently developed quasi-optimal polynomial space is also adopted in the new approach, resulting in superior convergence rates for a wider class of problems than previously analyzed. Numerical experiments are provided that illustrate the theoretical results.
IR | Sep 16, 2023
An Unified Search and Recommendation Foundation Model for Cold-Start Scenario
Yuqi Gong, Xichen Ding, Yehui Su et al.
In modern commercial search engines and recommendation systems, data from multiple domains is available to jointly train the multi-domain model. Traditional methods train multi-domain models in the multi-task setting, with shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of features, labels, and sample distributions of individual tasks. With the development of large language models (LLMs), an LLM can extract global domain-invariant text features that serve both search and recommendation tasks. We propose a novel framework called S&R Multi-Domain Foundation, which uses an LLM to extract domain-invariant features, and Aspect Gating Fusion to merge the ID feature, domain-invariant text features, and task-specific heterogeneous sparse features to obtain the representations of query and item. Additionally, samples from multiple search and recommendation scenarios are trained jointly with a Domain Adaptive Multi-Task module to obtain the multi-domain foundation model. We apply the S&R Multi-Domain Foundation model to cold-start scenarios in a pretrain-finetune manner, which achieves better performance than other SOTA transfer learning methods. The S&R Multi-Domain Foundation model has been successfully deployed in Alipay Mobile Application's online services, such as content query recommendation and service card recommendation.
CL | Nov 15, 2023
Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
Lei Liu, Xiaoyan Yang, Yue Shen et al.
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, relying on iterative recalling and reasoning over history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when recalling the same history for different questions. In contrast, humans can keep thoughts in memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolved memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, an LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. In addition, we formulate the basic principles for organizing the thoughts in memory based on well-established operations (i.e., insert, forget, and merge), allowing for dynamic updates and evolution of the thoughts. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval for long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.
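The recall stage with Locality-Sensitive Hashing can be illustrated with a small sketch. This is not the TiM implementation: the class name, the random-hyperplane hash, the cosine ranking, and the toy embedding are all assumptions made for illustration, and the merge operation is omitted for brevity.

```python
import numpy as np
from collections import defaultdict

class ThoughtMemory:
    """Toy thought store indexed by random-hyperplane LSH (hypothetical API)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))   # one hyperplane per hash bit
        self.buckets = defaultdict(list)               # hash key -> [(embedding, text)]

    def _key(self, embedding):
        # The sign pattern of the projections is the bucket key
        return tuple(int(bit) for bit in (self.planes @ embedding > 0))

    def insert(self, embedding, text):
        embedding = np.asarray(embedding, dtype=float)
        self.buckets[self._key(embedding)].append((embedding, text))

    def forget(self, text):
        # Drop every stored thought whose text matches
        for key in list(self.buckets):
            self.buckets[key] = [(e, t) for e, t in self.buckets[key] if t != text]

    def recall(self, query, top_k=1):
        # Search only the query's bucket, ranked by cosine similarity
        query = np.asarray(query, dtype=float)
        candidates = self.buckets.get(self._key(query), [])

        def cosine(item):
            emb = item[0]
            return float(emb @ query / (np.linalg.norm(emb) * np.linalg.norm(query)))

        ranked = sorted(candidates, key=cosine, reverse=True)
        return [text for _, text in ranked[:top_k]]

# Toy usage with a made-up 4-dimensional embedding
memory = ThoughtMemory(dim=4)
v = np.array([1.0, 0.2, -0.3, 0.5])
memory.insert(v, "User prefers vegetarian recipes")
recalled = memory.recall(2.0 * v)   # same direction, so same bucket
```

Because only one bucket is scanned per query, retrieval cost depends on bucket size rather than on the total number of stored thoughts, which is the motivation for hashing in long conversations.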
LG | Apr 12, 2023
Edge-cloud Collaborative Learning with Federated and Centralized Features
Zexi Li, Qunwei Li, Yi Zhou et al.
Federated learning (FL) is a popular edge-computing paradigm that does not compromise users' privacy. Current FL paradigms assume that data resides only on the edge, while cloud servers only perform model averaging. However, in real-life situations such as recommender systems, the cloud server has the ability to store historical and interactive features. In this paper, our proposed Edge-Cloud Collaborative Knowledge Transfer Framework (ECCT) bridges the gap between the edge and cloud, enabling bi-directional knowledge transfer between the two by sharing feature embeddings and prediction logits. ECCT consolidates various benefits, including enhancing personalization, enabling model heterogeneity, tolerating training asynchronization, and relieving communication burdens. Extensive experiments on public and industrial datasets demonstrate ECCT's effectiveness and potential for use in academia and industry.
NA | Jun 22, 2016
Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients
Nick Dexter, Clayton Webster, Guannan Zhang
This work analyzes the overall computational complexity of the stochastic Galerkin finite element method (SGFEM) for approximating the solution of parameterized elliptic partial differential equations with both affine and non-affine random coefficients. To compute the fully discrete solution, such approaches employ a Galerkin projection in both the deterministic and stochastic domains, produced here by a combination of finite elements and a global orthogonal basis, defined on an isotropic total degree index set, respectively. To account for the sparsity of the resulting system, we present a rigorous cost analysis that considers the total number of coupled finite element systems that must be simultaneously solved in the SGFEM. However, to maintain sparsity as the coefficient becomes increasingly nonlinear in the parameterization, it is necessary to also approximate the coefficient by an additional orthogonal expansion. In this case we prove a rigorous complexity estimate for the number of floating point operations (FLOPs) required per matrix-vector multiplication of the coupled system. Based on such complexity estimates we also develop explicit cost bounds in terms of FLOPs to solve the stochastic Galerkin (SG) systems to a prescribed tolerance, which are used to compare with the minimal complexity estimates of a stochastic collocation finite element method (SCFEM), shown in our previous work [16]. Finally, computational evidence complements the theoretical estimates and supports our conclusion that, in the case that the coefficient is affine, the coupled SG system can be solved more efficiently than the decoupled SC systems. However, as the coefficient becomes more nonlinear, it becomes prohibitively expensive to obtain an approximation with the SGFEM.
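To make the "isotropic total degree index set" concrete, here is a short sketch that enumerates it and checks its size against the standard binomial count. The function name and the choice of 3 parameters and degree 4 are illustrative assumptions, not values from the paper.

```python
from itertools import product
from math import comb

def total_degree_index_set(n_params, degree):
    """All multi-indices nu with non-negative entries and nu_1 + ... + nu_n <= degree."""
    return [nu for nu in product(range(degree + 1), repeat=n_params)
            if sum(nu) <= degree]

# Toy case (assumed values): 3 random parameters, total degree 4
indices = total_degree_index_set(n_params=3, degree=4)
n_basis = len(indices)
expected = comb(3 + 4, 4)   # closed-form cardinality: C(n_params + degree, degree)
```

Each multi-index corresponds to one stochastic basis function, so the cardinality of this set (35 in the toy case) grows combinatorially with the number of parameters and the polynomial degree.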
NA | Feb 12, 2018
A Domain-Decomposition Model Reduction Method for Linear Convection-Diffusion Equations with Random Coefficients
Lin Mu, Guannan Zhang
We develop a domain-decomposition model reduction method for linear steady-state convection-diffusion equations with random coefficients. Of particular interest to this effort are the diffusion equations with random diffusivities, and the convection-dominated transport equations with random velocities. We investigate the equations with two types of random fields, i.e., colored noises and discrete white noises, both of which can lead to high-dimensional parametric dependence. The motivation is to use domain decomposition to exploit low-dimensional structures of local problems in the sub-domains, such that the total number of expensive PDE solves can be greatly reduced. Our objective is to develop an efficient model reduction method to simultaneously handle high-dimensionality and irregular behaviors of the stochastic PDEs under consideration. The advantages of our method lie in three aspects: (i) online-offline decomposition, i.e., the online cost is independent of the size of the triangle mesh; (ii) operator approximation for handling non-affine and high-dimensional random fields; (iii) effective strategy to capture irregular behaviors, e.g., sharp transitions of the PDE solution. Two numerical examples will be provided to demonstrate the advantageous performance of our method.
LG | Sep 6, 2023
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning
Tianchi Cai, Jiyan Jiang, Wenpeng Zhang et al.
We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome the challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the need to store infinitely many policies, as in previous methods, to only constantly many policies, which achieves nearly optimal policy efficiency, making it practical and favorable for industrial usage. We further show that this method is guaranteed to converge to the optimal policy, which cannot be achieved by previous value-based reinforcement learning methods for marketing budget allocation. Our experiments on a large-scale marketing campaign with tens of millions of users and a budget of more than one billion verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
CL | Aug 24, 2023
Harnessing the Power of David against Goliath: Exploring Instruction Data Generation without Using Closed-Source Models
Yue Wang, Xinrui Wang, Juntao Li et al.
Instruction tuning is instrumental in enabling Large Language Models (LLMs) to follow user instructions to complete various open-domain tasks. The success of instruction tuning depends on the availability of high-quality instruction data. Owing to the exorbitant cost and substandard quality of human annotation, recent works have been deeply engaged in the exploration of the utilization of powerful closed-source models to generate instruction data automatically. However, these methods carry potential risks arising from the usage requirements of powerful closed-source models, which strictly forbid the utilization of their outputs to develop machine learning models. To deal with this problem, in this work, we explore alternative approaches to generate high-quality instruction data that do not rely on closed-source models. Our exploration includes an investigation of various existing instruction generation methods, culminating in the integration of the most efficient variant with two novel strategies to enhance the quality further. Evaluation results from two benchmarks and the GPT-4 model demonstrate the effectiveness of our generated instruction data, which can outperform Alpaca, a method reliant on closed-source models. We hope that more progress can be achieved in generating high-quality instruction data without using closed-source models.
AI | Feb 6
JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks
Lanbo Lin, Jiayao Liu, Tianyuan Yang et al.
Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommodate diverse valid response strategies, while LLM-as-a-judge approaches adapt to individual responses yet suffer from instability and bias. Human experts address this dilemma by combining domain-grounded principles with dynamic, claim-level assessment. Inspired by this process, we propose JADE, a two-layer evaluation framework. Layer 1 encodes expert knowledge as a predefined set of evaluation skills, providing stable evaluation criteria. Layer 2 performs report-specific, claim-level evaluation to flexibly assess diverse reasoning strategies, with evidence-dependency gating to invalidate conclusions built on refuted claims. Experiments on BizBench show that JADE improves evaluation stability and reveals critical agent failure modes missed by holistic LLM-based evaluators. We further demonstrate strong alignment with expert-authored rubrics and effective transfer to a medical-domain benchmark, validating JADE across professional domains. Our code is publicly available at https://github.com/smiling-world/JADE.
LG | Apr 25, 2023
GARCIA: Powering Representations of Long-tail Query with Multi-granularity Contrastive Learning
Weifan Wang, Binbin Hu, Zhicheng Peng et al.
Recently, the growth of service platforms has brought great convenience to both users and merchants, where the service search engine plays a vital role in improving the user experience by quickly obtaining desirable results via textual queries. Unfortunately, users' uncontrollable search habits usually bring vast amounts of long-tail queries, which severely threaten the capability of search models. Inspired by recently emerging graph neural networks (GNNs) and contrastive learning (CL), several efforts have been made to alleviate the long-tail issue and have achieved considerable performance. Nevertheless, they still face a few major weaknesses. Most importantly, they do not explicitly utilize the contextual structure between heads and tails for effective knowledge transfer, and intention-level information is commonly ignored for more generalized representations. To this end, we develop a novel framework, GARCIA, which exploits graph-based knowledge transfer and intention-based representation generalization in a contrastive setting. In particular, we employ an adaptive encoder to produce informative representations for queries and services, as well as hierarchical structure-aware representations of intentions. To fully understand tail queries and services, we equip GARCIA with a novel multi-granularity contrastive learning module, which powers representations through knowledge transfer, structure enhancement, and intention generalization. Subsequently, the complete GARCIA is trained in a pre-training and fine-tuning manner. Finally, we conduct extensive experiments in both offline and online environments, which demonstrate the superior capability of GARCIA in improving tail queries and overall performance in service search scenarios.
AI | Nov 1, 2023
On the Opportunities of Green Computing: A Survey
You Zhou, Xiujing Lin, Xiang Zhang et al.
Artificial Intelligence (AI) has achieved significant advancements in technology and research over several decades of development, and is widely used in many areas including computer vision, natural language processing, time-series analysis, and speech synthesis. During the age of deep learning, especially with the rise of Large Language Models, most researchers' attention has been paid to pursuing new state-of-the-art (SOTA) results, resulting in ever-increasing model size and computational complexity. The need for high computing power brings higher carbon emissions and undermines research fairness by preventing small or medium-sized research institutions and companies with limited funding from participating in research. To tackle the challenges of computing resources and the environmental impact of AI, Green Computing has become a hot research topic. In this survey, we give a systematic overview of the technologies used in Green Computing. We propose a framework for Green Computing and divide it into four key components: (1) Measures of Greenness, (2) Energy-Efficient AI, (3) Energy-Efficient Computing Systems, and (4) AI Use Cases for Sustainability. For each component, we discuss the research progress made and the commonly used techniques to optimize AI efficiency. We conclude that this new research direction has the potential to address the conflicts between resource constraints and AI development. We encourage more researchers to pay attention to this direction and make AI more environmentally friendly.
LG | Aug 25, 2023
Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
Tianchi Cai, Shenliao Bao, Jiyan Jiang et al.
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature in recommender systems: one user's feedback on the same item at different times is random. This stochastic reward property differs essentially from classic RL scenarios with deterministic rewards, which makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with that learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
IR | Aug 31, 2023
AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction
Zhaoxin Huan, Ke Ding, Ang Li et al.
Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. First, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Second, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR records with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
NA | Jan 27, 2023
TransNet: Transferable Neural Networks for Partial Differential Equations
Zezhong Zhang, Feng Bao, Lili Ju et al.
Transfer learning for partial differential equations (PDEs) aims to develop a pre-trained neural network that can be used to solve a wide class of PDEs. Existing transfer learning approaches require much information about the target PDEs, such as their formulation and/or data of their solutions, for pre-training. In this work, we propose to construct transferable neural feature spaces from a purely function-approximation perspective without using PDE information. The construction of the feature space involves re-parameterization of the hidden neurons and uses auxiliary functions to tune the resulting feature space. Theoretical analysis shows the high quality of the produced feature space, i.e., uniformly distributed neurons. Extensive numerical experiments verify the outstanding performance of our method, including significantly improved transferability, e.g., using the same feature space for various PDEs with different domains and boundary conditions, and superior accuracy, e.g., several orders of magnitude smaller mean squared error than state-of-the-art methods.
AI | Feb 18
NeuDiff Agent: A Governed AI Workflow for Single-Crystal Neutron Crystallography
Zhongcan Xiao, Leyi Zhang, Guannan Zhang et al.
Large-scale facilities increasingly face analysis and reporting latency as the limiting step in scientific throughput, particularly for structurally and magnetically complex samples that require iterative reduction, integration, refinement, and validation. To improve time-to-result and analysis efficiency, NeuDiff Agent is introduced as a governed, tool-using AI workflow for TOPAZ at the Spallation Neutron Source that takes instrument data products through reduction, integration, refinement, and validation to a validated crystal structure and a publication-ready CIF. NeuDiff Agent executes this established pipeline under explicit governance by restricting actions to allowlisted tools, enforcing fail-closed verification gates at key workflow boundaries, and capturing complete provenance for inspection, auditing, and controlled replay. Performance is assessed using a fixed prompt protocol and repeated end-to-end runs with two large language model backends, with user and machine time partitioned and intervention burden and recovery behaviors quantified under gating. In a reference-case benchmark, NeuDiff Agent reduces wall time from 435 minutes (manual) to 86.5(4.7) to 94.4(3.5) minutes (4.6-5.0x faster) while producing a validated CIF with no checkCIF level A or B alerts. These results establish a practical route to deploy agentic AI in facility crystallography while preserving traceability and publication-facing validation requirements.
LG | Oct 22, 2023
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation
Yanfang Liu, Minglei Yang, Zezhong Zhang et al.
We present a supervised learning framework for training generative models for density estimation. Generative models, including generative adversarial networks, normalizing flows, and variational auto-encoders, are usually considered unsupervised learning models, because labeled data are usually unavailable for training. Despite the success of the generative models, there are several issues with the unsupervised training, e.g., the requirement of reversible architectures, vanishing gradients, and training instability. To enable supervised learning in generative models, we utilize the score-based diffusion model to generate labeled data. Unlike existing diffusion models that train neural networks to learn the score function, we develop a training-free score estimation method. This approach uses mini-batch-based Monte Carlo estimators to directly approximate the score function at any spatial-temporal location in solving an ordinary differential equation (ODE), corresponding to the reverse-time stochastic differential equation (SDE). This approach can offer both high accuracy and substantial time savings in neural network training. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner. Compared with existing normalizing flow models, our method does not require reversible neural networks and avoids the computation of the Jacobian matrix. Compared with existing diffusion models, our method does not need to solve the reverse-time SDE to generate new samples. As a result, the sampling efficiency is significantly improved. We demonstrate the performance of our method by applying it to a set of 2D datasets as well as real data from the UCI repository.
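The idea of a training-free, Monte Carlo score estimate can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact estimator: it assumes a forward process of the form x_t = alpha_t * x_0 + beta_t * eps with standard Gaussian eps, for which the score of the diffused density is a weighted average of Gaussian-kernel scores over the data samples. The function name, schedule values, and mini-batch handling are hypothetical.

```python
import numpy as np

def mc_score(x, data, alpha_t, beta_t, batch_size=None, rng=None):
    """Monte Carlo estimate of grad_x log p_t(x), assuming
    x_t = alpha_t * x_0 + beta_t * eps with eps ~ N(0, I).

    x: (d,) query point, data: (J, d) samples of x_0.
    """
    if batch_size is not None:
        # Optional mini-batch: use a random subset of the data samples
        if rng is None:
            rng = np.random.default_rng()
        data = data[rng.choice(len(data), size=batch_size, replace=False)]
    diff = x - alpha_t * data                                # (J, d)
    log_w = -np.sum(diff**2, axis=1) / (2.0 * beta_t**2)     # Gaussian log-kernels
    w = np.exp(log_w - log_w.max())                          # numerically stable weights
    w /= w.sum()
    # Weighted average of the per-sample Gaussian scores -(x - alpha_t x_0) / beta_t^2
    return -(w[:, None] * diff).sum(axis=0) / beta_t**2

# Sanity check with a single data point: the estimate is the exact Gaussian score
single = mc_score(np.zeros(2), np.array([[2.0, -1.0]]), alpha_t=0.5, beta_t=2.0)

# Standard-normal data with alpha_t^2 + beta_t^2 = 1 keeps p_t standard normal,
# so the true score at x is -x
rng = np.random.default_rng(0)
samples = rng.normal(size=(20000, 2))
x_query = np.array([0.5, -0.3])
estimate = mc_score(x_query, samples, alpha_t=0.8, beta_t=0.6)
```

No network is trained at any point: the estimate is computed directly from samples, which is what makes this kind of approach usable at arbitrary points along the reverse-time ODE.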
LG | Jul 16, 2024
A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics
Junqi Yin, Siming Liang, Siyan Liu et al.
The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not ready yet for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow to enable the assimilation of incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely, the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the Exascale supercomputer, Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.
LG | Nov 20, 2023
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Yiming Wang, Yu Lin, Xiaodong Zeng et al.
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low rank of LoRA limits the adaptation performance in complex multi-task scenarios. LoRA is dominated by a small number of top singular vectors, while fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and changes the parameter initialization of adaptation matrices to reduce parameter dependency, thus yielding more balanced unitary subspaces. We construct specialized training data by mixing datasets for instruction following, natural language understanding, and world knowledge, to cover semantically and syntactically different samples. With only 2.5% of additional parameters, MultiLoRA outperforms single LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into the weight update matrices of MultiLoRA exhibits reduced dependency on top singular vectors and more democratic unitary transform contributions.
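One reading of "scales LoRA modules horizontally" is that several low-rank adapters are applied in parallel and their outputs summed. The sketch below illustrates only that reading: the random initialization, unit scaling factors, and all dimensions are placeholder assumptions and do not reproduce the initialization scheme the paper proposes.

```python
import numpy as np

def multi_adapter_forward(x, W, adapters, scales):
    """y = W x + sum_i scale_i * B_i (A_i x) for parallel low-rank adapters."""
    y = W @ x
    for (A, B), scale in zip(adapters, scales):
        y = y + scale * (B @ (A @ x))
    return y

# Toy dimensions (assumed): three rank-2 adapters on a 12 x 16 frozen weight
rng = np.random.default_rng(0)
d_in, d_out, rank, n_adapters = 16, 12, 2, 3
W = rng.normal(size=(d_out, d_in))                    # frozen base weight
adapters = [(rng.normal(size=(rank, d_in)),           # A_i: (rank, d_in)
             rng.normal(size=(d_out, rank)))          # B_i: (d_out, rank)
            for _ in range(n_adapters)]
scales = [1.0] * n_adapters

# Equivalent merged weight update: its rank is at most n_adapters * rank
delta = sum(scale * (B @ A) for (A, B), scale in zip(adapters, scales))
x = rng.normal(size=d_in)
y = multi_adapter_forward(x, W, adapters, scales)
```

A single adapter of rank r can only produce an update of rank at most r, whereas the sum of n parallel adapters can reach rank n * r, which gives the update more directions to spread over.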
AI | Nov 23, 2023
PrivateLoRA For Efficient Privacy Preserving LLM
Yiming Wang, Yu Lin, Xiaodong Zeng et al.
End users face a choice between privacy and efficiency in current Large Language Model (LLM) service paradigms. In cloud-based paradigms, users are forced to compromise data locality for generation quality and processing speed. Conversely, edge device paradigms maintain data locality but fail to deliver satisfactory performance. In this work, we propose a novel LLM service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud. Only activations are transmitted between the central cloud and edge devices to ensure data locality. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations, achieving over 95% communication reduction. Consequently, PrivateLoRA effectively maintains data locality and is extremely resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA for advanced personalization. Our approach democratizes access to state-of-the-art generative AI for edge devices, paving the way for more tailored LLM experiences for the general public. To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.
LG | Apr 17
Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
Xiao Wang, Zezhong Zhang, Isaac Lyngaas et al.
Accurate weather and climate prediction relies on data assimilation (DA), which estimates the Earth system state by integrating observations with models. While exascale computing has significantly advanced earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck, limiting uncertainty quantification and prediction of extreme events. We introduce a unified one-stage generative DA framework that reformulates assimilation as Bayesian posterior sampling, replacing the conventional forecast-update cycle with compute-dense, GPU-efficient inference. At the core is STORM, a novel spatiotemporal transformer with a global attention linear-complexity scaling algorithm that breaks the quadratic attention barrier. On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames, regimes previously unreachable, establishing a new paradigm for Earth system prediction.
LGDec 5, 2023Code
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise PreferenceTianchi Cai, Xierui Song, Jiyan Jiang et al.
Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most preference learning methods, such as RLHF and DPO, depend on pairwise preference data and therefore inadequately address scenarios where human feedback is point-wise, leading to potential information loss and suboptimal performance. Addressing this gap, we introduce Point-wise Direct Preference Optimization, a novel preference learning method designed to harness point-wise feedback effectively. Our work also uncovers a novel connection between supervised fine-tuning and point-wise preference learning, culminating in Unified Language Model Alignment, a single-step method that unifies the alignment with human demonstrations and point-wise preferences. Extensive experiments on point-wise preference datasets with binary or continuous labels validate the effectiveness of our methods. Our code and a new dataset with high-quality demonstration samples on harmlessness are released.
CVFeb 5
SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMsJintao Tong, Shilin Yan, Hongwei Xue et al.
Multimodal Large Language Models (MLLMs) have made remarkable progress in multimodal perception and reasoning by bridging vision and language. However, most existing MLLMs perform reasoning primarily with textual CoT, which limits their effectiveness on vision-intensive tasks. Recent approaches inject a fixed number of continuous hidden states as "visual thoughts" into the reasoning process and improve visual performance, but often at the cost of degraded text-based logical reasoning. We argue that the core limitation lies in a rigid, pre-defined reasoning pattern that cannot adaptively choose the most suitable thinking modality for different user queries. We introduce SwimBird, a reasoning-switchable MLLM that dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning (continuous hidden states as visual thoughts), and (3) interleaved vision-text reasoning. To enable this capability, we adopt a hybrid autoregressive formulation that unifies next-token prediction for textual thoughts with next-embedding prediction for visual thoughts, and design a systematic reasoning-mode curation strategy to construct SwimBird-SFT-92K, a diverse supervised fine-tuning dataset covering all three reasoning patterns. By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks. Experiments across diverse benchmarks covering textual reasoning and challenging visual understanding demonstrate that SwimBird achieves state-of-the-art results and robust gains over prior fixed-pattern multimodal reasoning methods.
CVMar 12
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional ReasoningHaozhan Shen, Shilin Yan, Hongwei Xue et al.
Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified visual compositional conditions (e.g., "if a permission dialog appears and the color of the interface is green, click Allow") and the process may branch or terminate early. Yet this capability remains under-evaluated: existing benchmarks focus on shallow compositions or independent constraints rather than deeply chained compositional conditionals. In this paper, we introduce MM-CondChain, a benchmark for visually grounded deep compositional reasoning. Each benchmark instance is organized as a multi-layer reasoning chain, where every layer contains a non-trivial compositional condition grounded in visual evidence and built from multiple objects, attributes, or relations. To answer correctly, an MLLM must perceive the image in detail, reason over multiple visual elements at each step, and follow the resulting execution path to the final outcome. To scalably construct such workflow-style data, we propose an agentic synthesis pipeline: a Planner orchestrates layer-by-layer generation of compositional conditions, while a Verifiable Programmatic Intermediate Representation (VPIR) ensures each layer's condition is mechanically verifiable. A Composer then assembles these verified layers into complete instructions. Using this pipeline, we construct benchmarks across three visual domains: natural images, data charts, and GUI trajectories. Experiments on a range of MLLMs show that even the strongest model attains only 53.33 Path F1, with sharp drops on hard negatives and as depth or predicate complexity grows, confirming that deep compositional reasoning remains a fundamental challenge.
LGDec 9, 2024Code
GenAI4UQ: A Software for Inverse Uncertainty Quantification Using Conditional Generative ModelsMing Fan, Zezhong Zhang, Dan Lu et al.
We introduce GenAI4UQ, a software package for inverse uncertainty quantification in model calibration, parameter estimation, and ensemble forecasting in scientific applications. GenAI4UQ leverages a generative artificial intelligence (AI) based conditional modeling framework to address the limitations of traditional inverse modeling techniques, such as Markov Chain Monte Carlo methods. By replacing computationally intensive iterative processes with a direct, learned mapping, GenAI4UQ enables efficient calibration of model input parameters and generation of output predictions directly from observations. The software's design allows for rapid ensemble forecasting with robust uncertainty quantification, while maintaining high computational and storage efficiency. GenAI4UQ simplifies the model training process through built-in auto-tuning of hyperparameters, making it accessible to users with varying levels of expertise. Its conditional generative framework ensures versatility, enabling applicability across a wide range of scientific domains. At its core, GenAI4UQ transforms the paradigm of inverse modeling by providing a fast, reliable, and user-friendly solution. It empowers researchers and practitioners to quickly estimate parameter distributions and generate model predictions for new observations, facilitating efficient decision-making and advancing the state of uncertainty quantification in computational modeling. (The code and data are available at https://github.com/patrickfan/GenAI4UQ).
ROMay 13
TouchAnything: A Dataset and Framework for Bimanual Tactile Estimation from Egocentric VideoJianyi Zhou, Ziteng Gao, Feiyang Hong et al.
Egocentric human video data, which captures rich human-environment interactions and can be collected at scale, has become a key driver of embodied intelligence research. However, existing egocentric datasets typically lack tactile sensing, a critical modality that provides direct cues about contact, force, and pressure in human-object interaction. Without such signals, models struggle to learn physically grounded representations of real-world interaction dynamics. While tactile sensors provide these cues, deploying high-quality tactile hardware at scale remains expensive and cumbersome. This raises a central question: can tactile feedback be inferred directly from visual observations, enabling scalable tactile supervision for egocentric video data and supporting physically grounded embodied learning? To enable research in this direction, we introduce EgoTouch, a large-scale multi-view egocentric dataset with dense tactile supervision for bimanual hand-object interaction. EgoTouch comprises 208 manipulation tasks spanning 1,891 episodes in diverse indoor and outdoor environments, with synchronized multi-view RGB (head-mounted egocentric and dual wrist-mounted cameras), bimanual 3D hand pose, and continuous pressure maps from wearable tactile sensors. Building on EgoTouch, we introduce TouchAnything, a baseline multi-view vision-to-touch prediction framework that uses the egocentric view as the primary input and flexibly leverages available wrist-mounted views at inference time. Experiments show that incorporating wrist-mounted views generally improves tactile prediction over egocentric-only input, achieving up to 5.0% relative improvement in Contact IoU and 6.1% relative improvement in Volumetric IoU. We will publicly release the dataset, code, and benchmark.
IRJul 15, 2024
SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential RecommendationKaiming Shen, Xichen Ding, Zixiang Zheng et al.
The modeling of users' behaviors is crucial in modern recommendation systems. A lot of research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items from the historical sequence. However, training on lifelong sequences in click-through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult because ID embeddings are insufficiently learned, especially when the IDs in the lifelong sequence features do not exist in the samples of the training dataset. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence well. The distributions of the multi-modal embeddings (text, image, and attributes) of a user's interacted items are not properly aligned, and they diverge across modalities. We also observe that users' search query sequences and item browsing sequences can fully depict users' intents and benefit from each other. To address these challenges, we propose a unified lifelong multi-modal sequence model called SEMINAR (Search Enhanced Multi-Modal Interest Network and Approximate Retrieval). Specifically, a network called Pretraining Search Unit (PSU) learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner with multiple objectives: multi-modal alignment, next query-item pair prediction, query-item relevance prediction, etc. After pretraining, the downstream model restores the pretrained embedding as initialization and finetunes the network. To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy to approximate the exact attention calculati
AISep 29, 2025Code
Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic TasksPeiran Xu, Zhuohao Li, Xiaoying Xing et al.
Large Language Models (LLMs) increasingly rely on external tools such as search engines to solve complex agentic tasks that require reasoning and external knowledge retrieval. Recently, reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in advancing capabilities of LLMs by rewarding the final answers via outcome rewards. While straightforward to supervise, outcome rewards only provide sparse signals and delayed feedback, which limits their effectiveness on long trajectories. Process rewards address this by evaluating intermediate steps, providing fine-grained supervision and encouraging grounded problem solving. However, it is notoriously hard to annotate step-wise labels, especially in non-verifiable processes without "golden" answers. Furthermore, step-wise judgment requires balancing local quality against contribution to the final outcome, as optimizing towards higher process reward may not always align with better final outcomes. To address the above challenges, we introduce Principle Process Reward (PPR), an RL approach that unifies principled step-level assessment and outcome verification. We train a principle-based reward model to improve the transparency and reliability of process evaluation, and further introduce a Reward Normalization (ReNorm) strategy to calibrate outcome and process rewards. Experimental results show that PPR achieves state-of-the-art performance across a wide range of benchmarks, demonstrating its impressive robustness and generalization. Our code and model collection are available at this link.
DBJul 14, 2025Code
SQLord: A Robust Enterprise Text-to-SQL Solution via Reverse Data Generation and Workflow DecompositionSong Cheng, Qiannan Cheng, Linbo Jin et al.
Transforming natural language into SQL queries (NL2SQL) is crucial for data-driven business applications. Existing frameworks, trained on open-source datasets, struggle with complex business logic and lack domain-specific data for fine-tuning. Additionally, evaluation methods often require annotated data and executable database environments, which are scarce in real-world scenarios. To address these challenges, we propose SQLord, an enterprise-level NL2SQL framework. First, SQLord introduces a data reverse generation approach to convert raw SQL statements into annotated data for supervised fine-tuning (SFT). Second, it proposes a decomposition method for complex queries using an automated workflow generator. Additionally, SQLord features a comprehensive GPT-Judge evaluation framework, including Execution Evaluation (EXE), Query-SQL Evaluation (QSE), and SQL-SQL Evaluation (SSE), tailored to diverse scenarios. In offline tests, SQLord significantly outperforms state-of-the-art baselines, and online accuracy consistently exceeds 90, highlighting SQLord's advantages and effectiveness in complex real-world scenarios. SQLord has been successfully applied across multiple scenarios on the world's largest B2B e-commerce platform.
DSMar 16
A Score Filter Enhanced Data Assimilation Framework for Data-Driven Dynamical SystemsJingqiao Tang, Ryan Bausback, Feng Bao et al.
We introduce a score-filter-enhanced data assimilation framework designed to reduce predictive uncertainty in machine learning (ML) models for data-driven dynamical system forecasting. Machine learning serves as an efficient numerical model for predicting dynamical systems. However, even with sufficient data, model uncertainty remains and accumulates over time, causing the long-term performance of ML models to deteriorate. To overcome this difficulty, we integrate data assimilation techniques into the training process to iteratively refine the model predictions by incorporating observational information. Specifically, we apply the Ensemble Score Filter (EnSF), a generative AI-based training-free diffusion model approach, for solving the data assimilation problem in high-dimensional nonlinear complex systems. This leads to a hybrid data assimilation-training framework that combines ML with EnSF to improve long-term predictive performance. We shall demonstrate that EnSF-enhanced ML can effectively reduce predictive uncertainty in ML-based Lorenz-96 system prediction and the Korteweg-De Vries (KdV) equation prediction.
LGMar 9, 2024
Towards Efficient Replay in Federated Incremental LearningYichen Li, Qunwei Li, Haozhao Wang et al.
In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain full data. We propose to employ a simple, generic framework for FIL named Re-Fed, which can coordinate each client to cache important samples for replay. More specifically, when a new task arrives, each client first caches selected previous samples based on their global and local importance. Then, the client trains the local model with both the cached samples and the samples from the new task. Theoretically, we analyze the ability of Re-Fed to discover important samples for replay thus alleviating the catastrophic forgetting problem. Moreover, we empirically show that Re-Fed achieves competitive performance compared to state-of-the-art methods.
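The client-side caching step described above can be sketched in a few lines. This is only an illustration, not the Re-Fed algorithm itself: the linear blend of the two importance scores, the weight `lam`, and the function name `select_cache` are all assumptions introduced here, since the abstract does not say how global and local importance are combined.

```python
import heapq

def select_cache(global_imp, local_imp, capacity, lam=0.5):
    """Return indices of the `capacity` samples with the highest blended importance.

    The blend lam * global + (1 - lam) * local is a hypothetical scoring rule
    chosen for illustration only.
    """
    scores = [lam * g + (1.0 - lam) * l for g, l in zip(global_imp, local_imp)]
    # nlargest returns the indices ordered from highest to lowest score.
    return heapq.nlargest(capacity, range(len(scores)), key=scores.__getitem__)

# Four previous-task samples, room to cache two of them.
cached = select_cache([0.9, 0.1, 0.5, 0.2], [0.1, 0.8, 0.6, 0.1], capacity=2)
```

A client would then train on the cached samples together with the samples from the new task.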
CVApr 9
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal ModelsShilin Yan, Jintao Tong, Hongwei Xue et al.
The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey to blind tool invocation, resorting to reflexive tool execution even when queries are resolvable from the raw visual context. This pathological behavior precipitates severe latency bottlenecks and injects extraneous noise that derails sound reasoning. Existing reinforcement learning protocols attempt to mitigate this via a scalarized reward that penalizes tool usage. Yet, this coupled formulation creates an irreconcilable optimization dilemma: an aggressive penalty suppresses essential tool use, whereas a mild penalty is entirely subsumed by the variance of the accuracy reward during advantage normalization, rendering it impotent against tool overuse. To transcend this bottleneck, we propose HDPO, a framework that reframes tool efficiency from a competing scalar objective to a strictly conditional one. By eschewing reward scalarization, HDPO maintains two orthogonal optimization channels: an accuracy channel that maximizes task correctness, and an efficiency channel that enforces execution economy exclusively within accurate trajectories via conditional advantage estimation. This decoupled architecture naturally induces a cognitive curriculum, compelling the agent to first master task resolution before refining its self-reliance. Extensive evaluations demonstrate that our resulting model, Metis, reduces tool invocations by orders of magnitude while simultaneously elevating reasoning accuracy.
LGJan 31, 2024
MoDE: A Mixture-of-Experts Model with Mutual Distillation among the ExpertsZhitian Xie, Yinger Zhang, Chenyi Zhuang et al.
The application of mixture-of-experts (MoE) is gaining popularity due to its ability to improve a model's performance. In an MoE structure, the gate layer plays a significant role in distinguishing and routing input features to different experts. This enables each expert to specialize in processing its corresponding sub-tasks. However, the gate's routing mechanism also gives rise to narrow vision: each individual expert in the MoE fails to use more samples in learning its allocated sub-task, which in turn limits the MoE from further improving its generalization ability. To effectively address this, we propose a method called Mixture-of-Distilled-Expert (MoDE), which applies moderate mutual distillation among experts to enable each expert to pick up more features learned by other experts and gain more accurate perceptions of its originally allocated sub-tasks. We conduct extensive experiments on tabular, NLP, and CV datasets, which show MoDE's effectiveness, universality, and robustness. Furthermore, we develop a parallel study by constructing "expert probing" to show experimentally why MoDE works: moderate knowledge distillation can improve each individual expert's test performance on its assigned tasks, leading to an overall performance improvement of the MoE.
CVMay 20, 2025
Unify Graph Learning with Text: Unleashing LLM Potentials for Session SearchSonghao Wu, Quan Tu, Hong Liu et al.
Session search involves a series of interactive queries and actions to fulfill a user's complex information need. Current strategies typically prioritize sequential modeling for deep semantic understanding, overlooking the graph structure in interactions. While some approaches focus on capturing structural information, they use a generalized representation for documents, neglecting word-level semantic modeling. In this paper, we propose Symbolic Graph Ranker (SGR), which aims to take advantage of both text-based and graph-based approaches by leveraging the power of recent Large Language Models (LLMs). Concretely, we first introduce a set of symbolic grammar rules to convert the session graph into text. This allows integrating session history, interaction process, and task instruction seamlessly as inputs for the LLM. Moreover, given the natural discrepancy between LLMs pre-trained on textual corpora and the symbolic language we produce using our graph-to-text grammar, our objective is to enhance LLMs' ability to capture graph structures within a textual format. To achieve this, we introduce a set of self-supervised symbolic learning tasks, including link prediction, node content generation, and generative contrastive learning, to enable LLMs to capture the topological information from coarse-grained to fine-grained. Experimental results and comprehensive analysis on two benchmark datasets, AOL and Tiangong-ST, confirm the superiority of our approach. Our paradigm also offers a novel and effective methodology that bridges the gap between traditional search strategies and modern LLMs.
AIDec 8, 2023
Making Large Language Models Better Knowledge Miners for Online Marketing with Progressive Prompting AugmentationChunjing Gan, Dan Yang, Binbin Hu et al.
Nowadays, the rapid development of the mobile economy has promoted the flourishing of online marketing campaigns, whose success greatly hinges on the efficient matching between user preferences and desired marketing campaigns, where a well-established Marketing-oriented Knowledge Graph (dubbed MoKG) could serve as the critical "bridge" for preference propagation. In this paper, we seek to carefully prompt a Large Language Model (LLM) with domain-level knowledge as a better marketing-oriented knowledge miner for marketing-oriented knowledge graph construction, which is however non-trivial, suffering from several inevitable issues in real-world marketing scenarios, i.e., uncontrollable relation generation of LLMs, insufficient prompting ability of a single prompt, and the unaffordable deployment cost of LLMs. To this end, we propose PAIR, a novel Progressive prompting Augmented mIning fRamework for harvesting marketing-oriented knowledge graphs with LLMs. In particular, we reduce the pure relation generation to an LLM-based adaptive relation filtering process through the knowledge-empowered prompting technique. Next, we steer LLMs for entity expansion with progressive prompting augmentation, followed by a reliable aggregation with comprehensive consideration of both self-consistency and semantic relatedness. In terms of online serving, we specialize in a small and white-box PAIR (i.e., LightPAIR), which is fine-tuned with a high-quality corpus provided by a strong teacher-LLM. Extensive experiments and practical applications in audience targeting verify the effectiveness of the proposed (Light)PAIR.
IRDec 15, 2023
GreenFlow: A Computation Allocation Framework for Building Environmentally Sound Recommendation SystemXingyu Lu, Zhining Liu, Yanchu Guan et al.
Given the enormous number of users and items, industrial cascade recommendation systems (RS) are continuously expanded in size and complexity to deliver relevant items, such as news, services, and commodities, to the appropriate users. In a real-world scenario with hundreds of thousands of requests per second, significant computation is required to infer personalized results for each request, resulting in massive energy consumption and carbon emissions that raise concern. This paper proposes GreenFlow, a practical computation allocation framework for RS that considers both accuracy and carbon emission during inference. For each stage (e.g., recall, pre-ranking, ranking) of a cascade RS, when a user triggers a request, we define two actions that determine the computation: (1) the trained instances of models with different computational complexity; and (2) the number of items to be inferred in the stage. We refer to the combinations of actions in all stages as action chains. A reward score is estimated for each action chain, followed by dynamic primal-dual optimization considering both the reward and computation budget. Extensive experiments verify the effectiveness of the framework, reducing computation consumption by 41% in an industrial mobile application while maintaining commercial revenue. Moreover, the proposed framework saves approximately 5000 kWh of electricity and reduces carbon emissions by 3 tons per day.
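The notion of an action chain and a computation budget can be made concrete with a toy sketch. It is not GreenFlow's method: exhaustive search stands in for the dynamic primal-dual optimization, the per-action rewards are simply summed (an assumption, since the abstract estimates one reward per chain), and all names and numbers below are hypothetical.

```python
from itertools import product

def best_action_chain(stage_actions, budget):
    """Pick the action chain with the highest total reward whose cost fits the budget.

    stage_actions: one list per stage of (name, reward, cost) tuples
    (a hypothetical data layout). Returns None if no chain is feasible.
    """
    best = None
    for chain in product(*stage_actions):  # one action per stage
        cost = sum(action[2] for action in chain)
        reward = sum(action[1] for action in chain)
        if cost <= budget and (best is None or reward > best[1]):
            best = ([action[0] for action in chain], reward, cost)
    return best

stages = [
    [("recall_small", 1.0, 1), ("recall_large", 1.5, 3)],
    [("rank_light", 2.0, 2), ("rank_heavy", 3.0, 5)],
]
chain, reward, cost = best_action_chain(stages, budget=6)
```

Brute force is only workable for a handful of stages and actions; it is used here purely to show what is being optimized.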
CLFeb 12, 2024
Multi-Intent Attribute-Aware Text Matching in SearchingMingzhe Li, Xiuying Chen, Jing Xiang et al.
Text matching systems have become a fundamental service in most search platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the location mentioned in the query, which represent condensed key information that is helpful for matching. However, most existing works downplay the effectiveness of attributes by integrating them into text representations as supplementary information. Hence, in this work, we focus on exploring the relationship between the attributes from the two sides. Since attributes from the two ends are often not aligned in terms of number and type, we propose to exploit the benefit of attributes by multiple-intent modeling. The intents extracted from attributes summarize the diverse needs of queries and provide rich content of items, which are more refined and abstract, and can be aligned for paired inputs. Concretely, we propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: attribute-aware encoder, multi-intent modeling, and intent-aware matching. In the attribute-aware encoder, the text and attributes are weighted and processed through a scaled attention mechanism with regard to the attributes' importance. Afterward, the multi-intent modeling extracts intents from the two ends and aligns them. Herein, we come up with a distribution loss to ensure the learned intents are diverse but concentrated, and a Kullback-Leibler divergence loss that aligns the learned intents. Finally, in the intent-aware matching, the intents are evaluated by a self-supervised masking task, and then incorporated to output the final matching result.
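For readers unfamiliar with the alignment term, the standard Kullback-Leibler divergence between two discrete distributions can be computed as below. This shows only the textbook quantity; how MIM forms the intent distributions and weights the loss is not stated in the abstract, and the example vectors are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Identical distributions are perfectly aligned; mismatched ones are penalized.
same = kl_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
diff = kl_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

The divergence is zero only when the two distributions coincide, which is what makes it usable as an alignment penalty.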
AO-PHJan 20, 2025
Ensemble score filter with image inpainting for data assimilation in tracking surface quasi-geostrophic dynamics with partial observationsSiming Liang, Hoang Tran, Feng Bao et al.
Data assimilation plays a pivotal role in understanding and predicting turbulent systems within geoscience and weather forecasting, where data assimilation is used to address three fundamental challenges, i.e., high-dimensionality, nonlinearity, and partial observations. Recent advances in machine learning (ML)-based data assimilation methods have demonstrated encouraging results. In this work, we develop an ensemble score filter (EnSF) that integrates image inpainting to solve data assimilation problems with partial observations. The EnSF method exploits a specially designed training-free diffusion model to solve high-dimensional nonlinear data assimilation problems. Its performance has been successfully demonstrated in the context of having full observations, i.e., all the state variables are directly or indirectly observed. However, because the EnSF does not use a covariance matrix to capture the dependence between the observed and unobserved state variables, it is nontrivial to extend the original EnSF method to the partial observation scenario. In this work, we incorporate various image inpainting techniques into the EnSF to predict the unobserved states during data assimilation. At each filtering step, we first use the diffusion model to estimate the observed states by integrating the likelihood information into the score function. Then, we use image inpainting methods to predict the unobserved state variables. We demonstrate the performance of the EnSF with inpainting by tracking the Surface Quasi-Geostrophic (SQG) model dynamics under a variety of scenarios. The successful proof of concept paves the way to more in-depth investigations on exploiting modern image inpainting techniques to advance data assimilation methodology for practical geoscience and weather forecasting problems.
LGDec 10, 2024
MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task LearningYufei Ma, Zihan Liang, Huangyu Dai et al.
The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
LGApr 2, 2025
Multi-fidelity Parameter Estimation Using Conditional Diffusion ModelsCaroline Tatsuoka, Minglei Yang, Dongbin Xiu et al.
We present a multi-fidelity method for uncertainty quantification of parameter estimates in complex systems, leveraging generative models trained to sample the target conditional distribution. In the Bayesian inference setting, traditional parameter estimation methods rely on repeated simulations of potentially expensive forward models to determine the posterior distribution of the parameter values, which may result in computationally intractable workflows. Furthermore, methods such as Markov Chain Monte Carlo (MCMC) necessitate rerunning the entire algorithm for each new data observation, further increasing the computational burden. Hence, we propose a novel method for efficiently obtaining posterior distributions of parameter estimates for high-fidelity models given data observations of interest. The method first constructs a low-fidelity, conditional generative model capable of amortized Bayesian inference and hence rapid posterior density approximation over a wide range of data observations. When higher accuracy is needed for a specific data observation, the method employs adaptive refinement of the density approximation. It uses outputs from the low-fidelity generative model to refine the parameter sampling space, ensuring efficient use of the computationally expensive high-fidelity solver. Subsequently, a high-fidelity, unconditional generative model is trained to achieve greater accuracy in the target posterior distribution. Both low- and high-fidelity generative models enable efficient sampling from the target posterior and do not require repeated simulation of the high-fidelity forward model. We demonstrate the effectiveness of the proposed method on several numerical examples, including cases with multi-modal densities, as well as an application in plasma physics for a runaway electron simulation model.
LG Oct 11, 2024
An End-to-End Deep Learning Method for Solving Nonlocal Allen-Cahn and Cahn-Hilliard Phase-Field Models
Yuwei Geng, Olena Burkovska, Lili Ju et al.
We propose an efficient end-to-end deep learning method for solving nonlocal Allen-Cahn (AC) and Cahn-Hilliard (CH) phase-field models. One motivation for this effort emanates from the fact that discretized partial differential equation-based AC or CH phase-field models result in diffuse interfaces between phases, with the only recourse for remediation being to severely refine the spatial grids in the vicinity of the true moving sharp interface, whose width is determined by a grid-independent parameter that is substantially larger than the local grid size. In this work, we introduce non-mass conserving nonlocal AC or CH phase-field models with regular, logarithmic, or obstacle double-well potentials. Because of non-locality, some of these models feature totally sharp interfaces separating phases. The discretization of such models can lead to a transition between phases that is only a single grid cell wide. Another motivation is to use deep learning approaches to ameliorate the otherwise high cost of solving discretized nonlocal phase-field models. To this end, loss functions of the customized neural networks are defined using the residual of the fully discrete approximations of the AC or CH models, which results from applying a Fourier collocation method and a temporal semi-implicit approximation. To address the long-range interactions in the models, we tailor the architecture of the neural network by incorporating a nonlocal kernel as an input channel to the neural network model. We then provide the results of extensive computational experiments to illustrate the accuracy, structure-preserving properties, predictive capabilities, and cost reductions of the proposed method.
LG Dec 15, 2023
Multiple Instance Learning for Uplift Modeling
Yao Zhao, Haipeng Zhang, Shiwei Lyu et al.
Uplift modeling is widely used in performance marketing to estimate the effects of promotion campaigns (e.g., an increase in customer retention rate). Since it is impossible to observe the outcomes of a recipient in the treatment (e.g., receiving a certain promotion) and control (e.g., without promotion) groups simultaneously (i.e., the counterfactual), uplift models are mainly trained on instances of the treatment and control groups separately to form two models, and uplifts are predicted as the difference between the predictions of these two models (i.e., the two-model method). When responses are noisy and the treatment effect is fractional, the induced individual uplift predictions will be inaccurate, resulting in targeting undesirable customers. Though it is impossible to obtain the ideal ground-truth individual uplifts, known as Individual Treatment Effects (ITEs), the average uplift of a group of users, called the Average Treatment Effect (ATE), can be observed from experimental deliveries. Building on this, and similar to Multiple Instance Learning (MIL), in which each training sample is a bag of instances, our framework sums up individual user uplift predictions for each bag of users as its bag-wise ATE prediction, and regularizes it to its ATE label, thus learning more accurate individual uplifts. Additionally, to amplify the fractional treatment effect, bags are composed of instances with adjacent individual uplift predictions, instead of random instances. Experiments conducted on two datasets show the effectiveness and universality of the proposed framework.
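As a rough illustration of the bag-wise regularization described in this abstract, the following Python sketch groups users with adjacent predicted uplifts into bags and penalizes the gap between each bag's aggregated prediction and its ATE label. The function names, the fixed bag size, the use of the bag mean as the aggregate, and the squared-error penalty are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def make_adjacent_bags(pred_uplift, bag_size):
    """Group user indices into bags whose predicted uplifts are adjacent in sorted order.

    Hypothetical helper: users left over after the last full bag are dropped.
    """
    order = np.argsort(pred_uplift)
    n_bags = len(order) // bag_size
    return [order[i * bag_size:(i + 1) * bag_size] for i in range(n_bags)]

def bag_ate_penalty(pred_uplift, bags, bag_ate_labels):
    """Mean squared gap between bag-level predictions and observed bag ATE labels.

    Assumption: the bag-level prediction is the mean of the individual predictions.
    """
    bag_pred = np.array([pred_uplift[idx].mean() for idx in bags])
    return float(np.mean((bag_pred - np.asarray(bag_ate_labels)) ** 2))

# Toy example: six users, two bags of three users each.
pred = np.array([0.30, 0.05, 0.20, 0.10, 0.25, 0.00])
bags = make_adjacent_bags(pred, bag_size=3)
penalty = bag_ate_penalty(pred, bags, bag_ate_labels=[0.15, 0.25])
```

In a training loop, a term like `penalty` would presumably be added to the base two-model loss with a weighting coefficient, which is left out here.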
LG Mar 31, 2024
Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation
Minglei Yang, Pengjun Wang, Ming Fan et al.
We introduce a conditional pseudo-reversible normalizing flow for constructing surrogate models of a physical model polluted by additive noise to efficiently quantify forward and inverse uncertainty propagation. Existing surrogate modeling approaches usually focus on approximating the deterministic component of the physical model. However, this strategy necessitates knowledge of the noise and resorts to auxiliary sampling methods for quantifying inverse uncertainty propagation. In this work, we develop the conditional pseudo-reversible normalizing flow model to directly learn and efficiently generate samples from the conditional probability density functions. The training process utilizes a dataset consisting of input-output pairs without requiring prior knowledge about the noise and the function. Our model, once trained, can generate samples from any conditional probability density function whose high-probability regions are covered by the training set. Moreover, the pseudo-reversibility feature allows for the use of fully-connected neural network architectures, which simplifies the implementation and enables theoretical analysis. We provide a rigorous convergence analysis of the conditional pseudo-reversible normalizing flow model, showing its ability to converge to the target conditional probability density function using the Kullback-Leibler divergence. To demonstrate the effectiveness of our method, we apply it to several benchmark tests and a real-world geologic carbon storage problem.
CO Aug 9, 2025
A Score-based Diffusion Model Approach for Adaptive Learning of Stochastic Partial Differential Equation Solutions
Toan Huynh, Ruth Lopez Fajardo, Guannan Zhang et al.
We propose a novel framework for adaptively learning the time-evolving solutions of stochastic partial differential equations (SPDEs) using score-based diffusion models within a recursive Bayesian inference setting. SPDEs play a central role in modeling complex physical systems under uncertainty, but their numerical solutions often suffer from model errors and reduced accuracy due to incomplete physical knowledge and environmental variability. To address these challenges, we encode the governing physics into the score function of a diffusion model using simulation data and incorporate observational information via a likelihood-based correction in a reverse-time stochastic differential equation. This enables adaptive learning through iterative refinement of the solution as new data becomes available. To improve computational efficiency in high-dimensional settings, we introduce the ensemble score filter, a training-free approximation of the score function designed for real-time inference. Numerical experiments on benchmark SPDEs demonstrate the accuracy and robustness of the proposed method under sparse and noisy observations.
ML Jul 17, 2025
Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains
Minglei Yang, Yanfang Liu, Diego del-Castillo-Negrete et al.
Simulating stochastic differential equations (SDEs) in bounded domains presents significant computational challenges due to particle exit phenomena, which require accurate modeling of interior stochastic dynamics and boundary interactions. Despite the success of machine learning-based methods in learning SDEs, existing learning methods are not applicable to SDEs in bounded domains because they cannot accurately capture the particle exit dynamics. We present a unified hybrid data-driven approach that combines a conditional diffusion model with an exit prediction neural network to capture both interior stochastic dynamics and boundary exit phenomena. Our ML model consists of two major components: a neural network that learns exit probabilities using binary cross-entropy loss with rigorous convergence guarantees, and a training-free diffusion model that generates state transitions for non-exiting particles using closed-form score functions. The two components are integrated through a probabilistic sampling algorithm that determines particle exit at each time step and generates appropriate state transitions. The performance of the proposed approach is demonstrated via three test cases: a one-dimensional simplified problem for theoretical verification, a two-dimensional advection-diffusion problem in a bounded domain, and a three-dimensional problem of interest to magnetically confined fusion plasmas.
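The way the two components might be combined at each time step can be sketched as follows. This is only a schematic reading of the abstract: `exit_prob` and `sample_transition` are hypothetical stand-ins for the trained exit-probability network and the diffusion-based transition sampler, and the Bernoulli exit draw is an assumption about how the "probabilistic sampling algorithm" decides exits.

```python
import numpy as np

def step_particles(x, alive, exit_prob, sample_transition, rng):
    """Advance an ensemble of particles by one time step.

    x: (n, d) array of states; alive: (n,) boolean mask of particles still in the domain.
    exit_prob(states) -> (m,) exit probabilities   (hypothetical stand-in for the exit network)
    sample_transition(states, rng) -> (m, d) new states   (hypothetical stand-in for the sampler)
    """
    x = x.copy()
    alive = alive.copy()
    idx = np.flatnonzero(alive)
    if idx.size == 0:
        return x, alive
    # Bernoulli draw: each live particle exits with its predicted probability.
    exits = rng.random(idx.size) < exit_prob(x[idx])
    alive[idx[exits]] = False
    # Only particles that remain inside the domain receive a state transition.
    stay = idx[~exits]
    if stay.size > 0:
        x[stay] = sample_transition(x[stay], rng)
    return x, alive

# Toy usage with placeholder models: constant 10% exit probability, Gaussian random-walk step.
rng = np.random.default_rng(0)
x0 = np.zeros((5, 2))
alive0 = np.ones(5, dtype=bool)
x1, alive1 = step_particles(
    x0, alive0,
    exit_prob=lambda s: np.full(len(s), 0.1),
    sample_transition=lambda s, r: s + 0.1 * r.standard_normal(s.shape),
    rng=rng,
)
```

Exited particles keep their last recorded state in this sketch, which is one of several reasonable bookkeeping choices.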
ML Apr 20, 2025
Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions
Hoang Tran, Zezhong Zhang, Feng Bao et al.
We propose a hybrid generative model for efficient sampling of high-dimensional, multimodal probability distributions for Bayesian inference. Traditional Monte Carlo methods, such as the Metropolis-Hastings and Langevin Monte Carlo sampling methods, are effective for sampling from single-mode distributions in high-dimensional spaces. However, these methods struggle to produce samples with the correct proportions for each mode in multimodal distributions, especially for distributions with well-separated modes. To address the challenges posed by multimodality, we adopt a divide-and-conquer strategy. We start by minimizing the energy function with initial guesses uniformly distributed within the prior domain to identify all the modes of the energy function. Then, we train a classifier to segment the domain corresponding to each mode. After the domain decomposition, we train a diffusion-model-assisted generative model for each identified mode within its support. Once each mode is characterized, we employ bridge sampling to estimate the normalizing constant, allowing us to directly adjust the ratios between the modes. Our numerical examples demonstrate that the proposed framework can effectively handle multimodal distributions with varying mode shapes in up to 100 dimensions. An application to a Bayesian inverse problem for partial differential equations is also provided.
LG Dec 19, 2023
Improving the Expressive Power of Deep Neural Networks through Integral Activation Transform
Zezhong Zhang, Feng Bao, Guannan Zhang
The impressive expressive power of deep neural networks (DNNs) underlies their widespread applicability. However, while the theoretical capacity of deep architectures is high, the practical expressive power achieved through successful training often falls short. Building on the insights gained from Neural ODEs, which explore the depth of DNNs as a continuous variable, in this work we generalize the traditional fully connected DNN through the concept of continuous width. In the Generalized Deep Neural Network (GDNN), the traditional notion of neurons in each layer is replaced by a continuous state function. Using the finite rank parameterization of the weight integral kernel, we establish that GDNN can be obtained by employing the Integral Activation Transform (IAT) as activation layers within the traditional DNN framework. The IAT maps the input vector to a function space using some basis functions, followed by nonlinear activation in the function space, and then extracts information through the integration with another collection of basis functions. A specific variant, IAT-ReLU, featuring the ReLU nonlinearity, serves as a smooth generalization of the scalar ReLU activation. Notably, IAT-ReLU exhibits a continuous activation pattern when continuous basis functions are employed, making it smooth and enhancing the trainability of the DNN. Our numerical experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of trainability and smoothness.
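One possible reading of the three IAT steps (lift the input vector to a function, apply the nonlinearity pointwise, integrate against a second set of basis functions) is sketched below with numerical quadrature. The grid on [0, 1], the trapezoidal weights, and the particular basis arrays are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def trapezoid_weights(m):
    """Trapezoidal quadrature weights on m equispaced points in [0, 1]."""
    h = 1.0 / (m - 1)
    w = np.full(m, h)
    w[0] = w[-1] = h / 2.0
    return w

def iat_relu(x, p, q, w):
    """Quadrature sketch of an integral activation transform with ReLU.

    x: (n,) input vector
    p: (n, m) input basis functions sampled on the grid (assumed choice)
    q: (k, m) output basis functions sampled on the grid (assumed choice)
    w: (m,) quadrature weights
    """
    f = x @ p                 # f(s_l) = sum_i x_i p_i(s_l): input lifted to a function
    a = np.maximum(f, 0.0)    # pointwise ReLU applied in function space
    return q @ (a * w)        # y_j ~ integral of q_j(s) * relu(f(s)) ds

# Toy usage: 3 inputs, 2 outputs, 101 grid points, simple smooth bases.
m = 101
s = np.linspace(0.0, 1.0, m)
p = np.stack([np.ones(m), s, s ** 2])
q = np.stack([np.ones(m), np.cos(np.pi * s)])
y = iat_relu(np.array([0.5, -1.0, 0.2]), p, q, trapezoid_weights(m))
```

Because the lifted function crosses zero at input-dependent locations rather than switching whole neurons on or off, the output varies smoothly with the input, which is the behaviour the abstract attributes to continuous basis functions.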
ML Sep 2, 2023
An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems
Feng Bao, Zezhong Zhang, Guannan Zhang
We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems with superior accuracy. A major drawback of existing filtering methods, e.g., particle filters or ensemble Kalman filters, is the low accuracy in handling high-dimensional and highly nonlinear problems. EnSF attacks this challenge by exploiting the score-based diffusion model, defined in a pseudo-temporal domain, to characterize the evolution of the filtering density. EnSF stores the information of the recursively updated filtering density function in the score function, instead of storing the information in a set of finite Monte Carlo samples (used in particle filters and ensemble Kalman filters). Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation that uses a mini-batch-based Monte Carlo estimator to directly approximate the score function at any pseudo-spatial-temporal location, which provides sufficient accuracy in solving high-dimensional nonlinear problems and saves a tremendous amount of time spent on training neural networks. High-dimensional Lorenz-96 systems are used to demonstrate the performance of our method. EnSF provides surprising performance, compared with the state-of-the-art Local Ensemble Transform Kalman Filter method, in reliably and efficiently tracking extremely high-dimensional Lorenz systems (up to 1,000,000 dimensions) with highly nonlinear observation processes.
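To make the idea of a training-free score estimate concrete, the sketch below computes the score of the diffused empirical ensemble distribution in closed form: if each ensemble member x_j is diffused to a Gaussian with mean alpha_t x_j and variance beta_t^2, the score at z is a softmax-weighted average of the per-member Gaussian scores. The linear schedule (alpha_t = 1 - t(1 - eps_alpha), beta_t^2 = t) and the name `mc_score` are assumptions for illustration, and this version uses the whole ensemble rather than a mini-batch.

```python
import numpy as np

def mc_score(z, t, ensemble, eps_alpha=0.05):
    """Closed-form score of the diffused ensemble distribution at (z, t).

    z: (d,) query point; t: pseudo-time in (0, 1]; ensemble: (J, d) prior samples.
    Assumed schedule: alpha_t = 1 - t * (1 - eps_alpha), beta_t^2 = t.
    """
    alpha = 1.0 - t * (1.0 - eps_alpha)
    beta2 = t
    diff = z[None, :] - alpha * ensemble            # (J, d) offsets from each diffused mean
    logw = -np.sum(diff ** 2, axis=1) / (2.0 * beta2)
    logw -= logw.max()                              # stabilise the softmax numerically
    w = np.exp(logw)
    w /= w.sum()                                    # weights over ensemble members
    return -(w[:, None] * diff).sum(axis=0) / beta2

# Toy usage: 200 ensemble members in 3 dimensions.
rng = np.random.default_rng(0)
ens = rng.standard_normal((200, 3))
score = mc_score(np.zeros(3), t=0.5, ensemble=ens)
```

Restricting the weighted sum to a random subset of ensemble members would give a mini-batch variant in the spirit of the abstract; incorporating the observation likelihood into the score is a separate step not shown here.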
LG May 30, 2023
Who Would be Interested in Services? An Entity Graph Learning System for User Targeting
Dan Yang, Binbin Hu, Xiaoyan Yang et al.
With the growing popularity of various mobile devices, user targeting, which aims at effectively and efficiently locating target users who are interested in specific services, has received a growing amount of attention. Most pioneering works on user targeting perform similarity-based expansion with a few active users as seeds, and suffer from the following major issues: the unavailability of seed users for new services and the unfriendliness of black-box procedures towards marketers. In this paper, we design an Entity Graph Learning (EGL) system that provides explainable user targeting while also addressing the cold-start issue. The EGL System follows a hybrid online-offline architecture to satisfy the requirements of scalability and timeliness. Specifically, in the offline stage, the system focuses on heavyweight entity graph construction and user entity preference learning, for which we propose a Three-stage Relation Mining Procedure (TRMP) that removes the dependence on expensive seed users. In the online stage, the system offers real-time user targeting based on the entity graph from the offline stage. Since the user targeting process is based on graph reasoning, the whole process is transparent and operation-friendly to marketers. Finally, extensive offline experiments and online A/B testing demonstrate the superior performance of the proposed EGL System.
COMP-PH Feb 18, 2022
Model Calibration of the Liquid Mercury Spallation Target using Evolutionary Neural Networks and Sparse Polynomial Expansions
Majdi I. Radaideh, Hoang Tran, Lianshan Lin et al.
The mercury constitutive model predicting the strain and stress in the target vessel plays a central role in improving the lifetime prediction and future target designs of the mercury targets at the Spallation Neutron Source (SNS). We leverage the experimental strain data collected over multiple years to improve the mercury constitutive model through a combination of large-scale simulations of the target behavior and the use of machine learning tools for parameter estimation. We present two interdisciplinary approaches for surrogate-based model calibration of expensive simulations using evolutionary neural networks and sparse polynomial expansions. The experiments and results of the two methods show very good agreement for the solid mechanics simulation of the mercury spallation target. The proposed methods are used to calibrate the tensile cutoff threshold, mercury density, and mercury speed of sound during intense proton pulse experiments. Using experimental strain data from the mercury target sensors, the newly calibrated simulations achieve a 7% average improvement in signal prediction accuracy and an 8% reduction in mean absolute error compared to previously reported reference parameters, with some sensors experiencing up to 30% improvement. The proposed calibrated simulations can significantly aid in fatigue analysis to estimate the mercury target lifetime and integrity, which reduces abrupt target failure and saves substantial costs. However, an important conclusion from this work points to a deficiency in the current constitutive model based on the equation of state in capturing the full physics of the spallation reaction. Given that some of the calibrated parameters that show good agreement with the experimental data can be nonphysical mercury properties, we need a more advanced two-phase flow model to capture bubble dynamics and mercury cavitation.