CEJun 1
Aligning Shared and Routed Experts for Cross-Subject EEG GeneralizationZhi Zhang, Yan Liu, Zhejing Hu et al.
Cross-subject EEG generalization is challenging due to substantial heterogeneity across subjects. Existing methods typically learn either a shared subject-invariant model or multiple subject-specialized experts, but these two paradigms fail in complementary ways: the former may over-reduce subject-specific discriminative signals, while the latter may under-reduce transferable structure. We show that their suitability depends on the reducibility cost of branch-specific functions to branch-invariant ones, and we further provide a theory-to-method mapping that instantiates alignment principles in cross-subject EEG learning. Based on this insight, we propose Shared-Routed Expert Alignment (SREA), a collaborative framework that couples a shared expert for reducible invariant functions with routed experts for irreducible subject-specific functions. SREA trains the shared branch with joint embedding over augmented temporal neighbors, the routed branch with prototype-based sparse routing and expert specialization, and both branches with numerically stable mutual-guided reweighting based on cross-branch learnability gaps. Experiments on seven public EEG benchmarks across different tasks show that SREA consistently outperforms state-of-the-art methods and EEG foundation models.
CVMay 22Code
ChainFlow-VLA: Causal Flow Planning with Vision-Language ModelsXiyang Wang, Xinlin Wang, Tingguang Zhou et al.
Current end-to-end autonomous driving systems are fundamentally limited by a mismatch between temporal causal reasoning and global trajectory consistency. Autoregressive (AR) models capture interaction-aware temporal dependencies via causal factorization, but their step-wise decoding leads to error accumulation and suboptimal global structure. In contrast, diffusion models optimize trajectories globally but lack explicit causal constraints, making them unreliable in interactive and safety-critical scenarios. This dichotomy reveals a deeper issue: existing methods treat causal modeling and global optimization as separate paradigms, without a principled way to unify them within a single trajectory distribution. To address this, we propose ChainFlow-VLA, which unifies causal generation and global refinement within a unified probabilistic framework. We formulate planning as a mixture over AR-induced modes and learn Vision-Language Model (VLM)-conditioned residual distributions over these modes. An autoregressive generator (Chain) produces a discrete set of causal trajectory modes, followed by a diffusion-based refiner (Flow) that leverages VLM hidden states as semantic priors to perform mode-conditioned correction in residual space while preserving causal structure. This straightforward conditioning seamlessly injects high-level scene understanding into fine-grained trajectory adjustments. Experiments demonstrate that ChainFlow-VLA achieves robust planning in ambiguous and long-tail scenarios, achieving a state-of-the-art score of 94.85 on the NAVSIM v1 leaderboard, matching human-level performance (94.8). Code will be available at https://github.com/AFARI-Research/ChainFlow-VLA.
CVApr 22, 2023
SSN: Stockwell Scattering Network for SAR Image Change DetectionGong Chen, Yanan Zhao, Yi Wang et al.
Recently, synthetic aperture radar (SAR) image change detection has become an interesting yet challenging direction due to the presence of speckle noise. Although both traditional and modern learning-driven methods attempted to overcome this challenge, deep convolutional neural networks (DCNNs)-based methods are still hindered by the lack of interpretability and the requirement of large computation power. To overcome this drawback, wavelet scattering network (WSN) and Fourier scattering network (FSN) are proposed. Combining respective merits of WSN and FSN, we propose Stockwell scattering network (SSN) based on Stockwell transform which is widely applied against noisy signals and shows advantageous characteristics in speckle reduction. The proposed SSN provides noise-resilient feature representation and obtains state-of-art performance in SAR image change detection as well as high computational efficiency. Experimental results on three real SAR image datasets demonstrate the effectiveness of the proposed method.
CVAug 19, 2024Code
From Darkness to Detail: Frequency-Aware SSMs for Low-Light VisionEashan Adhikarla, Kai Zhang, Gong Chen et al.
Low-light image enhancement remains a persistent challenge in computer vision, where state-of-the-art models are often hampered by hardware constraints and computational inefficiency, particularly at high resolutions. While foundational architectures like transformers and diffusion models have advanced the field, their computational complexity limits their deployment on edge devices. We introduce ExpoMamba, a novel architecture that integrates a frequency-aware state-space model within a modified U-Net. ExpoMamba is designed to address mixed-exposure challenges by decoupling the modeling of amplitude (intensity) and phase (structure) in the frequency domain. This allows for targeted enhancement, making it highly effective for real-time applications, including downstream tasks like object detection and segmentation. Our experiments on six benchmark datasets show that ExpoMamba is up to 2-3x faster than competing models and achieves a 6.8\% PSNR improvement, establishing a new state-of-the-art in efficient, high-quality low-light enhancement. Source code: https://www.github.com/eashanadhikarla/ExpoMamba.
AIFeb 5Code
TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement LearningZihao Jiang, Miao Peng, Zhenyan Shan et al.
Temporal knowledge graph question answering (TKGQA) aims to answer time-sensitive questions by leveraging temporal knowledge bases. While Large Language Models (LLMs) demonstrate significant potential in TKGQA, current prompting strategies constrain their efficacy in two primary ways. First, they are prone to reasoning hallucinations under complex temporal constraints. Second, static prompting limits model autonomy and generalization, as it lack optimization through dynamic interaction with temporal knowledge graphs (TKGs) environments. To address these limitations, we propose \textbf{TKG-Thinker}, a novel agent equipped with autonomous planning and adaptive retrieval capabilities for reasoning over TKGs. Specifically, TKG-Thinker performs in-depth temporal reasoning through dynamic multi-turn interactions with TKGs via a dual-training strategy. We first apply Supervised Fine-Tuning (SFT) with chain of thought data to instill core planning capabilities, followed by a Reinforcement Learning (RL) stage that leverages multi-dimensional rewards to refine reasoning policies under intricate temporal constraints. Experimental results on benchmark datasets with three open-source LLMs show that TKG-Thinker achieves state-of-the-art performance and exhibits strong generalization across complex TKGQA settings.
SEApr 15
WebMAC: A Multi-Agent Collaborative Framework for Scenario Testing of Web SystemsZhenyu Wan, Gong Chen, Qing Huang et al.
Scenario testing is an important technique for detecting errors in web systems. Testers draft test scenarios and convert them into test scripts for execution. Early methods relied on testers to convert test scenarios into test scripts. Recent LLM-based scenario testing methods can generate test scripts from natural language descriptions of test scenarios. However, these methods are not only limited by the incompleteness of descriptions but also overlook test adequacy criteria, making it difficult to detect potential errors. To address these limitations, this paper proposes WebMAC, a multi-agent collaborative framework for scenario testing of web systems. WebMAC can complete natural language descriptions of test scenarios through interactive clarification and transform adequate instantiated test scenarios via equivalence class partitioning. WebMAC consists of three multi-agent modules, responsible respectively for completing natural language descriptions of test scenarios, transforming test scenarios, and converting test scripts. We evaluated WebMAC on four web systems. Compared with the SOTA method, WebMAC improves the execution success rate of generated test scripts by 30%-60%, increases testing efficiency by 29%, and reduces token consumption by 47.6%. Furthermore, WebMAC can effectively detect more errors in web systems.
LGJun 13, 2025Code
RollingQ: Reviving the Cooperation Dynamics in Multimodal TransformerHaotian Ni, Yake Wei, Hang Liu et al.
Multimodal learning faces challenges in effectively fusing information from diverse modalities, especially when modality quality varies across samples. Dynamic fusion strategies, such as attention mechanism in Transformers, aim to address such challenge by adaptively emphasizing modalities based on the characteristics of input data. However, through amounts of carefully designed experiments, we surprisingly observed that the dynamic adaptability of widely-used self-attention models diminishes. Model tends to prefer one modality regardless of data characteristics. This bias triggers a self-reinforcing cycle that progressively overemphasizes the favored modality, widening the distribution gap in attention keys across modalities and deactivating attention mechanism's dynamic properties. To revive adaptability, we propose a simple yet effective method Rolling Query (RollingQ), which balances attention allocation by rotating the query to break the self-reinforcing cycle and mitigate the key distribution gap. Extensive experiments on various multimodal scenarios validate the effectiveness of RollingQ and the restoration of cooperation dynamics is pivotal for enhancing the broader capabilities of widely deployed multimodal Transformers. The source code is available at https://github.com/GeWu-Lab/RollingQ_ICML2025.
CVMar 2
CoopDiff: A Diffusion-Guided Approach for Cooperation under CorruptionsGong Chen, Chaokun Zhang, Pengcheng Lv
Cooperative perception lets agents share information to expand coverage and improve scene understanding. However, in real-world scenarios, diverse and unpredictable corruptions undermine its robustness and generalization. To address these challenges, we introduce CoopDiff, a diffusion-based cooperative perception framework that mitigates corruptions via a denoising mechanism. CoopDiff adopts a teacher-student paradigm: the Quality-Aware Teacher performs voxel-level early fusion with Quality of Interest weighting and semantic guidance, then produces clean supervision features via a diffusion denoiser. The Dual-Branch Diffusion Student first separates ego and cooperative streams in encoding to reconstruct the teacher's clean targets. And then, an Ego-Guided Cross-Attention mechanism facilitates balanced decoding under degradation by adaptively integrating ego and cooperative features. We evaluate CoopDiff on two constructed multi-degradation benchmarks, OPV2Vn and DAIR-V2Xn, each incorporating six corruption types, including environmental and sensor-level distortions. Benefiting from the inherent denoising properties of diffusion, CoopDiff consistently outperforms prior methods across all degradation types and lowers the relative corruption error. Furthermore, it offers a tunable balance between precision and inference efficiency.
DCJan 5
EPD-Serve: A Flexible Multimodal EPD Disaggregation Inference Serving System On AscendFan Bai, Pai Peng, Zhengzhi Tang et al.
With the widespread adoption of large multimodal models, efficient inference across text, image, audio, and video modalities has become critical. However, existing multimodal inference systems typically employ monolithic architectures that tightly couple the Encode, Prefill, and Decode stages on homogeneous hardware, neglecting the heterogeneous computational characteristics of each stage. This design leads to inefficient resource utilization and limited system throughput. To address these issues, we propose EPD-Serve, a stage-level disaggregated inference serving system for multimodal models. EPD-Serve decouples the inference pipeline into independent Encode, Prefill, and Decode stages, enabling logical isolation and flexible co-located deployment through dynamic orchestration. Leveraging the Ascend interconnect topology, EPD-Serve introduces asynchronous feature prefetching between Encode and Prefill stages and a hierarchical grouped KV cache transmission mechanism between Prefill and Decode stages to improve cross-node communication efficiency. In addition, EPD-Serve incorporates multi-route scheduling, instance-level load balancing, and multi-stage hardware co-location with spatial multiplexing to better support diverse multimodal workloads. Comprehensive experiments on multimodal understanding models demonstrate that, under high-concurrency scenarios, EPD-Serve improves end-to-end throughput by 57.37-69.48% compared to PD-disaggregated deployment, while satisfying strict SLO constraints, including TTFT below 2000 ms and TPOT below 50 ms. These results highlight the effectiveness of stage-level disaggregation for optimizing multimodal large model inference systems.
CVDec 15, 2025
CoRA: A Collaborative Robust Architecture with Hybrid Fusion for Efficient PerceptionGong Chen, Chaokun Zhang, Pengcheng Lv et al.
Collaborative perception has garnered significant attention as a crucial technology to overcome the perceptual limitations of single-agent systems. Many state-of-the-art (SOTA) methods have achieved communication efficiency and high performance via intermediate fusion. However, they share a critical vulnerability: their performance degrades under adverse communication conditions due to the misalignment induced by data transmission, which severely hampers their practical deployment. To bridge this gap, we re-examine different fusion paradigms, and recover that the strengths of intermediate and late fusion are not a trade-off, but a complementary pairing. Based on this key insight, we propose CoRA, a novel collaborative robust architecture with a hybrid approach to decouple performance from robustness with low communication. It is composed of two components: a feature-level fusion branch and an object-level correction branch. Its first branch selects critical features and fuses them efficiently to ensure both performance and scalability. The second branch leverages semantic relevance to correct spatial displacements, guaranteeing resilience against pose errors. Experiments demonstrate the superiority of CoRA. Under extreme scenarios, CoRA improves upon its baseline performance by approximately 19% in AP@0.7 with more than 5x less communication volume, which makes it a promising solution for robust collaborative perception.
CVOct 15, 2025
NTIRE 2025 Challenge on Low Light Image Enhancement: Methods and ResultsXiaoning Liu, Zongwei Wu, Florin-Alexandru Vasluianu et al.
This paper presents a comprehensive review of the NTIRE 2025 Low-Light Image Enhancement (LLIE) Challenge, highlighting the proposed solutions and final outcomes. The objective of the challenge is to identify effective networks capable of producing brighter, clearer, and visually compelling images under diverse and challenging conditions. A remarkable total of 762 participants registered for the competition, with 28 teams ultimately submitting valid entries. This paper thoroughly evaluates the state-of-the-art advancements in LLIE, showcasing the significant progress.
CVMar 2
WhisperNet: A Scalable Solution for Bandwidth-Efficient CollaborationGong Chen, Chaokun Zhang, Xinyan Zhao
Collaborative perception is vital for autonomous driving yet remains constrained by tight communication budgets. Earlier work reduced bandwidth by compressing full feature maps with fixed-rate encoders, which adapts poorly to a changing environment, and it further evolved into spatial selection methods that improve efficiency by focusing on salient regions, but this object-centric approach often sacrifices global context, weakening holistic scene understanding. To overcome these limitations, we introduce \textit{WhisperNet}, a bandwidth-aware framework that proposes a novel, receiver-centric paradigm for global coordination across agents. Senders generate lightweight saliency metadata, while the receiver formulates a global request plan that dynamically budgets feature contributions across agents and features, retrieving only the most informative features. A collaborative feature routing module then aligns related messages before fusion to ensure structural consistency. Extensive experiments show that WhisperNet achieves state-of-the-art performance, improving AP@0.7 on OPV2V by 2.4\% with only 0.5\% of the communication cost. As a plug-and-play component, it boosts strong baselines with merely 5\% of full bandwidth while maintaining robustness under localization noise. These results demonstrate that globally-coordinated allocation across \textit{what} and \textit{where} to share is the key to achieving efficient collaborative perception.
ROFeb 9, 2024
LLMs for Coding and Robotics EducationPeng Shu, Huaqin Zhao, Hanqi Jiang et al.
Large language models and multimodal large language models have revolutionized artificial intelligence recently. An increasing number of regions are now embracing these advanced technologies. Within this context, robot coding education is garnering increasing attention. To teach young children how to code and compete in robot challenges, large language models are being utilized for robot code explanation, generation, and modification. In this paper, we highlight an important trend in robot coding education. We test several mainstream large language models on both traditional coding tasks and the more challenging task of robot code generation, which includes block diagrams. Our results show that GPT-4V outperforms other models in all of our tests but struggles with generating block diagram images.
AIApr 2
Diff-KD: Diffusion-based Knowledge Distillation for Collaborative Perception under CorruptionsPengcheng Lyu, Chaokun Zhang, Gong Chen et al.
Multi-agent collaborative perception enables autonomous systems to overcome individual sensing limits through collective intelligence. However, real-world sensor and communication corruptions severely undermine this advantage. Crucially, existing approaches treat corruptions as static perturbations or passively conform to corrupted inputs, failing to actively recover the underlying clean semantics. To address this limitation, we introduce Diff-KD, a framework that integrates diffusion-based generative refinement into teacher-student knowledge distillation for robust collaborative perception. Diff-KD features two core components: (i) Progressive Knowledge Distillation (PKD), which treats local feature restoration as a conditional diffusion process to recover global semantics from corrupted observations; and (ii) Adaptive Gated Fusion (AGF), which dynamically weights neighbors based on ego reliability during fusion. Evaluated on OPV2V and DAIR-V2X under seven corruption types, Diff-KD achieves state-of-the-art performance in both detection accuracy and calibration robustness.
CVMar 25, 2025
LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark DatasetManjushree Aithal, Rosaura G. VidalMata, Manikandtan Kartha et al.
Low-light image enhancement is crucial for a myriad of applications, from night vision and surveillance, to autonomous driving. However, due to the inherent limitations that come in hand with capturing images in low-illumination environments, the task of enhancing such scenes still presents a formidable challenge. To advance research in this field, we introduce our Low Exposure Night Vision (LENVIZ) Dataset, a comprehensive multi-exposure benchmark dataset for low-light image enhancement comprising of over 230K frames showcasing 24K real-world indoor and outdoor, with-and without human, scenes. Captured using 3 different camera sensors, LENVIZ offers a wide range of lighting conditions, noise levels, and scene complexities, making it the largest publicly available up-to 4K resolution benchmark in the field. LENVIZ includes high quality human-generated ground truth, for which each multi-exposure low-light scene has been meticulously curated and edited by expert photographers to ensure optimal image quality. Furthermore, we also conduct a comprehensive analysis of current state-of-the-art low-light image enhancement techniques on our dataset and highlight potential areas of improvement.
CVMar 5
CATNet: Collaborative Alignment and Transformation Network for Cooperative PerceptionGong Chen, Chaokun Zhang, Tao Tang et al.
Cooperative perception significantly enhances scene understanding by integrating complementary information from diverse agents. However, existing research often overlooks critical challenges inherent in real-world multi-source data integration, specifically high temporal latency and multi-source noise. To address these practical limitations, we propose Collaborative Alignment and Transformation Network (CATNet), an adaptive compensation framework that resolves temporal latency and noise interference in multi-agent systems. Our key innovations can be summarized in three aspects. First, we introduce a Spatio-Temporal Recurrent Synchronization (STSync) that aligns asynchronous feature streams via adjacent-frame differential modeling, establishing a temporal-spatially unified representation space. Second, we design a Dual-Branch Wavelet Enhanced Denoiser (WTDen) that suppresses global noise and reconstructs localized feature distortions within aligned representations. Third, we construct an Adaptive Feature Selector (AdpSel) that dynamically focuses on critical perceptual features for robust fusion. Extensive experiments on multiple datasets demonstrate that CATNet consistently outperforms existing methods under complex traffic conditions, proving its superior robustness and adaptability.
AIOct 8, 2025
Evolving and Executing Research Plans via Double-Loop Multi-Agent CollaborationZhi Zhang, Yan Liu, Zhejing Hu et al.
Automating the end-to-end scientific research process poses a fundamental challenge: it requires both evolving high-level plans that are novel and sound, and executing these plans correctly amidst dynamic and uncertain conditions. To address this bilevel challenge, we propose a novel Double-Loop Multi-Agent (DLMA) framework to solve the given research problem automatically. The leader loop, composed of professor agents, is responsible for evolving research plans. It employs an evolutionary algorithm through involvement, improvement, and integration meetings to iteratively generate and refine a pool of research proposals, exploring the solution space effectively. The follower loop, composed of doctoral student agents, is responsible for executing the best-evolved plan. It dynamically adjusts the plan during implementation via pre-hoc and post-hoc meetings, ensuring each step (e.g., drafting, coding) is well-supported by contextual and external observations. Extensive experiments on benchmarks like ACLAward and Laboratory show that DLMA generates research papers that achieve state-of-the-art scores in automated evaluation, significantly outperforming strong baselines. Ablation studies confirm the critical roles of both loops, with evolution driving novelty and execution ensuring soundness.
SDAug 27, 2025
CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music GenerationZhejing Hu, Yan Liu, Gong Chen et al.
Generative artificial intelligence in music has made significant strides, yet it still falls short of the substantial achievements seen in natural language processing, primarily due to the limited availability of music data. Knowledge-informed approaches have been shown to enhance the performance of music generation models, even when only a few pieces of musical knowledge are integrated. This paper seeks to leverage comprehensive music theory in AI-driven music generation tasks, such as algorithmic composition and style transfer, which traditionally require significant manual effort with existing techniques. We introduce a novel automatic music lexicon construction model that generates a lexicon, named CompLex, comprising 37,432 items derived from just 9 manually input category keywords and 5 sentence prompt templates. A new multi-agent algorithm is proposed to automatically detect and mitigate hallucinations. CompLex demonstrates impressive performance improvements across three state-of-the-art text-to-music generation models, encompassing both symbolic and audio-based methods. Furthermore, we evaluate CompLex in terms of completeness, accuracy, non-redundancy, and executability, confirming that it possesses the key characteristics of an effective lexicon.
LGAug 15, 2025
PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG DecodingChanghong Jing, Yan Liu, Shuqiang Wang et al.
Cross-subject electroencephalography (EEG) decoding remains a fundamental challenge in brain-computer interface (BCI) research due to substantial inter-subject variability and the scarcity of subject-invariant representations. This paper proposed PTSM (Physiology-aware and Task-invariant Spatio-temporal Modeling), a novel framework for interpretable and robust EEG decoding across unseen subjects. PTSM employs a dual-branch masking mechanism that independently learns personalized and shared spatio-temporal patterns, enabling the model to preserve individual-specific neural characteristics while extracting task-relevant, population-shared features. The masks are factorized across temporal and spatial dimensions, allowing fine-grained modulation of dynamic EEG patterns with low computational overhead. To further address representational entanglement, PTSM enforces information-theoretic constraints that decompose latent embeddings into orthogonal task-related and subject-related subspaces. The model is trained end-to-end via a multi-objective loss integrating classification, contrastive, and disentanglement objectives. Extensive experiments on cross-subject motor imagery datasets demonstrate that PTSM achieves strong zero-shot generalization, outperforming state-of-the-art baselines without subject-specific calibration. Results highlight the efficacy of disentangled neural representations for achieving both personalized and transferable decoding in non-stationary neurophysiological settings.
LGMay 21, 2025
Mini-Game Lifetime Value Prediction in WeChatAochuan Chen, Yifan Niu, Ziqi Gao et al.
The LifeTime Value (LTV) prediction, which endeavors to forecast the cumulative purchase contribution of a user to a particular item, remains a vital challenge that advertisers are keen to resolve. A precise LTV prediction system enhances the alignment of user interests with meticulously designed advertisements, thereby generating substantial profits for advertisers. Nonetheless, this issue is complicated by the paucity of data typically observed in real-world advertising scenarios. The purchase rate among registered users is often as critically low as 0.1%, resulting in a dataset where the majority of users make only several purchases. Consequently, there is insufficient supervisory signal for effectively training the LTV prediction model. An additional challenge emerges from the interdependencies among tasks with high correlation. It is a common practice to estimate a user's contribution to a game over a specified temporal interval. Varying the lengths of these intervals corresponds to distinct predictive tasks, which are highly correlated. For instance, predictions over a 7-day period are heavily reliant on forecasts made over a 3-day period, where exceptional cases can adversely affect the accuracy of both tasks. In order to comprehensively address the aforementioned challenges, we introduce an innovative framework denoted as Graph-Represented Pareto-Optimal LifeTime Value prediction (GRePO-LTV). Graph representation learning is initially employed to address the issue of data scarcity. Subsequently, Pareto-Optimization is utilized to manage the interdependence of prediction tasks.
CVMar 3, 2025
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial PurificationGaozheng Pei, Shaojie Lyu, Gong Chen et al.
Existing diffusion-based purification methods aim to disrupt adversarial perturbations by introducing a certain amount of noise through a forward diffusion process, followed by a reverse process to recover clean examples. However, this approach is fundamentally flawed: the uniform operation of the forward process across all pixels compromises normal pixels while attempting to combat adversarial perturbations, resulting in the target model producing incorrect predictions. Simply relying on low-intensity noise is insufficient for effective defense. To address this critical issue, we implement a heterogeneous purification strategy grounded in the interpretability of neural networks. Our method decisively applies higher-intensity noise to specific pixels that the target model focuses on while the remaining pixels are subjected to only low-intensity noise. This requirement motivates us to redesign the sampling process of the diffusion model, allowing for the effective removal of varying noise levels. Furthermore, to evaluate our method against strong adaptative attack, our proposed method sharply reduces time cost and memory usage through a single-step resampling. The empirical evidence from extensive experiments across three datasets demonstrates that our method outperforms most current adversarial training and purification techniques by a substantial margin.
CLNov 9, 2024
Mixture of Knowledge Minigraph Agents for Literature Review GenerationZhi Zhang, Yan Liu, Sheng-hua Zhong et al.
Literature reviews play a crucial role in scientific research for understanding the current state of research, identifying gaps, and guiding future studies on specific topics. However, the process of conducting a comprehensive literature review is yet time-consuming. This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. A novel prompt-based algorithm, the knowledge minigraph construction agent (KMCA), is designed to identify relations between concepts from academic literature and automatically constructs knowledge minigraphs. By leveraging the capabilities of large language models on constructed knowledge minigraphs, the multiple path summarization agent (MPSA) efficiently organizes concepts and relations from different viewpoints to generate literature review paragraphs. We evaluate CKMAs on three benchmark datasets. Experimental results show the effectiveness of the proposed method, further revealing promising applications of LLMs in scientific research.
SDJan 21, 2022
Can Machines Generate Personalized Music? A Hybrid Favorite-aware Method for User Preference Music TransferZhejing Hu, Yan Liu, Gong Chen et al.
User preference music transfer (UPMT) is a new problem in music style transfer that can be applied to many scenarios but remains understudied.