ARMay 30Code
MACO: A Multi-Agent LLM Framework for Automated CGRA Hardware/Software Co-DesignZesong Jiang, Yuqi Sun, Qing Zhong et al.
Designing optimal Coarse-Grained Reconfigurable Arrays (CGRAs) requires navigating a vast, interdependent hardware/software space bottlenecked by costly manual iteration. We present MACO, an open-source, multi-agent LLM framework that automates CGRA HW/SW co-design. MACO decomposes the design loop into four collaborative stages, HW/SW Co-design, Error Correction, Best-Design Selection, and Evaluation & Feedback, to iteratively optimize power, performance, and area (PPA). To accelerate convergence and efficiently traverse the design space, MACO introduces an exponentially decaying exploration strategy, EDA-guided LLM self-learning, and robust rule-based error correction. Evaluated against state-of-the-art baselines, MACO reduces power consumption by 25.9%, improves performance by 20.0%, and accelerates the search process by 5x. Finally, we validate MACO's physical design through a complete 7nm ASIC design flow.
CVJul 18, 2022
Geometry-Aware Reference Synthesis for Multi-View Image Super-ResolutionRi Cheng, Yuqi Sun, Bo Yan et al.
Recent multi-view multimedia applications struggle between high-resolution (HR) visual experience and storage or bandwidth constraints. Therefore, this paper proposes a Multi-View Image Super-Resolution (MVISR) task. It aims to increase the resolution of multi-view images captured from the same scene. One solution is to apply image or video super-resolution (SR) methods to reconstruct HR results from the low-resolution (LR) input view. However, these methods cannot handle large-angle transformations between views and leverage information in all multi-view images. To address these problems, we propose the MVSRnet, which uses geometry information to extract sharp details from all LR multi-view to support the SR of the LR input view. Specifically, the proposed Geometry-Aware Reference Synthesis module in MVSRnet uses geometry information and all multi-view LR images to synthesize pixel-aligned HR reference images. Then, the proposed Dynamic High-Frequency Search network fully exploits the high-frequency textural details in reference images for SR. Extensive experiments on several benchmarks show that our method significantly improves over the state-of-the-art approaches.
CVJun 19, 2023
Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with InstructionsYuqi Sun, Ruian He, Weimin Tan et al.
Recent neural talking radiance field methods have shown great success in photorealistic audio-driven talking face synthesis. In this paper, we propose a novel interactive framework that utilizes human instructions to edit such implicit neural representations to achieve real-time personalized talking face generation. Given a short speech video, we first build an efficient talking radiance field, and then apply the latest conditional diffusion model for image editing based on the given instructions and guiding implicit representation optimization towards the editing target. To ensure audio-lip synchronization during the editing process, we propose an iterative dataset updating strategy and utilize a lip-edge loss to constrain changes in the lip region. We also introduce a lightweight refinement network for complementing image details and achieving controllable detail generation in the final rendered image. Our method also enables real-time rendering at up to 30FPS on consumer hardware. Multiple metrics and user verification show that our approach provides a significant improvement in rendering quality compared to state-of-the-art methods.
AINov 4, 2025
Evaluating Control Protocols for Untrusted AI AgentsJon Kutasov, Chloe Loughridge, Yuqi Sun et al.
As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to mitigating the risk from untrusted AI agents by monitoring their actions and intervening or auditing when necessary. Evaluating the safety of these protocols requires understanding both their effectiveness against current attacks and their robustness to adaptive adversaries. In this work, we systematically evaluate a range of control protocols in SHADE-Arena, a dataset of diverse agentic environments. First, we evaluate blue team protocols, including deferral to trusted models, resampling, and deferring on critical actions, against a default attack policy. We find that resampling for incrimination and deferring on critical actions perform best, increasing safety from 50% to 96%. We then iterate on red team strategies against these protocols and find that attack policies with additional affordances, such as knowledge of when resampling occurs or the ability to simulate monitors, can substantially improve attack success rates against our resampling strategy, decreasing safety to 17%. However, deferring on critical actions is highly robust to even our strongest red team strategies, demonstrating the importance of denying attack policies access to protocol internals.
LGAug 13, 2024
A Self-Supervised Paradigm for Data-Efficient Medical Foundation Model Pre-training: V-information Optimization FrameworkWenxuan Yang, Hanyu Zhang, Weimin Tan et al.
Self-supervised pre-training medical foundation models on large-scale datasets demonstrate exceptional performance. Recent research challenges this common paradigm by introducing data-effective learning approaches, demonstrating that merely increasing pre-training data volume does not necessarily improve model performance. However, current methods still have unclear standards and the underlying theoretical foundation remains unknown. In this paper, as the first attempt to address this limitation, we introduce V-information into self-supervised pre-training of foundation models to provide a theoretical foundation for sample selection. Our derivation confirms that by optimizing V-information, sample selection can be framed as an optimization problem where choosing diverse and challenging samples enhances model performance even under limited training data. Under this guidance, we develop an optimized data-effective learning method (OptiDEL) to optimize V-information in real-world medical domains by generating more diverse and harder samples. We compare the OptiDEL method with state-of-the-art approaches finding that OptiDEL consistently outperforms existing approaches across eight different datasets, with foundation models trained on only 5% of the pre-training data achieving up to 6.2% higher mIoU than those trained on the full dataset. Remarkably, OptiDEL demonstrates an average improvement of 4.7% mIoU over competing methods while using 20x less training data.
CVMay 9, 2025Code
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from TextbooksWenqi Zeng, Yuqi Sun, Chenxi Ma et al.
Medical vision-language models (VLMs) have shown promise as clinical assistants across various medical fields. However, specialized dermatology VLM capable of delivering professional and detailed diagnostic analysis remains underdeveloped, primarily due to less specialized text descriptions in current dermatology multimodal datasets. To address this issue, we propose MM-Skin, the first large-scale multimodal dermatology dataset that encompasses 3 imaging modalities, including clinical, dermoscopic, and pathological and nearly 10k high-quality image-text pairs collected from professional textbooks. In addition, we generate over 27k diverse, instruction-following vision question answering (VQA) samples (9 times the size of current largest dermatology VQA dataset). Leveraging public datasets and MM-Skin, we developed SkinVL, a dermatology-specific VLM designed for precise and nuanced skin disease interpretation. Comprehensive benchmark evaluations of SkinVL on VQA, supervised fine-tuning (SFT) and zero-shot classification tasks across 8 datasets, reveal its exceptional performance for skin diseases in comparison to both general and medical VLM models. The introduction of MM-Skin and SkinVL offers a meaningful contribution to advancing the development of clinical dermatology VLM assistants. MM-Skin is available at https://github.com/ZwQ803/MM-Skin
LGFeb 5
Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement LearningJohn Yan, Michael Yu, Yuqi Sun et al.
Large language models (LLMs) are increasingly trained in complex Reinforcement Learning, multi-agent environments, making it difficult to understand how behavior changes over training. Sparse Autoencoders (SAEs) have recently shown to be useful for data-centric interpretability. In this work, we analyze large-scale reinforcement learning training runs from the sophisticated environment of Full-Press Diplomacy by applying pretrained SAEs, alongside LLM-summarizer methods. We introduce Meta-Autointerp, a method for grouping SAE features into interpretable hypotheses about training dynamics. We discover fine-grained behaviors including role-playing patterns, degenerate outputs, language switching, alongside high-level strategic behaviors and environment-specific bugs. Through automated evaluation, we validate that 90% of discovered SAE Meta-Features are significant, and find a surprising reward hacking behavior. However, through two user studies, we find that even subjectively interesting and seemingly helpful SAE features may be worse than useless to humans, along with most LLM generated hypotheses. However, a subset of SAE-derived hypotheses are predictively useful for downstream tasks. We further provide validation by augmenting an untrained agent's system prompt, improving the score by +14.2%. Overall, we show that SAEs and LLM-summarizer provide complementary views into agent behavior, and together our framework forms a practical starting point for future data-centric interpretability work on ensuring trustworthy LLM behavior throughout training.
AIJun 17, 2025
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM AgentsJonathan Kutasov, Yuqi Sun, Paul Colognese et al.
As Large Language Models (LLMs) are increasingly deployed as autonomous agents in complex and long horizon settings, it is critical to evaluate their ability to sabotage users by pursuing hidden objectives. We study the ability of frontier LLMs to evade monitoring and achieve harmful hidden goals while completing a wide array of realistic tasks. We evaluate a broad range of frontier LLMs using SHADE (Subtle Harmful Agent Detection & Evaluation)-Arena, the first highly diverse agent evaluation dataset for sabotage and monitoring capabilities of LLM agents. SHADE-Arena consists of complex pairs of benign main tasks and harmful side objectives in complicated environments. Agents are evaluated on their ability to complete the side task without appearing suspicious to an LLM monitor. When measuring agent ability to (a) complete the main task, (b) complete the side task, and (c) avoid detection, we find that the best performing frontier models score 27% (Claude 3.7 Sonnet) and 15% (Gemini 2.5 Pro) as sabotage agents when overseen by Claude 3.6 Sonnet. For current frontier models, success on the side task relies heavily on having access to a hidden scratchpad that is not visible to the monitor. We also use SHADE-Arena to measure models' monitoring abilities, with the top monitor (Gemini 2.5 Pro) achieving an AUC of 0.87 at distinguishing benign and malign transcripts. We find that for now, models still struggle at sabotage due to failures in long-context main task execution. However, our measurements already demonstrate the difficulty of monitoring for subtle sabotage attempts, which we expect to only increase in the face of more complex and longer-horizon tasks.
AIApr 24
Don't Make the LLM Read the Graph: Make the Graph ThinkYuqi Sun, Tianqin Meng, George Liu et al.
We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanabi, we establish four findings. First, integration architecture determines whether belief graphs provide value: as prompt context, graphs are decorative for strong models and beneficial only for weak models on 2nd-order Theory of Mind (80% vs 10%, p<0.0001, OR=36.0); when graphs gate action selection through ranked shortlists, they become structurally essential even for strong models (100% vs 20% on 2nd-order ToM, p<0.001). Second, we identify "Planner Defiance," a model-family-specific failure where LLMs override correct planner recommendations at partial competence (90% override, replicated N=20); Gemini models show near-zero defiance while Llama 70B shows 90%, and models distinguish factual context (deferred to) from advisory recommendations (overridden). Third, full-game evidence confirms inter-agent conventions (+128% over baseline, p=0.003) outperform all single-agent interventions, and individual belief-graph components must be combined to produce gains. Fourth, preliminary scaling analysis (N=10/cell, exploratory) suggests graph depth has diminishing returns: shallow graphs provide the best cost-benefit ratio, while deeper ToM graphs appear harmful at larger player counts (-1.5 pts at 5-player, p=0.029).
LGDec 20, 2024
Graph Structure Refinement with Energy-based Contrastive LearningXianlin Zeng, Yufeng Wang, Yuqi Sun et al.
Graph Neural Networks (GNNs) have recently gained widespread attention as a successful tool for analyzing graph-structured data. However, imperfect graph structure with noisy links lacks enough robustness and may damage graph representations, therefore limiting the GNNs' performance in practical tasks. Moreover, existing generative architectures fail to fit discriminative graph-related tasks. To tackle these issues, we introduce an unsupervised method based on a joint of generative training and discriminative training to learn graph structure and representation, aiming to improve the discriminative performance of generative models. We propose an Energy-based Contrastive Learning (ECL) guided Graph Structure Refinement (GSR) framework, denoted as ECL-GSR. To our knowledge, this is the first work to combine energy-based models with contrastive learning for GSR. Specifically, we leverage ECL to approximate the joint distribution of sample pairs, which increases the similarity between representations of positive pairs while reducing the similarity between negative ones. Refined structure is produced by augmenting and removing edges according to the similarity metrics among node representations. Extensive experiments demonstrate that ECL-GSR outperforms the state-of-the-art on eight benchmark datasets in node classification. ECL-GSR achieves faster training with fewer samples and memories against the leading baseline, highlighting its simplicity and efficiency in downstream tasks.
IRDec 18, 2024
Large Language Model Enhanced Recommender Systems: A SurveyQidong Liu, Xiangyu Zhao, Yuhao Wang et al.
Large Language Model (LLM) has transformative potential in various domains, including recommender systems (RS). There have been a handful of research that focuses on empowering the RS by LLM. However, previous efforts mainly focus on LLM as RS, which may face the challenge of intolerant inference costs by LLM. Recently, the integration of LLM into RS, known as LLM-Enhanced Recommender Systems (LLMERS), has garnered significant interest due to its potential to address latency and memory constraints in real-world applications. This paper presents a comprehensive survey of the latest research efforts aimed at leveraging LLM to enhance RS capabilities. We identify a critical shift in the field with the move towards incorporating LLM into the online system, notably by avoiding their use during inference. Our survey categorizes the existing LLMERS approaches into three primary types based on the component of the RS model being augmented: Knowledge Enhancement, Interaction Enhancement, and Model Enhancement. We provide an in-depth analysis of each category, discussing the methodologies, challenges, and contributions of recent studies. Furthermore, we highlight several promising research directions that could further advance the field of LLMERS.
LGJan 31, 2024
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation ModelsWenxuan Yang, Weimin Tan, Yuqi Sun et al.
Foundation models, pre-trained on massive datasets, have achieved unprecedented generalizability. However, is it truly necessary to involve such vast amounts of data in pre-training, consuming extensive computational resources? This paper introduces data-effective learning, aiming to use data in the most impactful way to pre-train foundation models. This involves strategies that focus on data quality rather than quantity, ensuring the data used for training has high informational value. Data-effective learning plays a profound role in accelerating foundation model training, reducing computational costs, and saving data storage, which is very important as the volume of medical data in recent years has grown beyond many people's expectations. However, due to the lack of standards and comprehensive benchmarks, research on medical data-effective learning is poorly studied. To address this gap, our paper introduces a comprehensive benchmark specifically for evaluating data-effective learning in the medical field. This benchmark includes a dataset with millions of data samples from 31 medical centers (DataDEL), a baseline method for comparison (MedDEL), and a new evaluation metric (NormDEL) to objectively measure data-effective learning performance. Our extensive experimental results show the baseline MedDEL can achieve performance comparable to the original large dataset with only 5% of the data. Establishing such an open data-effective learning benchmark is crucial for the medical foundation model research community because it facilitates efficient data use, promotes collaborative breakthroughs, and fosters the development of cost-effective, scalable, and impactful healthcare solutions.
CVDec 18, 2023
Low-latency Space-time Supersampling for Real-time RenderingRuian He, Shili Zhou, Yuqi Sun et al.
With the rise of real-time rendering and the evolution of display devices, there is a growing demand for post-processing methods that offer high-resolution content in a high frame rate. Existing techniques often suffer from quality and latency issues due to the disjointed treatment of frame supersampling and extrapolation. In this paper, we recognize the shared context and mechanisms between frame supersampling and extrapolation, and present a novel framework, Space-time Supersampling (STSS). By integrating them into a unified framework, STSS can improve the overall quality with lower latency. To implement an efficient architecture, we treat the aliasing and warping holes unified as reshading regions and put forth two key components to compensate the regions, namely Random Reshading Masking (RRM) and Efficient Reshading Module (ERM). Extensive experiments demonstrate that our approach achieves superior visual fidelity compared to state-of-the-art (SOTA) methods. Notably, the performance is achieved within only 4ms, saving up to 75\% of time against the conventional two-stage pipeline that necessitates 17ms.
MEApr 30, 2020
Multiple imputation using chained random forests: a preliminary study based on the empirical distribution of out-of-bag prediction errorsShangzhi Hong, Yuqi Sun, Hanying Li et al.
Missing data are common in data analyses in biomedical fields, and imputation methods based on random forests (RF) have become widely accepted, as the RF algorithm can achieve high accuracy without the need for specification of data distributions or relationships. However, the predictions from RF do not contain information about prediction uncertainty, which was unacceptable for multiple imputation. Available RF-based multiple imputation methods tried to do proper multiple imputation either by sampling directly from observations under predicting nodes without accounting for the prediction error or by making normality assumption about the prediction error distribution. In this study, a novel RF-based multiple imputation method was proposed by constructing conditional distributions the empirical distribution of out-of-bag prediction errors. The proposed method was compared with previous method with parametric assumptions about RF's prediction errors and predictive mean matching based on simulation studies on data with presence of interaction term. The proposed non-parametric method can deliver valid multiple imputation results. The accompanying R package for this study is publicly available.
APApr 23, 2020
Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForestShangzhi Hong, Yuqi Sun, Hanying Li et al.
Machine learning iterative imputation methods have been well accepted by researchers for imputing missing data, but they can be time-consuming when handling large datasets. To overcome this drawback, parallel computing strategies have been proposed but their impact on imputation results and subsequent statistical analyses are relatively unknown. This study examines the two parallel strategies (variable-wise distributed computation and model-wise distributed computation) implemented in the random-forest imputation method, missForest. Results from the simulation experiments showed that the two parallel strategies can influence both the imputation process and the final imputation results differently. Specifically, even though both strategies produced similar normalized root mean squared prediction errors, the variable-wise distributed strategy led to additional biases when estimating the mean and inter-correlation of the covariates and their regression coefficients.
CLMar 3, 2018
Understanding and Improving Multi-Sense Word Embeddings via Extended Robust Principal Component AnalysisHaoyue Shi, Yuqi Sun, Junfeng Hu
Unsupervised learned representations of polysemous words generate a large of pseudo multi senses since unsupervised methods are overly sensitive to contextual variations. In this paper, we address the pseudo multi-sense detection for word embeddings by dimensionality reduction of sense pairs. We propose a novel principal analysis method, termed Ex-RPCA, designed to detect both pseudo multi senses and real multi senses. With Ex-RPCA, we empirically show that pseudo multi senses are generated systematically in unsupervised method. Moreover, the multi-sense word embeddings can by improved by a simple linear transformation based on Ex-RPCA. Our improved word embedding outperform the original one by 5.6 points on Stanford contextual word similarity (SCWS) dataset. We hope our simple yet effective approach will help the linguistic analysis of multi-sense word embeddings in the future.
LGAug 15, 2017
Learning Rich Geographical Representations: Predicting Colorectal Cancer Survival in the State of IowaMichael T. Lash, Yuqi Sun, Xun Zhou et al.
Neural networks are capable of learning rich, nonlinear feature representations shown to be beneficial in many predictive tasks. In this work, we use these models to explore the use of geographical features in predicting colorectal cancer survival curves for patients in the state of Iowa, spanning the years 1989 to 2012. Specifically, we compare model performance using a newly defined metric -- area between the curves (ABC) -- to assess (a) whether survival curves can be reasonably predicted for colorectal cancer patients in the state of Iowa, (b) whether geographical features improve predictive performance, and (c) whether a simple binary representation or richer, spectral clustering-based representation perform better. Our findings suggest that survival curves can be reasonably estimated on average, with predictive performance deviating at the five-year survival mark. We also find that geographical features improve predictive performance, and that the best performance is obtained using richer, spectral analysis-elicited features.