Mingyu Huang

LG
h-index2
9papers
51citations
Novelty48%
AI Score41

9 Papers

QUANT-PHSep 9, 2023
Detecting Violations of Differential Privacy for Quantum Algorithms

Ji Guan, Wang Fang, Mingyu Huang et al.

Quantum algorithms for solving a wide range of practical problems have been proposed in the last ten years, such as data search and analysis, product recommendation, and credit scoring. The concern about privacy and other ethical issues in quantum computing naturally rises up. In this paper, we define a formal framework for detecting violations of differential privacy for quantum algorithms. A detection algorithm is developed to verify whether a (noisy) quantum algorithm is differentially private and automatically generate bugging information when the violation of differential privacy is reported. The information consists of a pair of quantum states that violate the privacy, to illustrate the cause of the violation. Our algorithm is equipped with Tensor Networks, a highly efficient data structure, and executed both on TensorFlow Quantum and TorchQuantum which are the quantum extensions of famous machine learning platforms -- TensorFlow and PyTorch, respectively. The effectiveness and efficiency of our algorithm are confirmed by the experimental results of almost all types of quantum algorithms already implemented on realistic quantum computers, including quantum supremacy algorithms (beyond the capability of classical algorithms), quantum machine learning models, quantum approximate optimization algorithms, and variational quantum eigensolvers with up to 21 quantum bits.

PFDec 22, 2024Code
Rethinking Performance Analysis for Configurable Software Systems: A Case Study from a Fitness Landscape Perspective

Mingyu Huang, Peili Mao, Ke Li

Modern software systems are often highly configurable to tailor varied requirements from diverse stakeholders. Understanding the mapping between configurations and the desired performance attributes plays a fundamental role in advancing the controllability and tuning of the underlying system, yet has long been a dark hole of knowledge due to its black-box nature. While there have been previous efforts in performance analysis for these systems, they analyze the configurations as isolated data points without considering their inherent spatial relationships. This renders them incapable of interrogating many important aspects of the configuration space like local optima. In this work, we advocate a novel perspective to rethink performance analysis -- modeling the configuration space as a structured ``landscape''. To support this proposition, we designed \our, an open-source, graph data mining empowered fitness landscape analysis (FLA) framework. By applying this framework to $86$M benchmarked configurations from $32$ running workloads of $3$ real-world systems, we arrived at $6$ main findings, which together constitute a holistic picture of the landscape topography, with thorough discussions about their implications on both configuration tuning and performance modeling.

LGOct 28, 2025Code
Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA

Mingyu Huang, Shasha Zhou, Ke Li

Machine learning models increasingly map biological sequence-fitness landscapes to predict mutational effects. Effective evaluation of these models requires benchmarks curated from empirical data. Despite their impressive scales, existing benchmarks lack topographical information regarding the underlying fitness landscapes, which hampers interpretation and comparison of model performance beyond averaged scores. Here, we introduce GraphFLA, a Python framework that constructs and analyzes fitness landscapes from mutagensis data in diverse modalities (e.g., DNA, RNA, protein, and beyond) with up to millions of mutants. GraphFLA calculates 20 biologically relevant features that characterize 4 fundamental aspects of landscape topography. By applying GraphFLA to over 5,300 landscapes from ProteinGym, RNAGym, and CIS-BP, we demonstrate its utility in interpreting and comparing the performance of dozens of fitness prediction models, highlighting factors influencing model accuracy and respective advantages of different models. In addition, we release 155 combinatorially complete empirical fitness landscapes, encompassing over 2.2 million sequences across various modalities. All the codes and datasets are available at https://github.com/COLA-Laboratory/GraphFLA.

LGNov 23, 2023
On the Hyperparameter Loss Landscapes of Machine Learning Models: An Exploratory Study

Mingyu Huang, Ke Li

Previous efforts on hyperparameter optimization (HPO) of machine learning (ML) models predominately focus on algorithmic advances, yet little is known about the topography of the underlying hyperparameter (HP) loss landscape, which plays a fundamental role in governing the search process of HPO. While several works have conducted fitness landscape analysis (FLA) on various ML systems, they are limited to properties of isolated landscape without interrogating the potential structural similarities among them. The exploration of such similarities can provide a novel perspective for understanding the mechanism behind modern HPO methods, but has been missing, possibly due to the expensive cost of large-scale landscape construction, and the lack of effective analysis methods. In this paper, we mapped 1,500 HP loss landscapes of 6 representative ML models on 63 datasets across different fidelity levels, with 11M+ configurations. By conducting exploratory analysis on these landscapes with fine-grained visualizations and dedicated FLA metrics, we observed a similar landscape topography across a wide range of models, datasets, and fidelities, and shed light on several central topics in HPO.

LGDec 5, 2023
Towards the Inferrence of Structural Similarity of Combinatorial Landscapes

Mingyu Huang, Ke Li

One of the most common problem-solving heuristics is by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus if a solver works for one problem instance, we expect it will also be effective for other instances whose fitness landscapes essentially share structural similarities with each other. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, by using local optima network as a proxy of fitness landscapes, this paper proposed to leverage graph data mining techniques to conduct qualitative and quantitative analyses to explore the latent topological structural information embedded in those landscapes. By conducting large-scale empirical experiments on three classic combinatorial optimization problems, we gain concrete evidence to support the existence of structural similarity between landscapes of the same classes within neighboring dimensions. We also interrogated the relationship between landscapes of different problem classes.

LGNov 16, 2025
Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs

Shasha Zhou, Mingyu Huang, Jack Cole et al.

The recent proliferation of large language models (LLMs) holds the potential to revolutionize healthcare, with strong capabilities in diverse medical tasks. Yet, deploying LLMs in high-stakes healthcare settings requires rigorous verification and validation to understand any potential harm. This paper investigates the reliability and viability of using medical knowledge graphs (KGs) for the automated factuality evaluation of LLM-generated responses. To ground this investigation, we introduce FAITH, a framework designed to systematically probe the strengths and limitations of this KG-based approach. FAITH operates without reference answers by decomposing responses into atomic claims, linking them to a medical KG, and scoring them based on evidence paths. Experiments on diverse medical tasks with human subjective evaluations demonstrate that KG-grounded evaluation achieves considerably higher correlations with clinician judgments and can effectively distinguish LLMs with varying capabilities. It is also robust to textual variances. The inherent explainability of its scoring can further help users understand and mitigate the limitations of current LLMs. We conclude that while limitations exist, leveraging KGs is a prominent direction for automated factuality assessment in healthcare.

CLMay 25, 2025
Conversational Exploration of Literature Landscape with LitChat

Mingyu Huang, Shasha Zhou, Yuxuan Chen et al.

We are living in an era of "big literature", where the volume of digital scientific publications is growing exponentially. While offering new opportunities, this also poses challenges for understanding literature landscapes, as traditional manual reviewing is no longer feasible. Recent large language models (LLMs) have shown strong capabilities for literature comprehension, yet they are incapable of offering "comprehensive, objective, open and transparent" views desired by systematic reviews due to their limited context windows and trust issues like hallucinations. Here we present LitChat, an end-to-end, interactive and conversational literature agent that augments LLM agents with data-driven discovery tools to facilitate literature exploration. LitChat automatically interprets user queries, retrieves relevant sources, constructs knowledge graphs, and employs diverse data-mining techniques to generate evidence-based insights addressing user needs. We illustrate the effectiveness of LitChat via a case study on AI4Health, highlighting its capacity to quickly navigate the users through large-scale literature landscape with data-based evidence that is otherwise infeasible with traditional means.

CLJun 11, 2024
FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

Pengfei Zhou, Weiqing Min, Chaoran Fu et al.

Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery and understanding. Towards this goal, for powerful capabilities across various domains and tasks in Large Language Models (LLMs), we introduce Food-oriented LLM FoodSky to comprehend food data through perception and reasoning. Considering the complexity and typicality of Chinese cuisine, we first construct one comprehensive Chinese food corpus FoodEarth from various authoritative sources, which can be leveraged by FoodSky to achieve deep understanding of food-related data. We then propose Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky in capturing fine-grained food semantics and generating context-aware food-relevant text, respectively. Our extensive evaluations demonstrate that FoodSky significantly outperforms general-purpose LLMs in both chef and dietetic examinations, with an accuracy of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200.

SEJan 5, 2022
Unlocking the Secrets of Software Configuration Landscapes-Ruggedness, Accessibility, Escapability, and Transferability

Mingyu Huang, Peili Mao, Ke Li

Modern software systems are often highly configurable to tailor varied requirements from diverse stakeholders. Understanding the mapping between configurations and the desired performance attributes plays a fundamental role in advancing the controllability and tuning of the underlying system, yet has long been a dark hole of knowledge due to their black-box nature and the enormous combinatorial configuration space. In this paper, using $86$M evaluated configurations from three real-world systems on $32$ running workloads, we conducted one of its kind fitness landscape analysis (FLA) for configurable software systems. With comprehensive FLA methods, we for the first time show that: $i)$ the software configuration landscapes are fairly rugged, with numerous scattered local optima; $ii)$ nevertheless, the top local optima are highly accessible, featuring significantly larger basins of attraction; $iii)$ most inferior local optima are escapable with simple perturbations; $iv)$ landscapes of the same system with different workloads share structural similarities, which can be exploited to expedite heuristic search. Our results also provide valuable insights on the design of tailored meta-heuristics for configuration tuning; our FLA framework along with the collected data, build solid foundation for future research in this direction.