Ziyi Huang

CL
h-index4
19papers
104citations
Novelty51%
AI Score54

19 Papers

IVJun 9, 2022
Cardiac Adipose Tissue Segmentation via Image-Level Annotations

Ziyi Huang, Yu Gan, Theresa Lye et al.

Automatically identifying the structural substrates underlying cardiac abnormalities can potentially provide real-time guidance for interventional procedures. With the knowledge of cardiac tissue substrates, the treatment of complex arrhythmias such as atrial fibrillation and ventricular tachycardia can be further optimized by detecting arrhythmia substrates to target for treatment (i.e., adipose) and identifying critical structures to avoid. Optical coherence tomography (OCT) is a real-time imaging modality that aids in addressing this need. Existing approaches for cardiac image analysis mainly rely on fully supervised learning techniques, which suffer from the drawback of workload on labor-intensive annotation process of pixel-wise labeling. To lessen the need for pixel-wise labeling, we develop a two-stage deep learning framework for cardiac adipose tissue segmentation using image-level annotations on OCT images of human cardiac substrates. In particular, we integrate class activation mapping with superpixel segmentation to solve the sparse tissue seed challenge raised in cardiac tissue segmentation. Our study bridges the gap between the demand on automatic tissue analysis and the lack of high-quality pixel-wise annotations. To the best of our knowledge, this is the first study that attempts to address cardiac tissue segmentation on OCT images via weakly supervised learning techniques. Within an in-vitro human cardiac OCT dataset, we demonstrate that our weakly supervised approach on image-level annotations achieves comparable performance as fully supervised methods trained on pixel-wise annotations.

MLJun 9, 2023
Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks

Ziyi Huang, Henry Lam, Haofeng Zhang

Uncertainty quantification (UQ) is important for reliability assessment and enhancement of machine learning models. In deep learning, uncertainties arise not only from data, but also from the training procedure that often injects substantial noises and biases. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ due to the need for repeated network retraining. Building upon the recent neural tangent kernel theory, we create statistically guaranteed schemes to principally \emph{characterize}, and \emph{remove}, the uncertainty of over-parameterized neural networks with very low computation effort. In particular, our approach, based on what we call a procedural-noise-correcting (PNC) predictor, removes the procedural uncertainty by using only \emph{one} auxiliary network that is trained on a suitably labeled dataset, instead of many retrained networks employed in deep ensembles. Moreover, by combining our PNC predictor with suitable light-computation resampling methods, we build several approaches to construct asymptotically exact-coverage confidence intervals using as low as four trained networks without additional overheads.

LGJun 9, 2022
Evaluating Aleatoric Uncertainty via Conditional Generative Models

Ziyi Huang, Henry Lam, Haofeng Zhang

Aleatoric uncertainty quantification seeks for distributional knowledge of random responses, which is important for reliability analysis and robustness improvement in machine learning applications. Previous research on aleatoric uncertainty estimation mainly targets closed-formed conditional densities or variances, which requires strong restrictions on the data distribution or dimensionality. To overcome these restrictions, we study conditional generative models for aleatoric uncertainty estimation. We introduce two metrics to measure the discrepancy between two conditional distributions that suit these models. Both metrics can be easily and unbiasedly computed via Monte Carlo simulation of the conditional generative models, thus facilitating their evaluation and training. We demonstrate numerically how our metrics provide correct measurements of conditional distributional discrepancies and can be used to train conditional models competitive against existing benchmarks.

CLMay 19
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

Yuanqing Cai, Ziyi Huang, Minhao Liu et al.

Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of \emph{inattentional blindness} in human cognition, we investigate whether LLMs, trained on human-preferred corpora that embed attentional biases, exhibit a similar limitation: \emph{failing to attend to subtle yet important contextual cues under explicit task instructions}. To evaluate this, we introduce the task of \textbf{explicit-implicit reasoning} and present \textbf{MixRea}, a benchmark of 2,246 multiple-choice questions across 9 reasoning types with varying distributions of explicit and implicit information. Evaluation of 21 advanced LLMs shows that even the best-performing reasoning model (Gemini 2.5 Pro) achieves only 42.8\% consistency, revealing widespread inattentional blindness. To mitigate this, we propose \textbf{Potential Relation Completion Prompting (PRCP)}, a prompting method that improves reasoning by recovering overlooked causal relations. Further analysis shows that this limitation persists across diverse multi-source reasoning tasks, highlighting the need for more cognitively aligned models.

CLMay 13
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation

Weiqing Luo, Zongye Hu, Xiao Wang et al.

Visual evidence selection is a critical component of multimodal retrieval-augmented generation (RAG), yet existing methods typically rely on semantic relevance or surface-level similarity, which are often misaligned with the actual utility of visual evidence for downstream reasoning. We reformulate multimodal evidence selection from an information-theoretic perspective by defining evidence utility as the information gain induced on a model's output distribution. To overcome the intractability of answer-space optimization, we introduce a latent notion of evidence helpfulness and theoretically show that, under mild assumptions, ranking evidence by information gain on this latent variable is equivalent to answer-space utility. We further propose a training-free, surrogate-accelerated framework that efficiently estimates evidence utility using lightweight multimodal models. Experiments on MRAG-Bench and Visual-RAG across multiple model families demonstrate that our method consistently outperforms state-of-the-art RAG baselines while achieving substantial reductions in computational cost.

CLMar 6, 2025
Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition

Bin Chen, Yu Zhang, Hongfei Ye et al.

Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domainn. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during the training process. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by utilizing smaller models to transform knowledge into interpretable rules, while applying the post-training of larger models. By facilitating collaboration between the large and small multimodal large language models for prediction, our approach demonstrates significant improvements. Notably, we achieve outstanding results on two real Taobao datasets, with enhancements of 6.37\% and 6.28\% in online weighted F1 scores compared to the state-of-the-art method, thereby validating the efficacy of our framework.

CLFeb 20, 2025
SR-LLM: Rethinking the Structured Representation in Large Language Model

Jiahuan Zhang, Tianheng Wang, Hanqing Wu et al.

Structured representations, exemplified by Abstract Meaning Representation (AMR), have long been pivotal in computational linguistics. However, their role remains ambiguous in the Large Language Models (LLMs) era. Initial attempts to integrate structured representation into LLMs via a zero-shot setting yielded inferior performance. We hypothesize that such a decline stems from the structure information being passed into LLMs in a code format unfamiliar to LLMs' training corpora. Consequently, we propose SR-LLM, an innovative framework with two settings to explore a superior way of integrating structured representation with LLMs from training-free and training-dependent perspectives. The former integrates structural information through natural language descriptions in LLM prompts, whereas its counterpart augments the model's inference capability through fine-tuning on linguistically described structured representations. Performance improvements were observed in widely downstream datasets, with particularly notable gains of 3.17% and 12.38% in PAWS. To the best of our knowledge, this work represents the pioneering demonstration that leveraging structural representations can substantially enhance LLMs' inference capability. We hope that our work sheds light and encourages future research to enhance the reasoning and interoperability of LLMs by structure data.

CLJul 11, 2025
PromotionGo at SemEval-2025 Task 11: A Feature-Centric Framework for Cross-Lingual Multi-Emotion Detection in Short Texts

Ziyi Huang, Xia Cui

This paper presents our system for SemEval 2025 Task 11: Bridging the Gap in Text-Based Emotion Detection (Track A), which focuses on multi-label emotion detection in short texts. We propose a feature-centric framework that dynamically adapts document representations and learning algorithms to optimize language-specific performance. Our study evaluates three key components: document representation, dimensionality reduction, and model training in 28 languages, highlighting five for detailed analysis. The results show that TF-IDF remains highly effective for low-resource languages, while contextual embeddings like FastText and transformer-based document representations, such as those produced by Sentence-BERT, exhibit language-specific strengths. Principal Component Analysis (PCA) reduces training time without compromising performance, particularly benefiting FastText and neural models such as Multi-Layer Perceptrons (MLP). Computational efficiency analysis underscores the trade-off between model complexity and processing cost. Our framework provides a scalable solution for multilingual emotion detection, addressing the challenges of linguistic diversity and resource constraints.

IVFeb 2
Real-time topology-aware M-mode OCT segmentation for robotic deep anterior lamellar keratoplasty (DALK) guidance

Rosalinda Xiong, Jinglun Yu, Yaning Wang et al.

Robotic deep anterior lamellar keratoplasty (DALK) requires accurate real time depth feedback to approach Descemet's membrane (DM) without perforation. M-mode intraoperative optical coherence tomography (OCT) provides high temporal resolution depth traces, but speckle noise, attenuation, and instrument induced shadowing often result in discontinuous or ambiguous layer interfaces that challenge anatomically consistent segmentation at deployment frame rates. We present a lightweight, topology aware M-mode segmentation pipeline based on UNeXt that incorporates anatomical topology regularization to stabilize boundary continuity and layer ordering under low signal to noise ratio conditions. The proposed system achieves end to end throughput exceeding 80 Hz measured over the complete preprocessing inference overlay pipeline on a single GPU, demonstrating practical real time guidance beyond model only timing. This operating margin provides temporal headroom to reject low quality or dropout frames while maintaining a stable effective depth update rate. Evaluation on a standard rabbit eye M-mode dataset using an established baseline protocol shows improved qualitative boundary stability compared with topology agnostic controls, while preserving deployable real time performance.

IVFeb 2
Physics-based generation of multilayer corneal OCT data via Gaussian modeling and MCML for AI-driven diagnostic and surgical guidance applications

Jinglun Yu, Yaning Wang, Rosalinda Xiong et al.

Training deep learning models for corneal optical coherence tomography (OCT) imaging is limited by the availability of large, well-annotated datasets. We present a configurable Monte Carlo simulation framework that generates synthetic corneal B-scan optical OCT images with pixel-level five-layer segmentation labels derived directly from the simulation geometry. A five-layer corneal model with Gaussian surfaces captures curvature and thickness variability in healthy and keratoconic eyes. Each layer is assigned optical properties from the literature and light transport is simulated using Monte Carlo modeling of light transport in multi-layered tissues (MCML), while incorporating system features such as the confocal PSF and sensitivity roll-off. This approach produces over 10,000 high-resolution (1024x1024) image-label pairs and supports customization of geometry, photon count, noise, and system parameters. The resulting dataset enables systematic training, validation, and benchmarking of AI models under controlled, ground-truth conditions, providing a reproducible and scalable resource to support the development of diagnostic and surgical guidance applications in image-guided ophthalmology.

CLNov 18, 2025
Bias in, Bias out: Annotation Bias in Multilingual Large Language Models

Xia Cui, Ziyi Huang, Naeemeh Adel

Annotation bias in NLP datasets remains a major challenge for developing multilingual Large Language Models (LLMs), particularly in culturally diverse settings. Bias from task framing, annotator subjectivity, and cultural mismatches can distort model outputs and exacerbate social harms. We propose a comprehensive framework for understanding annotation bias, distinguishing among instruction bias, annotator bias, and contextual and cultural bias. We review detection methods (including inter-annotator agreement, model disagreement, and metadata analysis) and highlight emerging techniques such as multilingual model divergence and cultural inference. We further outline proactive and reactive mitigation strategies, including diverse annotator recruitment, iterative guideline refinement, and post-hoc model adjustments. Our contributions include: (1) a typology of annotation bias; (2) a synthesis of detection metrics; (3) an ensemble-based bias mitigation approach adapted for multilingual settings, and (4) an ethical analysis of annotation processes. Together, these insights aim to inform more equitable and culturally grounded annotation pipelines for LLMs.

LGMay 27, 2025
Interpretable Credit Default Prediction with Ensemble Learning and SHAP

Shiqi Yang, Ziyi Huang, Wengran Xiao et al.

This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. Through preprocessing, feature engineering, and model training of the Home Credit dataset, the performance of multiple models including logistic regression, random forest, XGBoost, LightGBM, etc. in terms of accuracy, precision, and recall is evaluated. The results show that the ensemble learning method has obvious advantages in predictive performance, especially in dealing with complex nonlinear relationships between features and data imbalance problems. It shows strong robustness. At the same time, the SHAP method is used to analyze the importance and dependency of features, and it is found that the external credit score variable plays a dominant role in model decision making, which helps to improve the model's interpretability and practical application value. The research results provide effective reference and technical support for the intelligent development of credit risk control systems.

CVMar 12, 2025
Post-interactive Multimodal Trajectory Prediction for Autonomous Driving

Ziyi Huang, Yang Li, Dushuai Li et al.

Modeling the interactions among agents for trajectory prediction of autonomous driving has been challenging due to the inherent uncertainty in agents' behavior. The interactions involved in the predicted trajectories of agents, also called post-interactions, have rarely been considered in trajectory prediction models. To this end, we propose a coarse-to-fine Transformer for multimodal trajectory prediction, i.e., Pioformer, which explicitly extracts the post-interaction features to enhance the prediction accuracy. Specifically, we first build a Coarse Trajectory Network to generate coarse trajectories based on the observed trajectories and lane segments, in which the low-order interaction features are extracted with the graph neural networks. Next, we build a hypergraph neural network-based Trajectory Proposal Network to generate trajectory proposals, where the high-order interaction features are learned by the hypergraphs. Finally, the trajectory proposals are sent to the Proposal Refinement Network for further refinement. The observed trajectories and trajectory proposals are concatenated together as the inputs of the Proposal Refinement Network, in which the post-interaction features are learned by combining the previous interaction features and trajectory consistency features. Moreover, we propose a three-stage training scheme to facilitate the learning process. Extensive experiments on the Argoverse 1 dataset demonstrate the superiority of our method. Compared with the baseline HiVT-64, our model has reduced the prediction errors by 4.4%, 8.4%, 14.4%, 5.7% regarding metrics minADE6, minFDE6, MR6, and brier-minFDE6, respectively.

AIFeb 15, 2025
ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis

Xueshen Li, Xinlong Hou, Ziyi Huang et al.

Recent advancements in large language models (LLMs) have demonstrated extraordinary comprehension capabilities with remarkable breakthroughs on various vision-language tasks. However, the application of LLMs in generating reliable medical diagnostic reports remains in the early stages. Currently, medical LLMs typically feature a passive interaction model where doctors respond to patient queries with little or no involvement in analyzing medical images. In contrast, some ChatBots simply respond to predefined queries based on visual inputs, lacking interactive dialogue or consideration of medical history. As such, there is a gap between LLM-generated patient-ChatBot interactions and those occurring in actual patient-doctor consultations. To bridge this gap, we develop an LLM-based dialogue system, namely proactive multi-round vision-language interactions for computer-aided diagnosis (ProMRVL-CAD), to generate patient-friendly disease diagnostic reports. The proposed ProMRVL-CAD system allows proactive dialogue to provide patients with constant and reliable medical access via an integration of knowledge graph into a recommendation system. Specifically, we devise two generators: a Proactive Question Generator (Pro-Q Gen) to generate proactive questions that guide the diagnostic procedure and a Multi-Vision Patient-Text Diagnostic Report Generator (MVP-DR Gen) to produce high-quality diagnostic reports. Evaluating two real-world publicly available datasets, MIMIC-CXR and IU-Xray, our model has better quality in generating medical reports. We further demonstrate the performance of ProMRVL achieves robust under the scenarios with low image quality. Moreover, we have created a synthetic medical dialogue dataset that simulates proactive diagnostic interactions between patients and doctors, serving as a valuable resource for training LLM.

MLJun 20, 2024
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

Ziyi Huang, Henry Lam, Haofeng Zhang

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite the superior practical performance, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct frequentist regret analysis on two Bayesian bandit algorithms, Linear Thompson Sampling (LinTS) and the extension of Bayesian Upper Confidence Bound, namely Linear Bayesian Upper Confidence Bound (LinBUCB). We demonstrate that when applied in approximate inference settings, LinTS and LinBUCB can universally preserve their original rates of regret upper bound but with a sacrifice of larger constant terms. These results hold for general Bayesian inference approaches, assuming the inference error measured by two different $α$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB expedites the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds in the setting of stochastic linear bandits with bounded approximate inference errors.

LGJan 31, 2022
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework

Ziyi Huang, Henry Lam, Amirhossein Meisami et al.

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. However, there is a large discrepancy between the superior practical performance of these approaches and their theoretical justification. Previous research only indicates a negative theoretical result: Thompson sampling could have a worst-case linear regret $Ω(T)$ with a constant threshold on the inference error measured by one $α$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper Confidence Bound (EBUCB) framework that can efficiently accommodate bandit problems in the presence of approximate inference. Our theoretical analysis demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the optimal regret order $O(\log T)$ if the inference error measured by two different $α$-divergences is less than a constant, regardless of how large this constant is. To our best knowledge, our study provides the first theoretical regret bound that is better than $o(T)$ in the setting of constant approximate inference error. Furthermore, in concordance with the negative results in previous studies, we show that only one bounded $α$-divergence is insufficient to guarantee a sub-linear regret.

LGOct 23, 2021
Quantifying Epistemic Uncertainty in Deep Learning

Ziyi Huang, Henry Lam, Haofeng Zhang

Uncertainty quantification is at the core of the reliability and robustness of machine learning. In this paper, we provide a theoretical framework to dissect the uncertainty, especially the \textit{epistemic} component, in deep learning into \textit{procedural variability} (from the training procedure) and \textit{data variability} (from the training data), which is the first such attempt in the literature to our best knowledge. We then propose two approaches to estimate these uncertainties, one based on influence function and one on batching. We demonstrate how our approaches overcome the computational difficulties in applying classical statistical methods. Experimental evaluations on multiple problem settings corroborate our theory and illustrate how our framework and estimation can provide direct guidance on modeling and data collection efforts.

MLFeb 26, 2021
Learning Prediction Intervals for Regression: Generalization and Calibration

Haoxian Chen, Ziyi Huang, Henry Lam et al.

We study the generation of prediction intervals in regression for uncertainty quantification. This task can be formalized as an empirical constrained optimization problem that minimizes the average interval width while maintaining the coverage accuracy across data. We strengthen the existing literature by studying two aspects of this empirical optimization. First is a general learning theory to characterize the optimality-feasibility tradeoff that encompasses Lipschitz continuity and VC-subgraph classes, which are exemplified in regression trees and neural networks. Second is a calibration machinery and the corresponding statistical theory to optimally select the regularization parameter that manages this tradeoff, which bypasses the overfitting issues in previous approaches in coverage attainment. We empirically demonstrate the strengths of our interval generation and calibration algorithms in terms of testing performances compared to existing benchmarks.

LGJan 31, 2021
Co-Seg: An Image Segmentation Framework Against Label Corruption

Ziyi Huang, Haofeng Zhang, Andrew Laine et al.

Supervised deep learning performance is heavily tied to the availability of high-quality labels for training. Neural networks can gradually overfit corrupted labels if directly trained on noisy datasets, leading to severe performance degradation at test time. In this paper, we propose a novel deep learning framework, namely Co-Seg, to collaboratively train segmentation networks on datasets which include low-quality noisy labels. Our approach first trains two networks simultaneously to sift through all samples and obtain a subset with reliable labels. Then, an efficient yet easily-implemented label correction strategy is applied to enrich the reliable subset. Finally, using the updated dataset, we retrain the segmentation network to finalize its parameters. Experiments in two noisy labels scenarios demonstrate that our proposed model can achieve results comparable to those obtained from supervised learning trained on the noise-free labels. In addition, our framework can be easily implemented in any segmentation algorithm to increase its robustness to noisy labels.