Yiyao Zhang

DS
h-index7
7papers
15citations
Novelty51%
AI Score50

7 Papers

DSApr 6
Work-Efficient Parallel Counting via Sampling

Hongyang Liu, Yitong Yin, Yiyao Zhang

A canonical approach to approximating the partition function of a Gibbs distribution via sampling is simulated annealing. This method has led to efficient reductions from counting to sampling, including: $\bullet$ classic non-adaptive (parallel) algorithms with sub-optimal cost (Dyer-Frieze-Kannan '89; Bezáková-Štefankovič-Vazirani-Vigoda '08); $\bullet$ adaptive (sequential) algorithms with near-optimal cost (Štefankovič-Vempala-Vigoda '09; Huber '15; Kolmogorov '18; Harris-Kolmogorov '24). We present an algorithm that achieves both near-optimal total work and efficient parallelism, providing a reduction from counting to sampling with logarithmic depth and near-optimal work. As consequences, we obtain work-efficient parallel counting algorithms for several important models, including the hardcore and Ising models within the uniqueness regime.

APSep 10, 2023
Super-Resolution Surface Reconstruction from Few Low-Resolution Slices

Yiyao Zhang, Ke Chen, Shang-Hua Yang

In many imaging applications where segmented features (e.g. blood vessels) are further used for other numerical simulations (e.g. finite element analysis), the obtained surfaces do not have fine resolutions suitable for the task. Increasing the resolution of such surfaces becomes crucial. This paper proposes a new variational model for solving this problem, based on an Euler-Elastica-based regulariser. Further, we propose and implement two numerical algorithms for solving the model, a projected gradient descent method and the alternating direction method of multipliers. Numerical experiments using real-life examples (including two from outputs of another variational model) have been illustrated for effectiveness. The advantages of the new model are shown through quantitative comparisons by the standard deviation of Gaussian curvatures and mean curvatures from the viewpoint of discrete geometry.

DSApr 1
Near-Optimal Parallel Approximate Counting via Sampling

David G. Harris, Vladimir Kolmogorov, Hongyang Liu et al.

The computational equivalence between approximate counting and sampling is well established for polynomial-time algorithms. The most efficient general reduction from counting to sampling is achieved via simulated annealing, where the counting problem is formulated in terms of estimating the ratio $Q={Z(β_{\max})}/{Z(β_{\min})}$ between partition functions $Z(β)=\sum_{x\in Ω} \exp(βH(x))$ of Gibbs distributions $μ_β$ over $Ω$ with Hamiltonian $H$, given access to a sampling oracle that produces samples from $μ_β$ for $β\in [β_{\min}, β_{\max}]$. The best bound achieved by known annealing algorithms with relative error $\varepsilon$ is $O(q \log h / \varepsilon^2)$, where $q, h$ are parameters which respectively bound $\ln Q$ and $H$. However, all known algorithms attaining this near-optimal complexity are inherently sequential, or *adaptive*: the queried parameters $β$ depend on previous samples. We develop a simple non-adaptive algorithm for approximate counting using $O(q \log^2 h / \varepsilon^2)$ samples, as well as an algorithm that achieves $O(q \log h / \varepsilon^2)$ samples with just two rounds of adaptivity, matching the best sample complexity of sequential algorithms. These algorithms naturally give rise to work-efficient parallel (RNC) counting algorithms. We discuss applications to RNC counting algorithms for several classic models, including the anti-ferromagnetic 2-spin, monomer-dimer and ferromagnetic Ising models.

CRMay 3
AgenticVM: Agentic AI for Adaptive Software Vulnerability Management

Asrul Arifin, Hussain Ahmad, Yiyao Zhang et al.

As software systems grow in scale and complexity, vulnerability management is increasingly strained by high alert volumes, fragmented toolchains, and manual triage processes. We introduce AgenticVM, a multi-agent framework that integrates large language models with security tools to automate vulnerability detection, assessment, prioritization, and reporting. AgenticVM combines rule-based processing, a BERT-based CVSS prediction module, and specialised LLM-driven agents, leveraging data from sources such as the National Vulnerability Database and the European Union Vulnerability Database. Across multiple evaluation scenarios, AgenticVM reduces raw scanner outputs into compact, actionable queues, achieving up to 98% alert reduction (e.g., from 3,983 findings to 82 high-priority items), while predicting missing CVSS attributes with 89.3% accuracy. These results demonstrate improved prioritisation efficiency and reduced analyst workload without compromising risk visibility. Beyond performance, the framework provides practical design insights into agent decomposition, tool-LLM integration, and human-in-the-loop governance for real-world deployment.

DSNov 4, 2025
Learning CNF formulas from uniform random solutions in the local lemma regime

Weiming Feng, Xiongxin Yang, Yixiao Yu et al.

We study the problem of learning a $n$-variables $k$-CNF formula $Φ$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovász local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on sample complexity for both exact and approximate learning from i.i.d. uniform random solutions.

CRApr 6
Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang, Diksha Goel, Hussain Ahmad

Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.

PMSep 14, 2025
RegimeFolio: A Regime Aware ML System for Sectoral Portfolio Optimization in Dynamic Markets

Yiyao Zhang, Diksha Goel, Hussain Ahmad et al.

Financial markets are inherently non-stationary, with shifting volatility regimes that alter asset co-movements and return distributions. Standard portfolio optimization methods, typically built on stationarity or regime-agnostic assumptions, struggle to adapt to such changes. To address these challenges, we propose RegimeFolio, a novel regime-aware and sector-specialized framework that, unlike existing regime-agnostic models such as DeepVol and DRL optimizers, integrates explicit volatility regime segmentation with sector-specific ensemble forecasting and adaptive mean-variance allocation. This modular architecture ensures forecasts and portfolio decisions remain aligned with current market conditions, enhancing robustness and interpretability in dynamic markets. RegimeFolio combines three components: (i) an interpretable VIX-based classifier for market regime detection; (ii) regime and sector-specific ensemble learners (Random Forest, Gradient Boosting) to capture conditional return structures; and (iii) a dynamic mean-variance optimizer with shrinkage-regularized covariance estimates for regime-aware allocation. We evaluate RegimeFolio on 34 large cap U.S. equities from 2020 to 2024. The framework achieves a cumulative return of 137 percent, a Sharpe ratio of 1.17, a 12 percent lower maximum drawdown, and a 15 to 20 percent improvement in forecast accuracy compared to conventional and advanced machine learning benchmarks. These results show that explicitly modeling volatility regimes in predictive learning and portfolio allocation enhances robustness and leads to more dependable decision-making in real markets.