Houying Zhu

NA
h-index: 5
5 papers
37 citations
Novelty: 38%
AI Score: 22

5 Papers

CO · Jan 15, 2016
Discrepancy bounds for uniformly ergodic Markov chain quasi-Monte Carlo

Josef Dick, Daniel Rudolf, Houying Zhu

Markov chains can be used to generate samples whose distribution approximates a given target distribution. The quality of the samples generated by such Markov chains can be measured by the discrepancy between the empirical distribution of the samples and the target distribution. We prove upper bounds on this discrepancy under the assumption that the Markov chain is uniformly ergodic and the driver sequence is deterministic rather than a sequence of independent $U(0,1)$ random variables. In particular, we show the existence of driver sequences for which the discrepancy of the Markov chain from the target distribution with respect to certain test sets converges with (almost) the usual Monte Carlo rate of $n^{-1/2}$.
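The setting above can be illustrated with a small sketch, assuming a one-dimensional target $\pi(x) = 2x$ on $[0,1]$ (CDF $F(x) = x^2$) and an independence Metropolis chain with uniform proposals; this chain is uniformly ergodic because $\pi$ is bounded by twice the proposal density. Each step consumes one driver point $(u_1, u_2) \in [0,1)^2$, and the discrepancy measured is the one-dimensional star discrepancy $\sup_t |F_n(t) - F(t)|$. The target, the sampler, and all function names are illustrative choices, not taken from the paper. The abstract only states that good deterministic driver sequences exist, so none is reproduced here; the usage example feeds a pseudo-random baseline.

```python
import random
from typing import Callable, List, Sequence, Tuple


def target_density(x: float) -> float:
    """Hypothetical target on [0, 1]: pi(x) = 2x, with CDF F(x) = x^2."""
    return 2.0 * x


def target_cdf(x: float) -> float:
    return x * x


def independence_sampler(
    driver: Sequence[Tuple[float, float]], x0: float = 0.5
) -> List[float]:
    """Independence Metropolis chain with uniform proposals; x0 must lie in (0, 1].

    Step i consumes the driver point (u1, u2): u1 is the proposal and u2 the
    acceptance variable. Since pi(x) / q(x) = 2x <= 2, the chain is
    uniformly ergodic.
    """
    x = x0
    samples = []
    for u1, u2 in driver:
        # Metropolis-Hastings ratio pi(y) q(x) / (pi(x) q(y)) with q = 1.
        ratio = target_density(u1) / target_density(x)
        if u2 < min(1.0, ratio):
            x = u1
        samples.append(x)
    return samples


def star_discrepancy_1d(
    samples: Sequence[float], cdf_fn: Callable[[float], float]
) -> float:
    """sup_t |F_n(t) - F(t)| over intervals [0, t), computed from the sorted sample."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf_fn(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20_000
    # Pseudo-random baseline driver; a deterministic sequence of pairs in
    # [0, 1)^2 can be passed in exactly the same way.
    driver = [(rng.random(), rng.random()) for _ in range(n)]
    print(star_discrepancy_1d(independence_sampler(driver), target_cdf))
```

Any deterministic sequence of pairs in $[0,1)^2$ can be passed as `driver`; the sketch does not check the conditions under which the paper's bounds hold.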

CO · Aug 8, 2014
Discrepancy Estimates for Acceptance-Rejection Samplers Using Stratified Inputs

Houying Zhu, Josef Dick

In this paper we propose an acceptance-rejection sampler using stratified inputs as the driver sequence. We estimate the discrepancy of the points generated by this algorithm. First, we show an upper bound on the star discrepancy of order $N^{-1/2-1/(2s)}$. Further, we prove an upper bound on the $q$-th moment of the $L_q$-discrepancy, $(\mathbb{E}[N^{q}L^{q}_{q,N}])^{1/q}$ for $2\le q\le \infty$, which is of order $N^{(1-1/s)(1-1/q)}$. We also present an improved convergence rate for a deterministic acceptance-rejection algorithm using $(t,m,s)$-nets as the driver sequence.
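A minimal sketch of an acceptance-rejection sampler with stratified inputs for $s = 2$ (a one-dimensional target plus one acceptance coordinate), where the star-discrepancy bound $N^{-1/2-1/(2s)}$ becomes $N^{-3/4}$. It assumes jittered sampling (one uniform point in each cell of an $m \times m$ grid) as the form of stratification and a Beta(2, 2) target $6x(1-x)$ on $[0,1]$; both are illustrative assumptions, and the partition used in the paper may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple

DENSITY_BOUND = 1.5  # maximum of the hypothetical density below


def density(x: float) -> float:
    """Hypothetical target: the Beta(2, 2) density 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def cdf(x: float) -> float:
    return 3.0 * x * x - 2.0 * x ** 3


def jittered_inputs(m: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Stratified inputs: one uniform point in each cell of an m x m grid on [0, 1)^2."""
    return [
        ((i + rng.random()) / m, (j + rng.random()) / m)
        for i in range(m)
        for j in range(m)
    ]


def acceptance_rejection(
    points: Sequence[Tuple[float, float]],
    pdf: Callable[[float], float],
    bound: float,
) -> List[float]:
    """Accept x whenever the point (x, u) lies below the graph of pdf / bound."""
    return [x for x, u in points if u * bound < pdf(x)]


def star_discrepancy_1d(
    samples: Sequence[float], cdf_fn: Callable[[float], float]
) -> float:
    """sup_t |F_N(t) - F(t)| over intervals [0, t), computed from the sorted sample."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf_fn(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    rng = random.Random(1)
    accepted = acceptance_rejection(jittered_inputs(200, rng), density, DENSITY_BOUND)
    print(len(accepted), star_discrepancy_1d(accepted, cdf))
```

Only cells crossing the graph of the scaled density contribute randomness to the acceptance decisions, which is the intuition for why stratification improves on the plain Monte Carlo rate.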

NA · Apr 15, 2016
A Discrepancy Bound for Deterministic Acceptance-Rejection Samplers Beyond $N^{-1/2}$ in Dimension 1

Houying Zhu, Josef Dick

In this paper we consider an acceptance-rejection (AR) sampler based on deterministic driver sequences. We prove that the discrepancy of an $N$-element sample set generated in this way is bounded by $\mathcal{O}(N^{-2/3}\log N)$, provided that the target density is twice continuously differentiable with non-vanishing curvature and the AR sampler uses the driver sequence $$\mathcal{K}_M = \{(j\alpha, j\beta) \bmod 1 \mid j = 1,\ldots,M\},$$ where $\alpha,\beta$ are real algebraic numbers such that $1,\alpha,\beta$ is a basis of a number field over $\mathbb{Q}$ of degree $3$. For the driver sequence $$\mathcal{F}_k = \{(j/F_k, \{jF_{k-1}/F_k\}) \mid j = 1,\ldots,F_k\},$$ where $F_k$ is the $k$-th Fibonacci number and $\{x\} = x - \lfloor x \rfloor$ is the fractional part of a non-negative real number $x$, we can remove the $\log$ factor and improve the convergence rate to $\mathcal{O}(N^{-2/3})$, where again $N$ is the number of accepted samples. We also introduce a criterion for measuring the goodness of driver sequences. The proposed approach is tested numerically by calculating the star discrepancy of samples generated for some target densities using $\mathcal{K}_M$ and $\mathcal{F}_k$ as driver sequences. These results confirm that a convergence rate beyond $N^{-1/2}$ is achievable in practice using $\mathcal{K}_M$ and $\mathcal{F}_k$ as driver sequences in the acceptance-rejection sampler.
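The two driver sequences can be sketched as follows, assuming the hypothetical target $\tfrac{3}{2}(1-x^2)$ on $[0,1]$, which is smooth with second derivative $-3 \neq 0$. For $\mathcal{K}_M$ the sketch takes $\alpha = 2^{1/3}$ and $\beta = 2^{2/3}$: since $x^3 - 2$ is irreducible over $\mathbb{Q}$, the numbers $1, \alpha, \beta$ form a basis of the cubic field $\mathbb{Q}(2^{1/3})$, so this choice meets the stated condition. Fibonacci numbers are indexed with $F_1 = F_2 = 1$, an assumed convention. The goodness criterion for driver sequences is not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

BOUND = 1.5  # maximum of the hypothetical density below


def density(x: float) -> float:
    """Hypothetical target 1.5 * (1 - x^2) on [0, 1]; its second derivative is -3."""
    return 1.5 * (1.0 - x * x)


def cdf(x: float) -> float:
    return 1.5 * x - 0.5 * x ** 3


def fibonacci_driver(k: int) -> List[Tuple[float, float]]:
    """F_k = {(j / F_k, {j F_{k-1} / F_k}) : j = 1..F_k}, with F_1 = F_2 = 1."""
    if k < 2:
        raise ValueError("k must be at least 2")
    prev, cur = 1, 1  # (F_1, F_2); after the loop, (F_{k-1}, F_k)
    for _ in range(k - 2):
        prev, cur = cur, prev + cur
    # Integer arithmetic gives the fractional part {j F_{k-1} / F_k} exactly.
    return [(j / cur, (j * prev % cur) / cur) for j in range(1, cur + 1)]


def kronecker_driver(m: int, alpha: float, beta: float) -> List[Tuple[float, float]]:
    """K_M = {(j alpha, j beta) mod 1 : j = 1..M}, evaluated in floating point."""
    return [((j * alpha) % 1.0, (j * beta) % 1.0) for j in range(1, m + 1)]


def acceptance_rejection(
    points: Sequence[Tuple[float, float]],
    pdf: Callable[[float], float],
    bound: float,
) -> List[float]:
    """Accept x whenever the point (x, u) lies below the graph of pdf / bound."""
    return [x for x, u in points if u * bound < pdf(x)]


def star_discrepancy_1d(
    samples: Sequence[float], cdf_fn: Callable[[float], float]
) -> float:
    """sup_t |F_N(t) - F(t)| over intervals [0, t), computed from the sorted sample."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf_fn(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    alpha = 2.0 ** (1.0 / 3.0)  # 1, alpha, alpha^2 span the cubic field Q(2^(1/3))
    runs = {
        "F_21": fibonacci_driver(21),  # F_21 = 10946 driver points
        "K_10000": kronecker_driver(10_000, alpha, alpha * alpha),
    }
    for name, driver in runs.items():
        accepted = acceptance_rejection(driver, density, BOUND)
        print(name, len(accepted), star_discrepancy_1d(accepted, cdf))
```

Here $N$ in the bounds corresponds to `len(accepted)`, not to the number of driver points.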

NA · Sep 14, 2017
Analysis of Framelet Transforms on a Simplex

Yu Guang Wang, Houying Zhu

In this paper, we construct framelets associated with a sequence of quadrature rules on the simplex $T^{2}$ in $\mathbb{R}^{2}$. We give the framelet transforms, that is, the decomposition and reconstruction of the framelet coefficients of a function on $T^{2}$. We prove that the reconstruction is exact when the framelets are tight. We give an example of the construction of framelets and show that the framelet transforms can be computed as fast as the FFT.
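Why tightness yields exact reconstruction can be seen in a simplified spectral sketch. It assumes that the coefficients of $f$ in an orthonormal polynomial basis on $T^2$ are already available (in the paper they are computed with quadrature rules) and grouped by degree $\ell$, with $\ell + 1$ coefficients per degree, the dimension of the space of bivariate orthogonal polynomials of exact degree $\ell$. The masks $a(\xi) = \cos(\pi\xi/2)$ and $b(\xi) = \sin(\pi\xi/2)$ are hypothetical choices satisfying $a^2 + b^2 = 1$; this is a generic one-level framelet filter bank, not the construction or the fast algorithm of the paper.

```python
import math
from typing import List, Tuple

Coeffs = List[List[float]]  # coeffs[l] holds the l + 1 coefficients of degree l


def masks(degree: int, max_degree: int) -> Tuple[float, float]:
    """Hypothetical low-/high-pass masks a, b with a^2 + b^2 == 1 (tightness)."""
    xi = degree / max(1, max_degree)
    return math.cos(math.pi * xi / 2.0), math.sin(math.pi * xi / 2.0)


def decompose(coeffs: Coeffs) -> Tuple[Coeffs, Coeffs]:
    """One-level decomposition: scale each degree-l block by a(l) and by b(l)."""
    top = len(coeffs) - 1
    low, high = [], []
    for l, block in enumerate(coeffs):
        a, b = masks(l, top)
        low.append([a * c for c in block])
        high.append([b * c for c in block])
    return low, high


def reconstruct(low: Coeffs, high: Coeffs) -> Coeffs:
    """Reconstruction with the same masks: a*(a c) + b*(b c) = c since a^2 + b^2 = 1."""
    top = len(low) - 1
    out = []
    for l, (lo, hi) in enumerate(zip(low, high)):
        a, b = masks(l, top)
        out.append([a * x + b * y for x, y in zip(lo, hi)])
    return out


if __name__ == "__main__":
    # Hypothetical coefficients for degrees 0..3 (block l has l + 1 entries).
    f = [[1.0], [0.5, -0.25], [0.1, 0.2, 0.3], [0.0, -0.4, 0.05, 0.7]]
    low, high = decompose(f)
    g = reconstruct(low, high)
    print(max(abs(x - y) for bf, bg in zip(f, g) for x, y in zip(bf, bg)))
```

If the masks did not satisfy $a^2 + b^2 = 1$, the reconstructed block would be $(a^2 + b^2)\,c$ instead of $c$, which is the failure of exactness for non-tight filter banks.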

BM · Apr 21, 2024
ProteinEngine: Empower LLM with Domain Knowledge for Protein Engineering

Yiqing Shen, Outongyi Lv, Houying Zhu et al.

Large language models (LLMs) have garnered considerable attention for their proficiency in tackling intricate tasks, particularly by leveraging their capacities for zero-shot and in-context learning. However, their utility has been predominantly restricted to general tasks due to an absence of domain-specific knowledge. This constraint becomes particularly pertinent in the realm of protein engineering, where specialized expertise is required for tasks such as protein function prediction, protein evolution analysis, and protein design, with a level of specialization that existing LLMs cannot furnish. In response to this challenge, we introduce ProteinEngine, a human-centered platform aimed at amplifying the capabilities of LLMs in protein engineering by seamlessly integrating a comprehensive range of relevant tools, packages, and software via API calls. Uniquely, ProteinEngine assigns three distinct roles to LLMs, facilitating efficient task delegation, specialized task resolution, and effective communication of results. This design fosters high extensibility and promotes the smooth incorporation of new algorithms, models, and features for future development. Extensive user studies, involving participants from both the AI and protein engineering communities across academia and industry, consistently validate the superiority of ProteinEngine in augmenting the reliability and precision of deep learning in protein engineering tasks. Consequently, our findings highlight the potential of ProteinEngine to bridge disconnected tools for future research in the protein engineering domain.
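The three-role design can be illustrated with a toy pipeline. Everything in it is hypothetical: the role functions, the keyword rule standing in for an LLM, and the two sequence tools standing in for external packages reached via API calls. It is not ProteinEngine's implementation, only a sketch of how task delegation, task resolution, and communication of results can be kept separate.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for external protein tools that would be reached via API calls.
TOOLS: Dict[str, Callable[[str], str]] = {
    "length": lambda seq: f"{len(seq)} residues",
    "composition": lambda seq: ", ".join(
        f"{aa}:{seq.count(aa)}" for aa in sorted(set(seq))
    ),
}


def delegate(request: str) -> str:
    """Role 1 (task delegation): pick a tool. A keyword rule stands in for an LLM."""
    return "composition" if "composition" in request.lower() else "length"


def resolve(tool: str, sequence: str) -> str:
    """Role 2 (specialized task resolution): call the chosen tool."""
    return TOOLS[tool](sequence)


def communicate(request: str, tool: str, result: str) -> str:
    """Role 3 (communication of results): phrase the raw tool output for the user."""
    return f"For '{request}', the {tool} tool reports: {result}"


def handle(request: str, sequence: str) -> str:
    """Run the three roles in order for one user request."""
    tool = delegate(request)
    return communicate(request, tool, resolve(tool, sequence))


if __name__ == "__main__":
    print(handle("What is the composition?", "MKTAA"))
    print(handle("How long is it?", "MKTAA"))
```

Under this separation, a new tool is added by registering one more entry in `TOOLS` and letting `delegate` choose it, without changing the other two roles.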