Xiang Zhou

h-index: 23

17 papers

4,617 citations

Novelty: 46%

AI Score: 43

Ranked #51,917 of 194,257 authors (top 27%); #10,308 in CL (top 33%)

17 Papers

1.2 | NA | Oct 24, 2017

Convex Splitting Method for the Calculation of Transition States of Energy Functional

Shuting Gu, Xiang Zhou

Among numerical methods for partial differential equations arising from steepest descent dynamics of energy functionals (e.g., Allen-Cahn and Cahn-Hilliard equations), the convex splitting method is well-known to maintain unconditional energy stability for a large time step size. In this work, we show how to use the convex splitting idea to find transition states, i.e., index-1 saddle points of the same energy functionals. Based on the iterative minimization formulation (IMF) for saddle points (SIAM J. Numer. Anal., vol. 53, p. 1786, 2015), we introduce the convex splitting method to minimize the auxiliary functional at each cycle of the IMF. We present a general principle for constructing convex splitting forms for these auxiliary functionals and show how to avoid solving nonlinear equations. The new numerical scheme based on the convex splitting method allows for large time step sizes. The new methods are tested on the one-dimensional Ginzburg-Landau energy functional in the search for Allen-Cahn or Cahn-Hilliard types of transition states. We provide numerical results of transition states for the two-dimensional Landau-Brazovskii energy functional for diblock copolymers.
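
The energy-stability property mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the authors' saddle-point scheme: it applies a linear (stabilized) convex splitting to plain gradient descent on a scalar double-well energy E(u) = (u^2 - 1)^2 / 4. The stabilizer S = 2, the starting point, and all function names are assumptions chosen for illustration.

```python
def energy(u):
    """Double-well energy E(u) = (u^2 - 1)^2 / 4 (toy stand-in for a functional)."""
    return 0.25 * (u * u - 1.0) ** 2


def energy_gradient(u):
    """Derivative E'(u) = u^3 - u."""
    return u ** 3 - u


def convex_splitting_step(u, dt, stabilizer=2.0):
    # Split E = E_c - E_e with E_c(u) = (S/2) u^2 and E_e(u) = (S/2) u^2 - E(u).
    # E_e is convex on [-1, 1] when S >= 2, because E''(u) = 3u^2 - 1 <= 2 there.
    # Treating E_c implicitly and E_e explicitly gives a linear equation:
    #   (u_new - u) / dt = -S * u_new + S * u - E'(u)
    # whose solution is the update below (no nonlinear solve needed).
    return u - dt * energy_gradient(u) / (1.0 + stabilizer * dt)


def run(u0=0.1, dt=100.0, steps=50):
    """Iterate the scheme with a deliberately large time step."""
    trajectory = [u0]
    for _ in range(steps):
        trajectory.append(convex_splitting_step(trajectory[-1], dt))
    return trajectory


trajectory = run()
energies = [energy(u) for u in trajectory]
```

Even with dt = 100 the energy does not increase from one step to the next, and the iterates settle at the minimizer u = 1, provided the starting point lies in [-1, 1] where the splitting assumption holds.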

15.5 | LG | Jun 3, 2023

Exploring the Optimal Choice for Generative Processes in Diffusion Models: Ordinary vs Stochastic Differential Equations

Yu Cao, Jingrun Chen, Yixin Luo et al.

The diffusion model has shown remarkable success in computer vision, but it remains unclear whether the ODE-based probability flow or the SDE-based diffusion model is superior and under what circumstances. Comparing the two is challenging due to dependencies on data distributions, score training, and other numerical issues. In this paper, we study the problem mathematically for two limiting scenarios: the zero diffusion (ODE) case and the large diffusion case. We first introduce a pulse-shape error to perturb the score function and analyze error accumulation of sampling quality, followed by a thorough analysis for generalization to arbitrary error. Our findings indicate that when the perturbation occurs at the end of the generative process, the ODE model outperforms the SDE model with a large diffusion coefficient. However, when the perturbation occurs earlier, the SDE model outperforms the ODE model, and we demonstrate that the error of sample generation due to such a pulse-shape perturbation is exponentially suppressed as the diffusion term's magnitude increases to infinity. Numerical validation of this phenomenon is provided using Gaussian, Gaussian mixture, and Swiss roll distributions, as well as realistic datasets like MNIST and CIFAR-10.

1.2 | NA | Dec 14, 2017

Estimation of exciton diffusion lengths of organic semiconductors in random domains

Jingrun Chen, Ling Lin, Zhiwen Zhang et al.

Exciton diffusion length plays a vital role in the function of opto-electronic devices. Oftentimes, the domain occupied by an organic semiconductor is subject to surface measurement error. In many experiments, photoluminescence over the domain is measured and used as the observation data to estimate this length parameter in an inverse manner based on the least square method. However, the result is sometimes found to be sensitive to the surface geometry of the domain. In this paper, we employ a random function representation for the uncertain surface of the domain. After non-dimensionalization, the forward model becomes a diffusion-type equation over the domain whose geometric boundary is subject to small random perturbations. We propose an asymptotic-based method as an approximate forward solver whose accuracy is justified both theoretically and numerically. It only requires solving several deterministic problems over a fixed domain. Therefore, for the same accuracy requirements we tested here, the running time of our approach is more than one order of magnitude smaller than that of directly solving the original stochastic boundary-value problem by the stochastic collocation method. In addition, from numerical results, we find that the correlation length of randomness is important to determine whether a 1D reduced model is a good surrogate for the 2D model.

3.3 | AI | Nov 20, 2025 | Code

KRAL: Knowledge and Reasoning Augmented Learning for LLM-assisted Clinical Antimicrobial Therapy

Zhe Li, Yehan Qiu, Yujie Chen et al.

Clinical antimicrobial therapy requires the dynamic integration of pathogen profiles, host factors, pharmacological properties of antimicrobials, and the severity of infection. This complexity imposes fundamental limitations on the applicability of Large Language Models (LLMs) in high-stakes clinical decision-making, including knowledge gaps, data privacy concerns, high deployment costs, and limited reasoning capabilities. To address these challenges, we propose KRAL (Knowledge and Reasoning Augmented Learning), a low-cost, scalable, privacy-preserving paradigm that leverages teacher-model reasoning to automatically distill knowledge and reasoning trajectories via answer-to-question reverse generation, employs heuristic learning for semi-supervised data augmentation (reducing manual annotation requirements by approximately 80%), and utilizes agentic reinforcement learning to jointly enhance medical knowledge and reasoning while optimizing computational and memory efficiency. A hierarchical evaluation employing diverse teacher-model proxies reduces assessment costs, while modular interface design facilitates seamless system updates. Experimental results demonstrate that KRAL significantly outperforms traditional Retrieval-Augmented Generation (RAG) and Supervised Fine-Tuning (SFT) methods. It improves knowledge question-answering capability (Accuracy@1 on the external open-source benchmark MEDQA increased by 1.8% vs. SFT and 3.6% vs. RAG) and reasoning capability (Pass@1 on the external benchmark PUMCH Antimicrobial increased by 27% vs. SFT and 27.2% vs. RAG), achieved at ~20% of SFT's long-term training costs. This establishes KRAL as an effective solution for enhancing local LLMs' clinical diagnostic capabilities, enabling low-cost, high-safety deployment in complex medical decision support.

31.9 | CL | Oct 7, 2020 | Code

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?

Yixin Nie, Xiang Zhou, Mohit Bansal

Despite the subjective nature of many NLP tasks, most NLU evaluations have focused on using the majority label with presumably high agreement as the ground truth. Less attention has been paid to the distribution of human opinions. We collect ChaosNLI, a dataset with a total of 464,500 annotations to study Collective HumAn OpinionS in oft-used NLI evaluation sets. This dataset is created by collecting 100 annotations per example for 3,113 examples in SNLI and MNLI and 1,532 examples in Abductive-NLI. Analysis reveals that: (1) high human disagreement exists in a noticeable amount of examples in these datasets; (2) the state-of-the-art models lack the ability to recover the distribution over human labels; (3) models achieve near-perfect accuracy on the subset of data with a high level of human agreement, whereas they can barely beat a random guess on the data with low levels of human agreement, which compose most of the common errors made by state-of-the-art models on the evaluation sets. This questions the validity of improving model performance on old metrics for the low-agreement part of evaluation datasets. Hence, we argue for a detailed examination of human agreement in future data collection efforts, and evaluating model outputs against the distribution over collective human opinions. The ChaosNLI dataset and experimental scripts are available at https://github.com/easonnie/ChaosNLI
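
Evaluating a model against the distribution over human labels, as this abstract recommends, can be sketched with a divergence between two probability vectors. The example below uses the Jensen-Shannon divergence as one reasonable choice; it is an illustrative sketch, not the ChaosNLI evaluation script, and the label order, function names, and sample annotation counts are assumptions.

```python
import math
from collections import Counter

# Assumed label order for illustration.
LABELS = ("entailment", "neutral", "contradiction")


def label_distribution(annotations):
    """Turn a list of per-annotator labels into a probability vector."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[label] / total for label in LABELS]


def _kl(p, q):
    # Only called with q = mixture of p and another vector, so q[i] > 0
    # wherever p[i] > 0 and the logarithm is well defined.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log); 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Hypothetical example: 100 annotations for one premise-hypothesis pair.
human = label_distribution(
    ["entailment"] * 60 + ["neutral"] * 30 + ["contradiction"] * 10
)
confident_model = [0.98, 0.01, 0.01]   # nearly one-hot prediction
calibrated_model = [0.55, 0.35, 0.10]  # closer to the human spread

score_confident = js_divergence(human, confident_model)
score_calibrated = js_divergence(human, calibrated_model)
```

A lower score means the predicted distribution is closer to the human one, so here the spread-out prediction scores better than the nearly one-hot prediction even though both pick the majority label.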

31.4 | CL | May 10, 2020 | Code

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, and without deep understanding of the language semantics. Using contradiction-word bias and word-overlapping bias as our two bias examples, this paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. Next, we also compare two ways of directly debiasing the model without knowing what the dataset biases are in advance. The first approach aims to remove the label bias at the embedding level. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features by forcing orthogonality between these two sub-models. We performed evaluations on new balanced datasets extracted from the original MNLI dataset as well as the NLI stress tests, and show that the orthogonality approach is better at debiasing the model while maintaining competitive overall accuracy. Our code and data are available at: https://github.com/owenzx/LexicalDebias-ACL2020

31.4 | CL | Apr 28, 2020 | Code

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

Xiang Zhou, Yixin Nie, Hao Tan et al.

We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable. This raises three questions: (1) How will the instability affect the reliability of the conclusions drawn based on these analysis sets? (2) Where does this instability come from? (3) How should we handle this instability and what are some potential solutions? For the first question, we conduct a thorough empirical study over analysis sets and find that in addition to the unstable final performance, the instability exists all along the training curve. We also observe lower-than-expected correlations between the analysis validation set and standard validation set, questioning the effectiveness of the current model-selection routine. Next, to answer the second question, we give both theoretical explanations and empirical evidence regarding the source of the instability, demonstrating that the instability mainly comes from high inter-example correlations within analysis sets. Finally, for the third question, we discuss an initial attempt to mitigate the instability and suggest guidelines for future work such as reporting the decomposed variance for more interpretable results and fair comparison across models. Our code is publicly available at: https://github.com/owenzx/InstabilityAnalysis

3.1 | LG | Nov 19, 2021

Learn Quasi-stationary Distributions of Finite State Markov Chain

Zhiqiang Cai, Ling Lin, Xiang Zhou

We propose a reinforcement learning (RL) approach to compute the expression of quasi-stationary distribution. Based on the fixed-point formulation of quasi-stationary distribution, we minimize the KL-divergence of two Markovian path distributions induced by the candidate distribution and the true target distribution. To solve this challenging minimization problem by gradient descent, we apply the reinforcement learning technique by introducing the reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and the value function. The numerical examples of finite state Markov chain are tested to demonstrate the new method.
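
For a small finite-state chain, the quasi-stationary distribution can also be obtained directly from a fixed-point iteration, which gives a reference value to compare a learned solution against. The sketch below is a classical normalized power iteration, not the reinforcement learning method of the paper. It assumes a sub-stochastic matrix Q of transition probabilities among the non-absorbing states, and the function name and example matrix are hypothetical.

```python
def quasi_stationary_distribution(Q, iterations=1000, tol=1e-12):
    """Iterate alpha <- alpha Q / (alpha Q 1) until the update is below tol.

    Q[i][j] is the probability of moving from surviving state i to surviving
    state j; each row sums to less than 1, the remainder being absorption.
    """
    n = len(Q)
    alpha = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the chain applied to alpha (row vector times matrix).
        stepped = [sum(alpha[i] * Q[i][j] for i in range(n)) for j in range(n)]
        # Condition on survival by renormalizing the remaining mass.
        mass = sum(stepped)
        updated = [x / mass for x in stepped]
        if max(abs(a - b) for a, b in zip(alpha, updated)) < tol:
            return updated
        alpha = updated
    return alpha


# Hypothetical two-state example: each state is absorbed with probability 0.2.
Q = [
    [0.5, 0.3],
    [0.2, 0.6],
]
qsd = quasi_stationary_distribution(Q)
```

For this matrix the iteration settles at (0.4, 0.6), the normalized left eigenvector of Q for its largest eigenvalue 0.8.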

32.7 | CL | Apr 20, 2021

Hidden Biases in Unreliable News Detection Datasets

Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos et al.

Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. While they all provide valuable resources for future research, we observe a number of problems that may lead to results that do not generalize in more realistic settings. Specifically, we show that selection bias during data collection leads to undesired artifacts in the datasets. In addition, while most systems train and predict at the level of individual articles, overlapping article sources in the training and evaluation data can provide a strong confounding factor that models can exploit. In the presence of this confounding factor, the models can achieve good performance by directly memorizing the site-label mapping instead of modeling the real task of unreliable news detection. We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap. Using the observations and experimental results, we provide practical suggestions on how to create more reliable datasets for the unreliable news detection task. We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
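
A clean split with no train/test source overlap can be sketched by assigning whole sites, rather than individual articles, to each side. The code below is a minimal illustration, not the authors' released tooling. The `source` key, the function name, and the sample data are assumptions, and a date-based split would need an additional filter.

```python
import random


def split_by_source(articles, test_fraction=0.2, seed=0):
    """Split articles so that no source site appears in both train and test."""
    # Sort first so the shuffle is reproducible for a given seed.
    sources = sorted({article["source"] for article in articles})
    rng = random.Random(seed)
    rng.shuffle(sources)
    # Hold out a fraction of the sites (at least one) for testing.
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])
    train = [a for a in articles if a["source"] not in test_sources]
    test = [a for a in articles if a["source"] in test_sources]
    return train, test


# Hypothetical data: 20 articles spread evenly over 5 sites.
articles = [
    {"source": f"site{i % 5}.example", "text": f"article {i}"}
    for i in range(20)
]
train, test = split_by_source(articles)
```

Because the split is made at the site level, a model that memorizes a site-to-label mapping gains nothing on the test set, which is the confound the abstract describes.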

0.7 | CL | Sep 22, 2020

Deep Reinforcement Learning for On-line Dialogue State Tracking

Zhi Chen, Lu Chen, Xiang Zhou et al.

Dialogue state tracking (DST) is a crucial module in dialogue management. It is usually cast as a supervised training problem, which is not convenient for on-line optimization. In this paper, a novel companion teaching based deep reinforcement learning (DRL) framework for on-line DST optimization is proposed. To the best of our knowledge, this is the first effort to optimize the DST module within DRL framework for on-line task-oriented spoken dialogue systems. In addition, dialogue policy can be further jointly updated. Experiments show that on-line DST optimization can effectively improve the dialogue manager performance while keeping the flexibility of using predefined policy. Joint training of both DST and policy can further improve the performance.

3.6 | CL | Sep 14, 2020

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Longxiang Liu, Zhuosheng Zhang, Hai Zhao et al.

A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus utterance- and speaker-aware clues are supposed to be well captured in models. However, in existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as encoders represent dialogues coarsely by taking the pairwise dialogue history and candidate response as a whole, so the hierarchical information on utterance interrelation and speaker roles coupled in such representations is not well addressed. In this work, we propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history. In detail, we decouple the contextualized word representations by masking mechanisms in a Transformer-based PrLM, making each word focus only on the words in the current utterance, other utterances, and the two speaker roles (i.e., utterances of sender and utterances of receiver), respectively. Experimental results show that our method boosts the strong ELECTRA baseline substantially on four public benchmark datasets, and achieves new state-of-the-art performance over previous methods. A series of ablation studies are conducted to demonstrate the effectiveness of our method.

0.3 | CL | Sep 14, 2020

Composing Answer from Multi-spans for Reading Comprehension

Zhuosheng Zhang, Yiqing Zhang, Hai Zhao et al.

This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages. Using a pointer network-style extractive decoder for this type of MRC may result in unsatisfactory performance when the ground-truth answers are given by human annotators or highly re-paraphrased from parts of the passages. On the other hand, a generative decoder cannot reliably guarantee that the resulting answers have well-formed syntax and semantics when encountering long sentences. Therefore, to alleviate the obvious drawbacks of both sides, we propose an answer making-up method from extracted multi-spans that are learned by our model as highly confident n-gram candidates in the given passage. That is, the returned answers are composed of discontinuous multi-spans rather than a single consecutive span in the given passages. The proposed method is simple but effective: empirical experiments on MS MARCO show that the proposed method performs better at accurately generating long answers, and substantially outperforms two competitive typical one-span and Seq2Seq baseline decoders.

3.3 | OC | Mar 7, 2020

Stochastic Modified Equations for Continuous Limit of Stochastic ADMM

Xiang Zhou, Huizhuo Yuan, Chris Junchi Li et al.

Stochastic versions of the alternating direction method of multipliers (ADMM) and its variants (linearized ADMM, gradient-based ADMM) play a key role in modern large-scale machine learning problems. One example is the regularized empirical risk minimization problem. In this work, we put different variants of stochastic ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach. We adapt the mathematical framework of the stochastic modified equation (SME), and show that the dynamics of stochastic ADMM is approximated by a class of stochastic differential equations with small noise parameters in the sense of weak approximation. The continuous-time analysis can uncover important analytical insights into the behaviors of the discrete-time algorithm, which are non-trivial to gain otherwise. For example, we can characterize the fluctuation of the solution paths precisely, and decide the optimal stopping time to minimize the variance of solution paths.

12.8 | CL | Sep 5, 2019 | Code

Semantics-aware BERT for Language Understanding

Zhuosheng Zhang, Yuwei Wu, Hai Zhao et al.

The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, the existing language representation models including ELMo, GPT and BERT only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is as simple in concept but more powerful. It obtains new state-of-the-art or substantially improves results on ten reading comprehension and language inference tasks.

4.8 | CL | Jan 27, 2019

Dual Co-Matching Network for Multi-choice Reading Comprehension

Shuailiang Zhang, Hai Zhao, Yuwei Wu et al.

Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure. Given a passage and a question, a correct answer needs to be selected from a set of candidate answers. In this paper, we propose the Dual Co-Matching Network (DCMN), which models the relationship among passage, question and answer bidirectionally. Different from existing approaches which only calculate question-aware or option-aware passage representation, we calculate passage-aware question representation and passage-aware answer representation at the same time. To demonstrate the effectiveness of our model, we evaluate it on a large-scale multiple-choice machine reading comprehension dataset (i.e., RACE). Experimental results show that our proposed model achieves new state-of-the-art results.

6.5 | CL | Jan 16, 2019

Dependency or Span, End-to-End Uniform Semantic Role Labeling

Zuchao Li, Shexia He, Hai Zhao et al.

Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most existing work focuses on either the span-based or the dependency-based semantic representation form and only shows model optimization specific to that form. Meanwhile, handling these two SRL tasks uniformly has been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including, in particular, the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.

1.2 | NA | Aug 24, 2017

Multiscale Gentlest Ascent Dynamics for Saddle Point in Effective Dynamics of Slow-Fast System

Shuting Gu, Xiang Zhou

Here we present a multiscale method to calculate the saddle point associated with the effective dynamics arising from a stochastic system which couples slow deterministic drift and fast stochastic dynamics. This problem is motivated by the transition states on free energy surfaces in chemical physics. Our method is based on the gentlest ascent dynamics, which couples the position variable and the direction variable and converges locally to saddle points. The dynamics of the direction vector is derived in terms of the covariance function with respect to the equilibrium distribution of the fast stochastic process. We apply the multiscale numerical methods to efficiently solve the obtained multiscale gentlest ascent dynamics, and discuss the acceleration techniques based on the adaptive idea. Examples of stochastic ordinary and partial differential equations are presented.