Shukai Wang

2papers

2 Papers

49.8ITApr 9
The Asymmetric Hamming Bidistance and Distributions over Binary Asymmetric Channels

Shukai Wang, Cuiling Fan, Chunming Tang et al.

The binary asymmetric channel is a model for practical communication systems where the error probabilities for symbol transitions $0\rightarrow 1$ and $1\rightarrow0$ differ substantially. In this paper, we introduce the notion of asymmetric Hamming bidistance (AHB) and its two-dimensional distribution, which separately captures directional discrepancies between codewords. This finer characterization enables a more discriminative analysis of decoding the error probabilities for maximum-likelihood decoding (MLD), particularly when conventional measures, such as weight distributions and existing discrepancy-based bounds, fail to distinguish code performance. Building on this concept, we derive a new upper bound on the average error probability for binary codes under MLD and show that, in general, it is incomparable with the two existing bounds derived by Cotardo and Ravagnani (IEEE Trans. Inf. Theory, 68 (5), 2022). To demonstrate its applicability, we compute the complete AHB distributions for several families of codes, including two-weight and three-weight projective codes (with the zero codeword removed) via strongly regular graphs and 3-class association schemes, as well as nonlinear codes constructed from symmetric balanced incomplete block designs (SBIBDs).

CLSep 9, 2021
Graphine: A Dataset for Graph-aware Terminology Definition Generation

Zequn Liu, Shukai Wang, Yiyang Gu et al.

Precisely defining the terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent the labor-intensity curation, further accelerating scientific discovery. Unfortunately, the lack of large-scale terminology definition dataset hinders the process toward definition generation. In this paper, we present a large-scale terminology definition dataset Graphine covering 2,010,648 terminology definition pairs, spanning 227 biomedical subdisciplines. Terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then proposed a novel graph-aware definition generation model Graphex that integrates transformer with graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of terminologies. We further demonstrated how Graphine can be used to evaluate pretrained language models, compare graph representation learning methods and predict sentence granularity. We envision Graphine to be a unique resource for definition generation and many other NLP tasks in biomedicine.