Naveed Afzal

IR
3 papers
496 citations
Novelty 22%
AI Score 19

3 Papers

LG · Jun 30, 2022
Modularity Optimization as a Training Criterion for Graph Neural Networks

Tsuyoshi Murata, Naveed Afzal

Graph convolution is a recent, scalable method for deep feature learning on attributed graphs that aggregates local node information over multiple layers. Such layers consider only the attribute information of a node's neighbors in the forward model and do not incorporate knowledge of the global network structure into the learning task. In particular, the modularity function provides a convenient source of information about the community structure of networks. In this work, we investigate how incorporating community-structure-preserving objectives into the graph convolutional model affects the quality of the learned representations. We incorporate these objectives in two ways: as an explicit regularization term in the cost function of the output layer, and as an additional loss term computed via an auxiliary layer. We report the effect of these community-structure-preserving terms in graph convolutional architectures. Experimental evaluation on two attributed bibliographic networks showed that incorporating the community-preserving objective improves semi-supervised node classification accuracy in the sparse-label regime.
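
The abstract does not spell out the exact form of the community-preserving term. As a minimal sketch, assuming the standard Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j) and its soft relaxation Q = tr(C^T B C) / (2m), where B is the modularity matrix and C is an n x k community membership matrix (one-hot for hard assignments, or e.g. softmax outputs of a GNN layer), the term could be computed as below. The helper names `adjacency_from_edges`, `soft_modularity`, and `regularized_loss`, and the weight `lam`, are hypothetical; subtracting lam * Q from the task loss is one plausible way to reward community-preserving outputs, not necessarily the authors' formulation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def adjacency_from_edges(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


def modularity_matrix(adj: Matrix) -> Matrix:
    """B_ij = A_ij - k_i * k_j / (2m), with k_i the degree of node i."""
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    n = len(adj)
    return [[adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
            for i in range(n)]


def soft_modularity(adj: Matrix, membership: Matrix) -> float:
    """Q = tr(C^T B C) / (2m) for an n x k membership matrix C (rows may be soft)."""
    b = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    n, k = len(adj), len(membership[0])
    total = 0.0
    for c in range(k):
        for i in range(n):
            for j in range(n):
                total += membership[i][c] * b[i][j] * membership[j][c]
    return total / two_m


def regularized_loss(task_loss: float, adj: Matrix, membership: Matrix,
                     lam: float = 0.1) -> float:
    """Hypothetical combined objective: minimizing it pushes modularity Q up."""
    return task_loss - lam * soft_modularity(adj, membership)


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = adjacency_from_edges(6, edges)
    # One-hot community labels: nodes 0-2 in community 0, nodes 3-5 in community 1.
    hard = [[1.0, 0.0] if i < 3 else [0.0, 1.0] for i in range(6)]
    print(round(soft_modularity(adj, hard), 4))  # 0.3571 (= 5/14)
```

Splitting the two triangles gives Q = 5/14, while putting every node in one community gives Q = 0. In an actual GNN the same expression would be written in an autograd framework so gradients flow through C; plain lists are used here only to keep the sketch dependency-free.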

IR · Aug 28, 2018
MedSTS: A Resource for Clinical Semantic Textual Similarity

Yanshan Wang, Naveed Afzal, Sunyang Fu et al.

The wide adoption of electronic health records (EHRs) has enabled a broad range of applications that leverage EHR data. However, the meaningful use of EHR data largely depends on our ability to efficiently extract and consolidate information embedded in clinical text, where natural language processing (NLP) techniques are essential. Semantic textual similarity (STS), which measures the semantic similarity between text snippets, plays a significant role in many NLP applications. In the general NLP domain, STS shared tasks have made available a large collection of manually annotated text snippet pairs from various domains. In the clinical domain, STS can help detect and eliminate redundant information, which may reduce cognitive burden and improve clinical decision-making. This paper describes our efforts to assemble a resource for STS in the medical domain, MedSTS. It consists of 174,629 sentence pairs gathered from a clinical corpus at Mayo Clinic. A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores from 0 to 5 (low to high similarity). We further analyzed the medical concepts in the MedSTS corpus and tested four STS systems on the MedSTS_ann corpus. In the future, we will organize a shared task by releasing the MedSTS_ann corpus to motivate the community to tackle real-world clinical problems.
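
The abstract does not say which metric was used to compare the four STS systems on MedSTS_ann. Pearson correlation between system scores and gold similarity scores is the conventional metric in general-domain STS shared tasks, so the minimal sketch below assumes it. The function name `pearson` and the toy scores are hypothetical and are not MedSTS data.

```python
import math
from typing import Sequence


def pearson(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between system scores and gold 0-5 similarity scores."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_g)


if __name__ == "__main__":
    # Hypothetical toy scores on the 0-5 scale; not taken from MedSTS_ann.
    gold = [0.0, 1.5, 3.0, 4.5, 5.0]
    system = [0.4, 1.2, 2.8, 4.0, 4.9]
    print(f"Pearson r = {pearson(system, gold):.3f}")
```

Because Pearson correlation is invariant to linear rescaling, a system does not need to output scores on exactly the 0-5 scale to be compared against the annotations.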

IR · Feb 1, 2018
A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Yanshan Wang, Sijia Liu, Naveed Afzal et al.

Word embeddings have been widely used in biomedical natural language processing (NLP) applications because they provide vector representations of words that capture the semantic properties of words and the linguistic relationships between them. Many biomedical applications train word embeddings on different textual resources (e.g., Wikipedia and biomedical articles) and apply these embeddings to downstream biomedical applications. However, there has been little work on evaluating the word embeddings trained from these resources. In this study, we provide an empirical evaluation of word embeddings trained from four different resources, namely clinical notes, biomedical publications, Wikipedia, and news. We performed the evaluation both qualitatively and quantitatively. For the qualitative evaluation, we manually inspected the five most similar medical words for each word in a given set of target medical words, and then analyzed the word embeddings through visualization. For the quantitative evaluation, we conducted both intrinsic and extrinsic evaluations. Based on the evaluation results, we draw the following conclusions. First, compared to embeddings trained on Wikipedia and news, the word embeddings trained on clinical notes and biomedical publications better capture the semantics of medical terms, find more relevant similar medical terms, and are closer to human experts' judgments. Second, there is no consistent global ranking of word embedding quality across downstream biomedical NLP applications; however, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on biomedical domain corpora do not necessarily perform better than those trained on general domain corpora for a given downstream biomedical NLP task.
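
The qualitative step of listing the most similar words to a target word is usually done by ranking the vocabulary by cosine similarity. The abstract does not name the similarity measure, so cosine similarity is an assumption here. The minimal sketch below uses a plain dictionary of word vectors; the function names `cosine` and `most_similar` and the three-dimensional toy vectors are hypothetical and stand in for real embeddings trained on any of the four corpora.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target: str, embeddings: Dict[str, Sequence[float]],
                 k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (word, score) pairs most similar to target, best first."""
    query = embeddings[target]
    scores = [(word, cosine(query, vec))
              for word, vec in embeddings.items() if word != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy vectors, not trained embeddings.
    emb = {
        "diabetes": [1.0, 0.9, 0.0],
        "hyperglycemia": [0.9, 1.0, 0.1],
        "insulin": [0.8, 0.7, 0.2],
        "fracture": [0.0, 0.1, 1.0],
        "election": [-1.0, 0.0, 0.0],
    }
    for word, score in most_similar("diabetes", emb, k=5):
        print(f"{word}: {score:.3f}")
```

Running the same query against embeddings trained on different corpora and comparing the returned neighbor lists is one way to reproduce this kind of side-by-side manual inspection.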