Chao Zhao

CL
h-index49
26papers
4,016citations
Novelty47%
AI Score51

26 Papers

CLMar 19, 2022Code
Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension

Chao Zhao, Wenlin Yao, Dian Yu et al.

Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered around or implicitly implied in different turns of conversations. Therefore, dialogue comprehension requires diverse capabilities such as paraphrasing, summarizing, and commonsense reasoning. Towards the objective of pre-training a zero-shot dialogue comprehension model, we develop a novel narrative-guided pre-training strategy that learns by narrating the key information from a dialogue input. However, the dialogue-narrative parallel corpus for such a pre-training strategy is currently unavailable. For this reason, we first construct a dialogue-narrative parallel corpus by automatically aligning movie subtitles and their synopses. We then pre-train a BART model on the data and evaluate its performance on four dialogue-based tasks that require comprehension. Experimental results show that our model not only achieves superior zero-shot performance but also exhibits stronger fine-grained dialogue comprehension capabilities. The data and code are available at https://github.com/zhaochaocs/Diana

CLMar 19, 2022
Read Top News First: A Document Reordering Approach for Multi-Document News Summarization

Chao Zhao, Tenghao Huang, Somnath Basu Roy Chowdhury et al. · amazon-science

A common method for extractive multi-document news summarization is to re-formulate it as a single-document summarization problem by concatenating all documents as a single meta-document. However, this method neglects the relative importance of documents. We propose a simple approach to reorder the documents according to their relative importance before concatenating and summarizing them. The reordering makes the salient content easier to learn by the summarization model. Experiments show that our approach outperforms previous state-of-the-art methods with more complex architectures.

CLDec 2, 2022Code
NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization

Chao Zhao, Faeze Brahman, Kaiqiang Song et al.

Narrative summarization aims to produce a distilled version of a narrative to describe its most salient events and characters. Summarizing a narrative is challenging as it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, which are collected from plot descriptions of movies and TV episodes with diverse genres, and their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and the state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.

CLMar 15, 2022Code
Unsupervised Extractive Opinion Summarization Using Sparse Coding

Somnath Basu Roy Chowdhury, Chao Zhao, Snigdha Chaturvedi

Opinion summarization is the task of automatically generating summaries that encapsulate information from multiple user reviews. We present Semantic Autoencoder (SemAE) to perform extractive opinion summarization in an unsupervised manner. SemAE uses dictionary learning to implicitly capture semantic information from the review and learns a latent representation of each sentence over semantic units. A semantic unit is supposed to capture an abstract semantic concept. Our extractive summarization algorithm leverages the representations to identify representative opinions among hundreds of reviews. SemAE is also able to perform controllable summarization to generate aspect-specific summaries. We report strong performance on SPACE and AMAZON datasets, and perform experiments to investigate the functioning of our model. Our code is publicly available at https://github.com/brcsomnath/SemAE.

MES-HALLJun 22, 2023
Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots

Chao Shen, Wenkang Zhan, Kaiyao Xin et al.

Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources. The applications strongly rely on the density and quality of these dots, which has motivated studies of the growth process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam epitaxy (MBE) for a specific density of QDs is a multidimensional optimization challenge, usually addressed through time-consuming and iterative trial-and-error. Here, we report a real-time feedback control method to realize the growth of QDs with arbitrary density, which is fully automated and intelligent. We developed a machine learning (ML) model named 3D ResNet 50 trained using reflection high-energy electron diffraction (RHEED) videos as input instead of static images and providing real-time feedback on surface morphologies for process control. As a result, we demonstrated that ML from previous growth could predict the post-growth density of QDs, by successfully tuning the QD densities in near-real time from 1.5E10 cm-2 down to 3.8E8 cm-2 or up to 1.4E11 cm-2. Compared to traditional methods, our approach, with in situ tuning capabilities and excellent reliability, can dramatically expedite the material optimization process and improve the reproducibility of MBE, constituting significant progress for thin film growth techniques. The concepts and methodologies proved feasible in this work are promising to be applied to a variety of material growth processes, which will revolutionize semiconductor manufacturing for optoelectronic and microelectronic industries.

LGAug 15, 2024Code
Causal Discovery from Time-Series Data with Short-Term Invariance-Based Convolutional Neural Networks

Rujia Shen, Boran Wang, Chao Zhao et al.

Causal discovery from time-series data aims to capture both intra-slice (contemporaneous) and inter-slice (time-lagged) causality between variables within the temporal chain, which is crucial for various scientific disciplines. Compared to causal discovery from non-time-series data, causal discovery from time-series data necessitates more serialized samples with a larger amount of observed time steps. To address the challenges, we propose a novel gradient-based causal discovery approach STIC, which focuses on \textbf{S}hort-\textbf{T}erm \textbf{I}nvariance using \textbf{C}onvolutional neural networks to uncover the causal relationships from time-series data. Specifically, STIC leverages both the short-term time and mechanism invariance of causality within each window observation, which possesses the property of independence, to enhance sample efficiency. Furthermore, we construct two causal convolution kernels, which correspond to the short-term time and mechanism invariance respectively, to estimate the window causal graph. To demonstrate the necessity of convolutional neural networks for causal discovery from time-series data, we theoretically derive the equivalence between convolution and the underlying generative principle of time-series data under the assumption that the additive noise model is identifiable. Experimental evaluations conducted on both synthetic and FMRI benchmark datasets demonstrate that our STIC outperforms baselines significantly and achieves the state-of-the-art performance, particularly when the datasets contain a limited number of observed time steps. Code is available at \url{https://github.com/HITshenrj/STIC}.

CLMay 26, 2022
Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach

Chao Zhao, Faeze Brahman, Tenghao Huang et al.

Pre-trained models (PTMs) have lead to great improvements in natural language generation (NLG). However, it is still unclear how much commonsense knowledge they possess. With the goal of evaluating commonsense knowledge of NLG models, recent work has proposed the problem of generative commonsense reasoning, e.g., to compose a logical sentence given a set of unordered concepts. Existing approaches to this problem hypothesize that PTMs lack sufficient parametric knowledge for this task, which can be overcome by introducing external knowledge or task-specific pre-training objectives. Different from this trend, we argue that PTM's inherent ability for generative commonsense reasoning is underestimated due to the order-agnostic property of its input. In particular, we hypothesize that the order of the input concepts can affect the PTM's ability to utilize its commonsense knowledge. To this end, we propose a pre-ordering approach to elaborately manipulate the order of the given concepts before generation. Experiments show that our approach can outperform the more sophisticated models that have access to a lot of external data and resources.

MTRL-SCIAug 7, 2024
On-Demand Growth of Semiconductor Heterostructures Guided by Physics-Informed Machine Learning

Chao Shen, Yuan Li, Wenkang Zhan et al.

Developing tailored semiconductor heterostructures on demand represents a critical capability for addressing the escalating performance demands in electronic and optoelectronic devices. However, traditional fabrication methods remain constrained by simulation-based design and iterative trial-and-error optimization. Here, we introduce SemiEpi, a self-driving platform designed for molecular beam epitaxy (MBE) to perform multi-step semiconductor heterostructure growth through in-situ monitoring and on-the-fly feedback control. By integrating standard MBE reactors, physics-informed machine learning (ML) models, and parameter initialization, SemiEpi identifies optimal initial conditions and proposes experiments for heterostructure growth, eliminating the need for extensive expertise in MBE processes. As a proof of concept, we demonstrate the optimization of high-density InAs quantum dot (QD) growth with a target emission wavelength of 1240 nm, showcasing the power of SemiEpi. We achieve a QD density of 5 x 10^10 cm^-2, a 1.6-fold increase in photoluminescence (PL) intensity, and a reduced full width at half maximum (FWHM) of 29.13 meV, leveraging in-situ reflective high-energy electron diffraction monitoring with feedback control for adjusting growth temperatures. Taken together, our results highlight the potential of ML-guided systems to address challenges in multi-step heterostructure growth, facilitate the development of a hardware-independent framework, and enhance process repeatability and stability, even without exhaustive knowledge of growth parameters.

HCMar 2
From Sustainable Materials to User-Centered Sustainability: Material Experience in Art Healing

Yuxin Zhang, Fan Zhang, Zihao Song et al.

This study develops sustainable materials using hydrogel as the matrix and explores the transition from sustainable materials to user-centered sustainability, with a particular focus on achieving art healing through material experience. The findings reveal that "Aesthetic" property exert the greatest influence on art healing in the context of multimodal material experiences involving visual, tactile, and smell, followed by "Intrinsic" property, whereas "Physical" property have a comparatively limited effect. Furthermore, the study proposes a material experience framework that enables designers to systematically and holistically understanding material characteristics. It highlights the importance of considering users' psychological perceptions and emotional needs in the material design process.

CLFeb 13
Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens

Wei-Lin Chen, Liqian Peng, Tian Tan et al.

Large language models (LLMs) have demonstrated impressive reasoning capabilities by scaling test-time compute via long Chain-of-Thought (CoT). However, recent findings suggest that raw token counts are unreliable proxies for reasoning quality: increased generation length does not consistently correlate with accuracy and may instead signal "overthinking," leading to performance degradation. In this work, we quantify inference-time effort by identifying deep-thinking tokens -- tokens where internal predictions undergo significant revisions in deeper model layers prior to convergence. Across four challenging mathematical and scientific benchmarks (AIME 24/25, HMMT 25, and GPQA-diamond) and a diverse set of reasoning-focused models (GPT-OSS, DeepSeek-R1, and Qwen3), we show that deep-thinking ratio (the proportion of deep-thinking tokens in a generated sequence) exhibits a robust and consistently positive correlation with accuracy, substantially outperforming both length-based and confidence-based baselines. Leveraging this insight, we introduce Think@n, a test-time scaling strategy that prioritizes samples with high deep-thinking ratios. We demonstrate that Think@n matches or exceeds standard self-consistency performance while significantly reducing inference costs by enabling the early rejection of unpromising generations based on short prefixes.

CLMay 20, 2023Code
"What do others think?": Task-Oriented Conversational Modeling with Subjective Knowledge

Chao Zhao, Spandana Gella, Seokhwan Kim et al.

Task-oriented Dialogue (TOD) Systems aim to build dialogue systems that assist users in accomplishing specific goals, such as booking a hotel or a restaurant. Traditional TODs rely on domain-specific APIs/DBs or external factual knowledge to generate responses, which cannot accommodate subjective user requests (e.g., "Is the WIFI reliable?" or "Does the restaurant have a good atmosphere?"). To address this issue, we propose a novel task of subjective-knowledge-based TOD (SK-TOD). We also propose the first corresponding dataset, which contains subjective knowledge-seeking dialogue contexts and manually annotated responses grounded in subjective knowledge sources. When evaluated with existing TOD approaches, we find that this task poses new challenges such as aggregating diverse opinions from multiple knowledge snippets. We hope this task and dataset can promote further research on TOD and subjective content understanding. The code and the dataset are available at https://github.com/alexa/dstc11-track5.

30.3ROMar 14
LDHP: Library-Driven Hierarchical Planning for Non-prehensile Dexterous Manipulation

Tierui He, Chao Zhao

Non-prehensile manipulation is essential for handling thin, large, or otherwise ungraspable objects in unstructured settings. Prior planning and search-based methods often rely on ad-hoc manual designs or generate physically unrealizable motions by ignoring critical gripper properties, while training-based approaches are data-intensive and struggle to generalize to novel, out-of-distribution tasks. We propose a library-driven hierarchical planner (LDHP) that makes executability a first-class design goal: a top-tier contact-state planner proposes object-pose paths using MoveObject primitives, and a bottom-tier grasp planner synthesizes feasible grasp sequences with AdjustGrasp primitives; feasibility is certified by collision checks and quasi-static mechanics, and contact-sensitive segments are recovered via a bounded dichotomy refinement. This gripper-aware decomposition decouples object motion from grasp realizability, yields a task-agnostic pipeline that transfers across manipulation tasks and geometric variations without re-design, and exposes clean hooks for optional learned priors. Real-robot studies on zero-mobility lifting and slot insertion demonstrate consistent execution and robustness to shape and environment changes.

CLMar 31, 2024
Returning to the Start: Generating Narratives with Related Endpoints

Anneliese Brei, Chao Zhao, Snigdha Chaturvedi

Human writers often bookend their writing with ending sentences that relate back to the beginning sentences in order to compose a satisfying narrative that "closes the loop." Motivated by this observation, we propose RENarGen, a controllable story-generation paradigm that generates narratives by ensuring the first and last sentences are related and then infilling the middle sentences. Our contributions include an initial exploration of how various methods of bookending from Narratology affect language modeling for stories. Automatic and human evaluations indicate RENarGen produces better stories with more narrative closure than current autoregressive models.

MES-HALLNov 1, 2024
In-situ Self-optimization of Quantum Dot Emission for Lasers by Machine-Learning Assisted Epitaxy

Chao Shen, Wenkang Zhan, Shujie Pan et al.

Traditional methods for optimizing light source emissions rely on a time-consuming trial-and-error approach. While in-situ optimization of light source gain media emission during growth is ideal, it has yet to be realized. In this work, we integrate in-situ reflection high-energy electron diffraction (RHEED) with machine learning (ML) to correlate the surface reconstruction with the photoluminescence (PL) of InAs/GaAs quantum dots (QDs), which serve as the active region of lasers. A lightweight ResNet-GLAM model is employed for the real-time processing of RHEED data as input, enabling effective identification of optical performance. This approach guides the dynamic optimization of growth parameters, allowing real-time feedback control to adjust the QDs emission for lasers. We successfully optimized InAs QDs on GaAs substrates, with a 3.2-fold increase in PL intensity and a reduction in full width at half maximum (FWHM) from 36.69 meV to 28.17 meV under initially suboptimal growth conditions. Our automated, in-situ self-optimized lasers with 5-layer InAs QDs achieved electrically pumped continuous-wave operation at 1240 nm with a low threshold current of 150 A/cm2 at room temperature, an excellent performance comparable to samples grown through traditional manual multi-parameter optimization methods. These results mark a significant step toward intelligent, low-cost, and reproductive light emitters production.

MES-HALLDec 4, 2023
Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control

Chao Shen, Wenkang Zhan, Jian Tang et al.

Thin film deposition is an essential step in the semiconductor process. During preparation or loading, the substrate is exposed to the air unavoidably, which has motivated studies of the process control to remove the surface oxide before thin film deposition. Optimizing the deoxidation process in molecular beam epitaxy (MBE) for a random substrate is a multidimensional challenge and sometimes controversial. Due to variations in semiconductor materials and growth processes, the determination of substrate deoxidation temperature is highly dependent on the grower's expertise; the same substrate may yield inconsistent results when evaluated by different growers. Here, we employ a machine learning (ML) hybrid convolution and vision transformer (CNN-ViT) model. This model utilizes reflection high-energy electron diffraction (RHEED) video as input to determine the deoxidation status of the substrate as output, enabling automated substrate deoxidation under a controlled architecture. This also extends to the successful application of deoxidation processes on other substrates. Furthermore, we showcase the potential of models trained on data from a single MBE equipment to achieve high-accuracy deployment on other equipment. In contrast to traditional methods, our approach holds exceptional practical value. It standardizes deoxidation temperatures across various equipment and substrate materials, advancing the standardization research process in semiconductor preparation, a significant milestone in thin film growth technology. The concepts and methods demonstrated in this work are anticipated to revolutionize semiconductor manufacturing in optoelectronics and microelectronics industries by applying them to diverse material growth processes.

CLJan 31, 2024
SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization

Sangwoo Cho, Kaiqiang Song, Chao Zhao et al. · tencent-ai

Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations. Traditional language models often overlook the distinct features of these dialogues by treating them as regular text. In this paper, we propose a speaker-enhanced pre-training method for long dialogue summarization, which leverages the inherent structure of multiple-turn dialogues. To support our study, we curate a diverse dataset that includes transcripts from real-world scenarios, movie or TV show transcripts, and dialogues generated by a Large Language Model. We then perform a pre-training, which encompasses the detection of speaker changes, and masked utterance generation. Experimental results of fine-tuned models demonstrate that our model achieves state-of-the-art performance on downstream benchmarks with long context, surpassing baseline models and highlighting the effectiveness of our approach. Our findings highlight the importance of curating pre-training datasets that exhibit diversity and variations in length distribution to ensure effective alignment with downstream datasets.

CLJan 13, 2022
Document-level Relation Extraction with Context Guided Mention Integration and Inter-pair Reasoning

Chao Zhao, Daojian Zeng, Lu Xu et al.

Document-level Relation Extraction (DRE) aims to recognize the relations between two entities. The entity may correspond to multiple mentions that span beyond sentence boundary. Few previous studies have investigated the mention integration, which may be problematic because coreferential mentions do not equally contribute to a specific relation. Moreover, prior efforts mainly focus on reasoning at entity-level rather than capturing the global interactions between entity pairs. In this paper, we propose two novel techniques, Context Guided Mention Integration and Inter-pair Reasoning (CGM2IR), to improve the DRE. Instead of simply applying average pooling, the contexts are utilized to guide the integration of coreferential mentions in a weighted sum manner. Additionally, inter-pair reasoning executes an iterative algorithm on the entity pair graph, so as to model the interdependency of relations. We evaluate our CGM2IR model on three widely used benchmark datasets, namely DocRED, CDR, and GDA. Experimental results show that our model outperforms previous state-of-the-art models.

CLSep 12, 2021
"Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding

Faeze Brahman, Meng Huang, Oyvind Tafjord et al.

When reading a literary piece, readers often make inferences about various characters' roles, personalities, relationships, intents, actions, etc. While humans can readily draw upon their past experiences to build such a character-centric view of the narrative, understanding characters in narratives can be a challenging task for machines. To encourage research in this field of character-centric narrative understanding, we present LiSCU -- a new dataset of literary pieces and their summaries paired with descriptions of characters that appear in them. We also introduce two new tasks on LiSCU: Character Identification and Character Description Generation. Our experiments with several pre-trained language models adapted for these tasks demonstrate that there is a need for better models of narrative comprehension.

CLNov 22, 2019
Weakly-Supervised Opinion Summarization by Leveraging External Information

Chao Zhao, Snigdha Chaturvedi

Opinion summarization from online product reviews is a challenging task, which involves identifying opinions related to various aspects of the product being reviewed. While previous works require additional human effort to identify relevant aspects, we instead apply domain knowledge from external sources to automatically achieve the same goal. This work proposes AspMem, a generative method that contains an array of memory cells to store aspect-related knowledge. This explicit memory can help obtain a better opinion representation and infer the aspect information more precisely. We evaluate this method on both aspect identification and opinion summarization tasks. Our experiments show that AspMem outperforms the state-of-the-art methods even though, unlike the baselines, it does not rely on human supervision which is carefully handcrafted for the given tasks.

ROAug 10, 2019
Deep Dexterous Grasping of Novel Objects from a Single View

Umit Rusen Aktas, Chao Zhao, Marek Kopicki et al.

Dexterous grasping of a novel object given a single view is an open problem. This paper makes several contributions to its solution. First, we present a simulator for generating and testing dexterous grasps. Second we present a data set, generated by this simulator, of 2.4 million simulated dexterous grasps of variations of 294 base objects drawn from 20 categories. Third, we present a basic architecture for generation and evaluation of dexterous grasps that may be trained in a supervised manner. Fourth, we present three different evaluative architectures, employing ResNet-50 or VGG16 as their visual backbone. Fifth, we train, and evaluate seventeen variants of generative-evaluative architectures on this simulated data set, showing improvement from 69.53% grasp success rate to 90.49%. Finally, we present a real robot implementation and evaluate the four most promising variants, executing 196 real robot grasps in total. We show that our best architectural variant achieves a grasp success rate of 87.8% on real novel objects seen from a single view, improving on a baseline of 57.1%.

ITMay 30, 2019
Neural Entropic Estimation: A faster path to mutual information estimation

Chung Chan, Ali Al-Bashabsheh, Hing Pang Huang et al.

We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence.

AINov 28, 2017
Classification of entities via their descriptive sentences

Chao Zhao, Min Zhao, Yi Guan

Hypernym identification of open-domain entities is crucial for taxonomy construction as well as many higher-level applications. Current methods suffer from either low precision or low recall. To decrease the difficulty of this problem, we adopt a classification-based method. We pre-define a concept taxonomy and classify an entity to one of its leaf concept, based on the name and description information of the entity. A convolutional neural network classifier and a K-means clustering module are adopted for classification. We applied this system to 2.1 million Baidu Baike entities, and 1.1 million of them were successfully identified with a precision of 99.36%.

CLSep 20, 2017
Constructing a Hierarchical User Interest Structure based on User Profiles

Chao Zhao, Min Zhao, Yi Guan

The interests of individual internet users fall into a hierarchical structure which is useful in regards to building personalized searches and recommendations. Most studies on this subject construct the interest hierarchy of a single person from the document perspective. In this study, we constructed the user interest hierarchy via user profiles. We organized 433,397 user interests, referred to here as "attentions", into a user attention network (UAN) from 200 million user profiles; we then applied the Louvain algorithm to detect hierarchical clusters in these attentions. Finally, a 26-level hierarchy with 34,676 clusters was obtained. We found that these attention clusters were aggregated according to certain topics as opposed to the hyponymy-relation based conceptual ontologies. The topics can be entities or concepts, and the relations were not restrained by hyponymy. The concept relativity encapsulated in the user's interest can be captured by labeling the attention clusters with corresponding concepts.

AISep 20, 2017
EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning

Chao Zhao, Jingchi Jiang, Yi Guan

Objective: Electronic medical records (EMRs) contain an amount of medical knowledge which can be used for clinical decision support (CDS). Our objective is a general system that can extract and represent these knowledge contained in EMRs to support three CDS tasks: test recommendation, initial diagnosis, and treatment plan recommendation, with the given condition of one patient. Methods: We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a single record. Three bipartite subgraphs (bi-graphs) were extracted from the EMKN to support each task. One part of the bi-graph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bi-graph was regarded as a Markov random field to support the inference. Three lazy energy functions and one parameter-based energy function were proposed, as well as two knowledge representation learning-based energy functions, which can provide a distributed representation of medical entities. Three measures were utilized for performance evaluation. Results: On the initial diagnosis task, 80.11% of the test records identified at least one correct disease from top 10 candidates. Test and treatment recommendation results were 87.88% and 92.55%, respectively. These results altogether indicate that the proposed system outperformed the baseline methods. The distributed representation of medical entities does reflect similarity relationships in regards to knowledge level. Conclusion: Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require designing their energy functions individually.

CLSep 20, 2017
De-identification of medical records using conditional random fields and long short-term memory networks

Zhipeng Jiang, Chao Zhao, Bin He et al.

The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F_1 measure of 89.86%, which was higher than that of the CRF-based system.

AIMar 28, 2017
Learning and inference in knowledge-based probabilistic model for medical diagnosis

Jingchi Jiang, Chao Zhao, Yi Guan et al.

Based on a weighted knowledge graph to represent first-order knowledge and combining it with a probabilistic model, we propose a methodology for the creation of a medical knowledge network (MKN) in medical diagnosis. When a set of symptoms is activated for a specific patient, we can generate a ground medical knowledge network composed of symptom nodes and potential disease nodes. By Incorporating a Boltzmann machine into the potential function of a Markov network, we investigated the joint probability distribution of the MKN. In order to deal with numerical symptoms, a multivariate inference model is presented that uses conditional probability. In addition, the weights for the knowledge graph were efficiently learned from manually annotated Chinese Electronic Medical Records (CEMRs). In our experiments, we found numerically that the optimum choice of the quality of disease node and the expression of symptom variable can improve the effectiveness of medical diagnosis. Our experimental results comparing a Markov logic network and the logistic regression algorithm on an actual CEMR database indicate that our method holds promise and that MKN can facilitate studies of intelligent diagnosis.