CL | Jul 6, 2020
DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir Radev, Rui Zhang et al.
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-text annotation can be a costly process, especially when dealing with tables, which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure for extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open-domain semantic parsing and dialogue-act-based meaning representation tasks using techniques such as tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimal post-editing. We present a systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.
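The idea of encoding table structure as triples through header dependencies can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `row_to_triples`, the `parent_of` mapping (a hand-written stand-in for a tree ontology over headers), and the example table are all hypothetical, and using the table title as the subject for root columns is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(
    row: Dict[str, str],
    parent_of: Dict[str, Optional[str]],
    table_title: str,
) -> List[Triple]:
    """Emit one triple per non-empty cell of a single table row.

    parent_of maps each header to its parent header in a (hypothetical)
    header tree; a root header has parent None. The cell under the parent
    header becomes the subject, the header itself the predicate, and the
    cell value the object. Root cells are attached to the table title
    (an assumption of this sketch).
    """
    triples: List[Triple] = []
    for header, value in row.items():
        if not value:
            continue  # skip empty cells
        parent = parent_of.get(header)
        # Fall back to the title when there is no parent or its cell is empty.
        subject = (row.get(parent) if parent else None) or table_title
        triples.append((subject, header, value))
    return triples


# Made-up example table row and header tree.
row = {"Team": "Yale", "Coach": "Smith", "Wins": "10"}
parent_of = {"Team": None, "Coach": "Team", "Wins": "Team"}
triples = row_to_triples(row, parent_of, "Ivy League 2019")
for t in triples:
    print(t)
```

With this header tree, the "Coach" and "Wins" cells hang off the "Team" cell, so the row's structure survives in the triple set rather than being flattened into independent column-value pairs.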
CL | Jul 7, 2021
Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review
Irene Li, Jessica Pan, Jeremy Goldwasser et al.
Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g., provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We cover a broad range of tasks: classification and prediction, word embeddings, extraction, and generation, along with other topics such as question answering, phenotyping, knowledge graphs, medical dialogue, multilinguality, and interpretability.
LG | Oct 22, 2020
Flame Stability Analysis of Flame Spray Pyrolysis by Artificial Intelligence
Jessica Pan, Joseph A. Libera, Noah H. Paulson et al.
Flame spray pyrolysis (FSP) is a process used to synthesize nanoparticles through the combustion of an atomized precursor solution; this process has applications in catalysts, battery materials, and pigments. Current limitations revolve around understanding how to consistently achieve a stable flame and reliable nanoparticle production. Machine learning and artificial intelligence algorithms that detect unstable flame conditions in real time may be a means of streamlining the synthesis process and improving FSP efficiency. In this study, the FSP flame stability is first quantified by analyzing the brightness of the flame's anchor point. This analysis is then used to label data for both unsupervised and supervised machine learning approaches. The unsupervised learning approach allows for autonomous labeling and classification of new data by representing data in a reduced dimensional space and identifying combinations of features that most effectively cluster it. The supervised learning approach, on the other hand, requires human labeling of training and test data, but is able to classify multiple objects of interest (such as the burner and pilot flames) within the video feed. The accuracy of each of these techniques is compared against the evaluations of human experts. Both the unsupervised and supervised approaches can track and classify FSP flame conditions in real time to alert users to unstable flame conditions. This research has the potential to autonomously track and manage flame spray pyrolysis as well as other flame technologies by monitoring and classifying the flame stability.
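The first step described above, quantifying stability from the brightness of the flame's anchor point and using it to label frames, might look roughly like the following sketch. The abstract does not give the actual rule, so everything specific here is an assumption: the fixed region of interest `roi`, the mean-intensity measure, the single `threshold`, and the function names are hypothetical, and the tiny frames are made-up data.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# A grayscale frame as rows of pixel intensities (0-255).
Frame = Sequence[Sequence[float]]
# Region of interest as (top, bottom, left, right), bottom/right exclusive.
Roi = Tuple[int, int, int, int]


def anchor_brightness(frame: Frame, roi: Roi) -> float:
    """Mean pixel intensity inside the assumed anchor-point region."""
    top, bottom, left, right = roi
    return mean(px for row in frame[top:bottom] for px in row[left:right])


def label_frames(frames: Sequence[Frame], roi: Roi, threshold: float) -> List[str]:
    """Label a frame 'stable' when the anchor region is at least as bright
    as the threshold, otherwise 'unstable' (assumed rule, for illustration)."""
    return [
        "stable" if anchor_brightness(f, roi) >= threshold else "unstable"
        for f in frames
    ]


# Made-up 2x2 frames: one bright anchor region, one dim.
bright = [[200, 210], [220, 230]]
dim = [[10, 20], [30, 40]]
labels = label_frames([bright, dim], roi=(0, 2, 0, 2), threshold=100)
print(labels)
```

Labels produced this way could then serve as targets or reference labels for the clustering and supervised models the abstract mentions; in practice the region and threshold would need to be calibrated against expert judgments.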