Xiaoshan Li

SE
9papers
448citations
Novelty27%
AI Score21

9 Papers

CVOct 15, 2021Code
Performance, Successes and Limitations of Deep Learning Semantic Segmentation of Multiple Defects in Transmission Electron Micrographs

Ryan Jacobs, Mingren Shen, Yuhan Liu et al.

In this work, we perform semantic segmentation of multiple defect types in electron microscopy images of irradiated FeCrAl alloys using a deep learning Mask Regional Convolutional Neural Network (Mask R-CNN) model. We conduct an in-depth analysis of key model performance statistics, with a focus on quantities such as predicted distributions of defect shapes, defect sizes, and defect areal densities relevant to informing modeling and understanding of irradiated Fe-based materials properties. To better understand the performance and present limitations of the model, we provide examples of useful evaluation tests which include a suite of random splits, and dataset size-dependent and domain-targeted cross validation tests. Overall, we find that the current model is a fast, effective tool for automatically characterizing and quantifying multiple defect types in microscopy images, with a level of accuracy on par with human domain expert labelers. More specifically, the model can achieve average defect identification F1 scores as high as 0.8, and, based on random cross validation, have low overall average (+/- standard deviation) defect size and density percentage errors of 7.3 (+/- 3.8)% and 12.7 (+/- 5.3)%, respectively. Further, our model predicts the expected material hardening to within 10-20 MPa (about 10% of total hardening), which is about the same error level as experiments. Our targeted evaluation tests also suggest the best path toward improving future models is not expanding existing databases with more labeled images but instead data additions that target weak points of the model domain, such as images from different microscopes, imaging conditions, irradiation environments, and alloy types. Finally, we discuss the first phase of an effort to provide an easy-to-use, open-source object detection tool to the broader community for identifying defects in new images.

LGDec 21, 2018
An Integrated Transfer Learning and Multitask Learning Approach for Pharmacokinetic Parameter Prediction

Zhuyifan Ye, Yilong Yang, Xiaoshan Li et al.

Background: Pharmacokinetic evaluation is one of the key processes in drug discovery and development. However, current absorption, distribution, metabolism, excretion prediction models still have limited accuracy. Aim: This study aims to construct an integrated transfer learning and multitask learning approach for developing quantitative structure-activity relationship models to predict four human pharmacokinetic parameters. Methods: A pharmacokinetic dataset included 1104 U.S. FDA approved small molecule drugs. The dataset included four human pharmacokinetic parameter subsets (oral bioavailability, plasma protein binding rate, apparent volume of distribution at steady-state and elimination half-life). The pre-trained model was trained on over 30 million bioactivity data. An integrated transfer learning and multitask learning approach was established to enhance the model generalization. Results: The pharmacokinetic dataset was split into three parts (60:20:20) for training, validation and test by the improved Maximum Dissimilarity algorithm with the representative initial set selection algorithm and the weighted distance function. The multitask learning techniques enhanced the model predictive ability. The integrated transfer learning and multitask learning model demonstrated the best accuracies, because deep neural networks have the general feature extraction ability, transfer learning and multitask learning improved the model generalization. Conclusions: The integrated transfer learning and multitask learning approach with the improved dataset splitting algorithm was firstly introduced to predict the pharmacokinetic parameters. This method can be further employed in drug discovery and development.

LGSep 6, 2018
Deep learning for in vitro prediction of pharmaceutical formulations

Yilong Yang, Zhuyifan Ye, Yan Su et al.

Current pharmaceutical formulation development still strongly relies on the traditional trial-and-error approach by individual experiences of pharmaceutical scientists, which is laborious, time-consuming and costly. Recently, deep learning has been widely applied in many challenging domains because of its important capability of automatic feature extraction. The aim of this research is to use deep learning to predict pharmaceutical formulations. In this paper, two different types of dosage forms were chosen as model systems. Evaluation criteria suitable for pharmaceutics were applied to assessing the performance of the models. Moreover, an automatic dataset selection algorithm was developed for selecting the representative data as validation and test datasets. Six machine learning methods were compared with deep learning. The result shows the accuracies of both two deep neural networks were above 80% and higher than other machine learning models, which showed good prediction in pharmaceutical formulations. In summary, deep learning with the automatic data splitting algorithm and the evaluation criteria suitable for pharmaceutical formulation data was firstly developed for the prediction of pharmaceutical formulations. The cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical researches from experience-dependent studies to data-driven methodologies.

SEAug 31, 2018
Automated Prototype Generation from Formal Requirements Model

Yilong Yang, Xiaoshan Li, Zhiming Liu et al.

Prototyping is an effective and efficient way of requirement validation to avoid introducing errors in the early stage of software development. However, manually developing a prototype of a software system requires additional efforts, which would increase the overall cost of software development. In this paper, we present an approach with a developed tool to automatic generation of prototypes from formal requirements models. A requirements model consists of a use case diagram, a conceptual class diagram, use case definitions specified by system sequence diagrams and the contracts of their system operations. We propose a method to decompose a contract into executable parts and non-executable parts. A set of transformation rules is given to decompose the executable part into pre-implemented primitive operations. A non-executable part is usually realized by significant algorithms such as sorting a list, finding the shortest path or domain-specific computation. It can be implemented manually or by using existing code. A CASE tool is developed that provides an interface for developers to develop a program for each non-executable part of a contract, and automatically transforms the executables into sequences of pre-implemented primitive operations. We have conducted four cases studies with over 50 use cases. The experimental result shows that the 93.65% of requirement specifications are executable, and only 6.35% are non-executable such as sorting and event-call, which can be implemented by developers manually or invoking the APIs of advanced algorithms in Java library. The one second generated the prototype of a case study requires approximate nine hours manual implementation by a skilled programmer. Overall, the result is satisfiable, and the proposed approach with the developed CASE tool can be applied to the software industry for requirements engineering.

SEJun 6, 2018
MicroShare: Privacy-Preserved Medical Resource Sharing through MicroService Architecture

Yilong Yang, Quan Zu, Peng Liu et al.

This paper takes up the problem of medical resource sharing through MicroService architecture without compromising patient privacy. To achieve this goal, we suggest refactoring the legacy EHR systems into autonomous MicroServices communicating by the unified techniques such as RESTFul web service. This lets us handle clinical data queries directly and far more efficiently for both internal and external queries. The novelty of the proposed approach lies in avoiding the data de-identification process often used as a means of preserving patient privacy. The implemented toolkit combines software engineering technologies such as Java EE, RESTful web services, JSON Web Tokens to allow exchanging medical data in an unidentifiable XML and JSON format as well as restricting users to the need-to-know principle. Our technique also inhibits retrospective processing of data such as attacks by an adversary on a medical dataset using advanced computational methods to reveal Protected Health Information (PHI). The approach is validated on an endoscopic reporting application based on openEHR and MST standards. From the usability perspective, the approach can be used to query datasets by clinical researchers, governmental or non-governmental organizations in monitoring health care and medical record services to improve quality of care and treatment.

SEMar 14, 2018
Integrating UML with Service Refinement for Requirements Modeling and Analysis

Yilong Yang, Jing Yang, Xiaoshan Li

Unified Modeling Language (UML) is the de facto standard for requirements modeling and system design. UML as a visual language can tremendously help customers, project managers, and developers to specify the requirements of a target system. However, UML lacks the ability to specify the requirements precisely such as the contracts of the system operation, and verify the consistency and refinement of the requirements. These disadvantages result in that the potential faults of software are hard to be discovered in the early stage of software development process, and then requiring more efforts in software testing to find the bugs. Service refinement is a formal method, which could be a supplement to enhance the UML. In this paper, we show how to integrate UML with service refinement to specify requirements, and verify the consistency and refinements of the requirements through a case study of online shopping system. Particularly, requirements are modeled through UML diagrams, which includes a) use case diagram, b) system sequence diagrams and c) conceptual class diagram. Service refinement enhances the requirements model by introducing the contracts. Furthermore, the consistency and refinements of requirement model can be verified through service refinement. Our approach demonstrates integrating UML with service refinement can require fewer efforts to achieve the consistency requirements than only using UML for requirement modeling.

SEMar 14, 2018
Real-time System Modeling and Verification through Labeled Transition System Analyser (LTSA)

Yilong Yang, Xiaoshan Li, Quan Zu

With the advancement of software engineering in recent years, the model checking techniques are widely applied in various areas to do the verification for the system model. However, it is difficult to apply the model checking to verify requirements due to lacking the details of the design. Unlike other model checking tools, LTSA provides the structure diagram, which can bridge the gap between the requirements and the design. In this paper, we demonstrate the abilities of LTSA shipped with the classic case study of the steam boiler system. The structure diagram of LTSA can specify the interactions between the controller and the steam boiler, which can be derived from UML requirements model such as system sequence diagram of the steam boiler system. The start-up design model of LTSA can be generated from the structure diagram. Furthermore, we provide a variation law of the steam rate to avoid the issue of state space explosion and show how explicitly and implicitly model the time that reflects the difference between system modeling and the physical world. Finally, the derived model is verified against the required properties. Our work demonstrates the potential power of integrating UML with model checking tools in requirement elicitation, system design, and verification.

SEMar 14, 2018
MedShare: Medical Resource Sharing among Autonomous Healthcare Providers

Yilong Yang, Xiaoshan Li, Nafees Qamar et al.

Legacy Electronic Health Records (EHRs) systems were not developed with the level of connectivity expected from them nowadays. Therefore, interoperability weakness inherent in the legacy systems can result in poor patient care and waste of financial resources. Large hospitals are less likely to share their data with external hospitals due to economic and political reasons. Motivated by these facts, we aim to provide a set of software implementation guidelines, i.e., MedShare to deal with interoperability issues among disconnected healthcare systems. The proposed integrated architecture includes: 1) a data extractor to fetch legacy medical data from a hemodialysis center, 2) converting it to a common data model, 3) indexing patient information using the HashMap technique, and 4) a set of services and tools that can be installed as a coherent environment on top of stand-alone EHRs systems. Our work enabled three cooperating but autonomous hospitals to mutually exchange medical data and helped them develop a common reference architecture. It lets stakeholders retain control over their patient data, winning the trust and confidence much needed towards a successful deployment of MedShare. Security concerns were effectively addressed that also included patient consent in the data exchange process. Thereby, the implemented toolset offered a collaborative environment to share EHRs by the healthcare providers.

MLMar 14, 2018
Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques

Run Han, Yilong Yang, Xiaoshan Li et al.

Oral Disintegrating Tablets (ODTs) is a novel dosage form that can be dissolved on the tongue within 3min or less especially for geriatric and pediatric patients. Current ODT formulation studies usually rely on the personal experience of pharmaceutical experts and trial-and-error in the laboratory, which is inefficient and time-consuming. The aim of current research was to establish the prediction model of ODT formulations with direct compression process by Artificial Neural Network (ANN) and Deep Neural Network (DNN) techniques. 145 formulation data were extracted from Web of Science. All data sets were divided into three parts: training set (105 data), validation set (20) and testing set (20). ANN and DNN were compared for the prediction of the disintegrating time. The accuracy of the ANN model has reached 85.60%, 80.00% and 75.00% on the training set, validation set and testing set respectively, whereas that of the DNN model was 85.60%, 85.00% and 80.00%, respectively. Compared with the ANN, DNN showed the better prediction for ODT formulations. It is the first time that deep neural network with the improved dataset selection algorithm is applied to formulation prediction on small data. The proposed predictive approach could evaluate the critical parameters about quality control of formulation, and guide research and process development. The implementation of this prediction model could effectively reduce drug product development timeline and material usage, and proactively facilitate the development of a robust drug product.