A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity
For clinicians and researchers studying rare immune diseases, this work provides a data processing and ML methodology to extract IEI patterns from complex EHR data, though it is an incremental application of existing clustering techniques.
The authors developed a pipeline combining data curation and unsupervised clustering to identify patterns of inborn errors of immunity (IEI) from a national EHR registry, enabling feature extraction and pattern recognition for rare diseases.
Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end-organ damage and improve quality of life. Hurdles in accessing and curating large-scale electronic health record (EHR) data limit the routine data-driven analyses needed to stay at the forefront of trends in IEI and other rare diseases. Development of machine learning (ML) algorithms for pattern recognition in IEI is limited, as is published methodology on how to systematically process and integrate complex medical data. Our proposed pipeline, which includes data curation and ML clustering algorithms, is designed to recognize novel rare disease patterns and extract IEI-associated features from a national data registry. Our methodology for EHR data formatting and processing transforms raw immunologic lab data into vectors, which are then clustered with hyperparameter tuning for disease pattern recognition. This study refines IEI feature awareness, develops data toolkits for the analysis of rare disease populations, and expands on transforming complex medical records into data structures interpretable by unsupervised ML.
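As a rough illustration of the two steps named above (turning raw lab records into per-patient vectors, then tuning a clustering hyperparameter), the following is a minimal sketch only. The abstract does not state which clustering algorithm, features, imputation rule, or tuning criterion the authors used, so everything here is an assumption: the lab names and values are invented toy data, k-means stands in for the unspecified clustering method, and the mean silhouette score stands in for the unspecified tuning objective. All function names (`vectorize`, `kmeans`, `silhouette`, `select_k`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev

# Hypothetical long-format lab records: (patient_id, lab_name, value). Toy data only.
records = [
    ("p1", "IgG", 250.0), ("p1", "IgA", 10.0), ("p1", "IgM", 20.0),
    ("p2", "IgG", 300.0), ("p2", "IgA", 15.0), ("p2", "IgM", 25.0),
    ("p3", "IgG", 1100.0), ("p3", "IgA", 200.0), ("p3", "IgM", 150.0),
    ("p4", "IgG", 1000.0), ("p4", "IgA", 180.0), ("p4", "IgM", 140.0),
    ("p5", "IgG", 950.0), ("p5", "IgA", 5.0), ("p5", "IgM", 400.0),
    ("p6", "IgG", 900.0), ("p6", "IgA", 8.0), ("p6", "IgM", 420.0),
]

def vectorize(records):
    """Pivot long-format records into one z-scored vector per patient."""
    labs = sorted({lab for _, lab, _ in records})
    by_patient = defaultdict(lambda: defaultdict(list))
    for pid, lab, value in records:
        by_patient[pid][lab].append(value)
    patients = sorted(by_patient)
    # Average repeated measurements; a lab with no measurement becomes None.
    raw = [[mean(by_patient[p][lab]) if by_patient[p][lab] else None
            for lab in labs] for p in patients]
    stats = []
    for j in range(len(labs)):
        observed = [row[j] for row in raw if row[j] is not None]
        stats.append((mean(observed), pstdev(observed) or 1.0))
    # Assumption: missing values are imputed with the column mean (0.0 after z-scoring).
    vectors = [[0.0 if v is None else (v - mu) / sd
                for v, (mu, sd) in zip(row, stats)] for row in raw]
    return patients, labs, vectors

def dist(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(dist(x, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(x, centers[c])) for x in X]
        updated = []
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            # Keep the old center if a cluster ends up empty.
            updated.append([mean(col) for col in zip(*members)] if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(X, labels):
    """Mean silhouette score; singleton clusters score 0 by convention."""
    scores = []
    for i, x in enumerate(X):
        own = [dist(x, y) for j, y in enumerate(X) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean(dist(x, y) for y, l in zip(X, labels) if l == c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return mean(scores)

def select_k(X, candidates):
    """Hyperparameter tuning: keep the k with the highest mean silhouette."""
    scored = {k: silhouette(X, kmeans(X, k)) for k in candidates}
    return max(scored, key=scored.get), scored

patients, labs, X = vectorize(records)
best_k, scores = select_k(X, [2, 3, 4])
labels = kmeans(X, best_k)
print(best_k, dict(zip(patients, labels)))
```

A real registry would need unit harmonization, age-adjusted reference ranges, and a more careful treatment of missing labs before any of this is meaningful; the sketch only shows the shape of the data flow from records to vectors to a tuned clustering.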