IVMar 16, 2023
Using Scalable Computer Vision to Automate High-throughput Semiconductor CharacterizationAlexander E. Siemenn, Eunice Aissi, Fang Sheng et al.
High-throughput materials synthesis methods have risen in popularity due to their potential to accelerate the design and discovery of novel functional materials, such as solution-processed semiconductors. After synthesis, key material properties must be measured and characterized to validate discovery and provide feedback to optimization cycles. However, with the boom in development of high-throughput synthesis tools that champion production rates up to $10^4$ samples per hour with flexible form factors, most sample characterization methods are either slow (conventional rates of $10^1$ samples per hour, approximately 1000x slower) or rigid (e.g., designed for standard-size microplates), resulting in a bottleneck that impedes the materials-design process. To overcome this challenge, we propose a set of automated material property characterization (autocharacterization) tools that leverage the adaptive, parallelizable, and scalable nature of computer vision to accelerate the throughput of characterization by 85x compared to the non-automated workflow. We demonstrate a generalizable composition mapping tool for high-throughput synthesized binary material systems as well as two scalable autocharacterization algorithms that (1) autonomously compute the band gap of 200 unique compositions in 6 minutes and (2) autonomously compute the degree of degradation in 200 unique compositions in 20 minutes, generating ultra-high compositional resolution trends of band gap and stability. We demonstrate that the developed band gap and degradation detection autocharacterization methods achieve 98.5% accuracy and 96.9% accuracy, respectively, on the FA$_{1-x}$MA$_{x}$PbI$_3$, $0\leq x \leq 1$ perovskite semiconductor system.
SPMar 17, 2025Code
Survival Analysis with Machine Learning for Predicting Li-ion Battery Remaining Useful LifeJingyuan Xue, Longfei Wei, Dongjing Jiang et al.
Battery degradation significantly impacts the reliability and efficiency of energy storage systems, particularly in electric vehicles and industrial applications. Predicting the remaining useful life (RUL) of lithium-ion batteries is crucial for optimizing maintenance schedules, reducing costs, and improving safety. Traditional RUL prediction methods often struggle with nonlinear degradation patterns and uncertainty quantification. To address these challenges, we propose a hybrid survival analysis framework integrating survival data reconstruction, survival model learning, and survival probability estimation. Our approach transforms battery voltage time series into time-to-failure data using path signatures. The multiple Cox-based survival models and machine-learning-based methods, such as DeepHit and MTLR, are learned to predict battery failure-free probabilities over time. Experiments conducted on the Toyota battery and NASA battery datasets demonstrate the effectiveness of our approach, achieving high time-dependent AUC and concordance index (C-Index) while maintaining a low integrated Brier score. The data and source codes for this work are available to the public at https://github.com/thinkxca/rul.
QMJan 29
ProDCARL: Reinforcement Learning-Aligned Diffusion Models for De Novo Antimicrobial Peptide DesignFang Sheng, Mohammad Noaeen, Zahra Shakeri
Antimicrobial resistance threatens healthcare sustainability and motivates low-cost computational discovery of antimicrobial peptides (AMPs). De novo peptide generation must optimize antimicrobial activity and safety through low predicted toxicity, but likelihood-trained generators do not enforce these goals explicitly. We introduce ProDCARL, a reinforcement-learning alignment framework that couples a diffusion-based protein generator (EvoDiff OA-DM 38M) with sequence property predictors for AMP activity and peptide toxicity. We fine-tune the diffusion prior on AMP sequences to obtain a domain-aware generator. Top-k policy-gradient updates use classifier-derived rewards plus entropy regularization and early stopping to preserve diversity and reduce reward hacking. In silico experiments show ProDCARL increases the mean predicted AMP score from 0.081 after fine-tuning to 0.178. The joint high-quality hit rate reaches 6.3\% with pAMP $>$0.7 and pTox $<$0.3. ProDCARL maintains high diversity, with $1-$mean pairwise identity equal to 0.929. Qualitative analyses with AlphaFold3 and ProtBERT embeddings suggest candidates show plausible AMP-like structural and semantic characteristics. ProDCARL serves as a candidate generator that narrows experimental search space, and experimental validation remains future work.
LGMar 17, 2025
Cohort-attention Evaluation Metric against Tied Data: Studying Performance of Classification Models in Cancer DetectionLongfei Wei, Fang Sheng, Jianfei Zhang
Artificial intelligence (AI) has significantly improved medical screening accuracy, particularly in cancer detection and risk assessment. However, traditional classification metrics often fail to account for imbalanced data, varying performance across cohorts, and patient-level inconsistencies, leading to biased evaluations. We propose the Cohort-Attention Evaluation Metrics (CAT) framework to address these challenges. CAT introduces patient-level assessment, entropy-based distribution weighting, and cohort-weighted sensitivity and specificity. Key metrics like CATSensitivity (CATSen), CATSpecificity (CATSpe), and CATMean ensure balanced and fair evaluation across diverse populations. This approach enhances predictive reliability, fairness, and interpretability, providing a robust evaluation method for AI-driven medical screening models.
RONov 15, 2024
A Self-Supervised Robotic System for Autonomous Contact-Based Spatial Mapping of Semiconductor PropertiesAlexander E. Siemenn, Basita Das, Kangyu Ji et al.
Integrating robotically driven contact-based material characterization techniques into self-driving laboratories can enhance measurement quality, reliability, and throughput. While deep learning models support robust autonomy, current methods lack reliable pixel-precision positioning and require extensive labeled data. To overcome these challenges, we propose an approach for building self-supervised autonomy into contact-based robotic systems that teach the robot to follow domain expert measurement principles at high-throughputs. Firstly, we design a vision-based, self-supervised convolutional neural network (CNN) architecture that uses differentiable image priors to optimize domain-specific objectives, refining the pixel precision of predicted robot contact poses by 20.0% relative to existing approaches. Secondly, we design a reliable graph-based planner for generating distance-minimizing paths to accelerate the robot measurement throughput and decrease planning variance by 6x. We demonstrate the performance of this approach by autonomously driving a 4-degree-of-freedom robotic probe for 24 hours to characterize semiconductor photoconductivity at 3,025 uniquely predicted poses across a gradient of drop-casted perovskite film compositions, achieving throughputs over 125 measurements per hour. Spatially mapping photoconductivity onto each drop-casted film reveals compositional trends and regions of inhomogeneity, valuable for identifying manufacturing process defects. With this self-supervised CNN-driven robotic system, we enable high-precision and reliable automation of contact-based characterization techniques at high throughputs, thereby allowing the measurement of previously inaccessible yet important semiconductor properties for self-driving laboratories.