Annabelle Yao

2papers

2 Papers

GNDec 31, 2025
Investigation into U.S. Citizen and Non-Citizen Worker Health Insurance and Employment

Annabelle Yao

Socioeconomic integration is a critical dimension of social equity, yet persistent disparities remain in access to health insurance, education, and employment across different demographic groups. While previous studies have examined isolated aspects of inequality, there is limited research that integrates both statistical analysis and advanced machine learning to uncover hidden structures within population data. This study leverages statistical analysis ($χ^2$ test of independence and Two Proportion Z-Test) and machine learning clustering techniques -- K-Modes and K-Prototypes -- along with t-SNE visualization and CatBoost classification to analyze socioeconomic integration and inequality. Using statistical tests, we identified the proportion of the population with healthcare insurance, quality education, and employment. With this data, we concluded that there was an association between employment and citizenship status. Moreover, we were able to determine 5 distinct population groups using Machine Learning classification. The five clusters our analysis identifies reveal that while citizenship status shows no association with workforce participation, significant disparities exist in access to employer-sponsored health insurance. Each cluster represents a distinct demographic of the population, showing that there is a primary split along the lines of educational attainment which separates Clusters 0 and 4 from Clusters 1, 2, and 3. Furthermore, labor force status and nativity serve as secondary differentiators. Non-citizens are also disproportionately concentrated in precarious employment without benefits, highlighting systemic inequalities in healthcare access. By uncovering demographic clusters that face compounded disadvantages, this research contributes to a more nuanced understanding of socioeconomic stratification.

QMDec 31, 2025
Deep Learning Framework for RNA Inverse Folding with Geometric Structure Potentials

Annabelle Yao

RNA's diverse biological functions stem from its structural versatility, yet accurately predicting and designing RNA sequences given a 3D conformation (inverse folding) remains a challenge. Here, I introduce a deep learning framework that integrates Geometric Vector Perceptron (GVP) layers with a Transformer architecture to enable end-to-end RNA design. I construct a dataset consisting of experimentally solved RNA 3D structures, filtered and deduplicated from the BGSU RNA list, and evaluate performance using both sequence recovery rate and TM-score to assess sequence and structural fidelity, respectively. On standard benchmarks and RNA-Puzzles, my model achieves state-of-the-art performance, with recovery and TM-scores of 0.481 and 0.332, surpassing existing methods across diverse RNA families and length scales. Masked family-level validation using Rfam annotations confirms strong generalization beyond seen families. Furthermore, inverse-folded sequences, when refolded using AlphaFold3, closely resemble native structures, highlighting the critical role of geometric features captured by GVP layers in enhancing Transformer-based RNA design.