Somnath Mondal

2papers

2 Papers

LGMar 8
What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover

Ivan Felipe Benavides-Martinez, Justin Guthrie, Jhon Edwin Arias et al.

Geospatial foundation models generate high-dimensional embeddings that achieve strong predictive performance, yet their internal organization remains obscure, limiting their scientific use. Recent interpretability studies relate Google AlphaEarth Foundations (GAEF) embeddings to continuous environmental variables, but it is still unclear whether the embedding space exhibits a functional or hierarchical organization, in which some dimensions act as specialized representations while others encode shared or broader geospatial structure. In this work, we propose a functional interpretability framework that reverse-engineers the role of embedding dimensions by characterizing their contribution to land cover structure from observed classification behavior. The approach combines large-scale experimentation with a structural analysis of embedding-class relationships based on feature importance patterns and progressive ablation. Our results show that embedding dimensions exhibit consistent and non-uniform functional behavior, allowing them to be categorized along a hierarchical functional spectrum: specialist dimensions associated with specific land cover classes, low- and mid-generalist dimensions capturing shared characteristics between classes, and highgeneralist dimensions reflecting broader environmental gradients. Critically, we find that accurate land cover classification (98% of baseline performance) can be achieved using as few as 2 to 12 of the 64 available dimensions, depending on the class. This demonstrates substantial redundancy in the embedding space and offers a pathway toward significant reductions in computational cost. Together, these findings reveal that AlphaEarth embeddings are not only physically informative, but also functionally organized into a hierarchical structure, providing practical guidance for dimension selection in operational classification tasks.

BMNov 27, 2025
DeepPNI: Language- and graph-based model for mutation-driven protein-nucleic acid energetics

Somnath Mondal, Tinkal Mondal, Soumajit Pramanik et al.

The interaction between proteins and nucleic acids is crucial for processes that sustain cellular function, including DNA maintenance and the regulation of gene expression and translation. Amino acid mutations in protein-nucleic acid complexes often lead to vital diseases. Experimental techniques have their own specific limitations in predicting mutational effects in protein-nucleic acid complexes. In this study, we compiled a large dataset of 1951 mutations including both protein-DNA and protein-RNA complexes and integrated structural and sequential features to build a deep learning-based regression model named DeepPNI. This model estimates mutation-induced binding free energy changes in protein-nucleic acid complexes. The structural features are encoded via edge-aware RGCN and the sequential features are extracted using protein language model ESM-2. We have achieved a high average Pearson correlation coefficient (PCC) of 0.76 in the large dataset via five-fold cross-validation. Consistent performance across individual dataset of protein-DNA, protein-RNA complexes, and different experimental temperature split dataset make the model generalizable. Our model showed good performance in complex-based five-fold cross-validation, which proved its robustness. In addition, DeepPNI outperformed in external dataset validation, and comparison with existing tools