Muhammad S. Battikh

CVMar 31, 2023

Batch Normalization in Cytometry Data by kNN-Graph Preservation

Muhammad S. Battikh, Artem Lensky

Batch effects in high-dimensional Cytometry by Time-of-Flight (CyTOF) data pose a challenge for comparative analysis across different experimental conditions or time points. Traditional batch normalization methods may fail to preserve the complex topological structures inherent in cellular populations. In this paper, we present a residual neural network-based method for point set registration specifically tailored to address batch normalization in CyTOF data while preserving the topological structure of cellular populations. By viewing the alignment problem as the movement of cells sampled from a target distribution along a regularized displacement vector field, similar to coherent point drift (CPD), our approach introduces a Jacobian-based cost function and geometry-aware statistical distances to ensure local topology preservation. We provide justification for the k-Nearest Neighbour (kNN) graph preservation of the target data when the Jacobian cost is applied, which is crucial for maintaining biological relationships between cells. Furthermore, we introduce a stochastic approximation for high-dimensional registration, making alignment feasible for the high-dimensional space of CyTOF data. Our method is demonstrated on high-dimensional CyTOF dataset, effectively aligning distributions of cells while preserving the kNN-graph structure. This enables accurate batch normalization, facilitating reliable comparative analysis in biomedical research.

LGOct 25, 2021

Latent-Insensitive autoencoders for Anomaly Detection

Muhammad S. Battikh, Artem A. Lenskiy

Reconstruction-based approaches to anomaly detection tend to fall short when applied to complex datasets with target classes that possess high inter-class variance. Similar to the idea of self-taught learning used in transfer learning, many domains are rich with similar unlabelled datasets that could be leveraged as a proxy for out-of-distribution samples. In this paper we introduce Latent-Insensitive autoencoder (LIS-AE) where unlabeled data from a similar domain is utilized as negative examples to shape the latent layer (bottleneck) of a regular autoencoder such that it is only capable of reconstructing one task. We provide theoretical justification for the proposed training process and loss functions along with an extensive ablation study highlighting important aspects of our model. We test our model in multiple anomaly detection settings presenting quantitative and qualitative analysis showcasing the significant performance improvement of our model for anomaly detection tasks.

Muhammad S. Battikh

2 Papers